Park, Byeonghyeok; Baek, Min-Jeong; Min, Byoungnam; Choi, In-Geol
2017-09-01
Genome annotation is a primary step in genomic research. To establish a light and portable prokaryotic genome annotation pipeline for use in individual laboratories, we developed a Shiny app package designated "P-CAPS" (Prokaryotic Contig Annotation Pipeline Server). The package is composed of R and Python scripts that integrate publicly available annotation programs into a server application. P-CAPS is not only a browser-based interactive application but also a distributable Shiny app package that can be installed on any personal computer. The final annotation is provided in various standard formats and is summarized in an R Markdown document. Annotation can be visualized and examined with a public genome browser. A benchmark test showed that the annotation quality and completeness of P-CAPS are reliable and comparable to those of currently available public pipelines.
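Since P-CAPS is distributed as a Shiny app package, launching it locally should follow the standard Shiny pattern. The sketch below assumes the installed package is named PCAPS and ships its app directory inside the package; both names are assumptions, while runApp() and system.file() are standard R/Shiny calls.

```r
# Minimal sketch: launch a locally installed Shiny app package.
# "PCAPS" and the "shiny" subdirectory are assumed names, not
# documented paths from the paper.
library(shiny)

app_dir <- system.file("shiny", package = "PCAPS")  # locate the bundled app
runApp(app_dir, launch.browser = TRUE)              # serve it locally
```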
snpGeneSets: An R Package for Genome-Wide Study Annotation
Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian
2016-01-01
Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/.
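As a rough illustration of the three annotation types listed above, a session might look like the sketch below. The abstract does not give the package's function names, so snp2gene(), gene2geneset() and gseaByGene() are purely illustrative placeholders, not the documented API.

```r
# Hypothetical sketch of the snpGeneSets workflow; all three function
# names are placeholders for the package's real API.
library(snpGeneSets)

snps <- c("rs7903146", "rs1801282")    # example T2D-associated SNPs

genes    <- snp2gene(snps)             # (2) map SNPs to genes
genesets <- gene2geneset(genes$gene)   # (2) map genes to gene sets
enriched <- gseaByGene(genes)          # (3) gene set enrichment analysis
```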
Gao, Jianing; Wan, Changlin; Zhang, Huan; Li, Ao; Zang, Qiguang; Ban, Rongjun; Ali, Asim; Yu, Zhenghua; Shi, Qinghua; Jiang, Xiaohua; Zhang, Yuanwei
2017-10-03
Copy number variations (CNVs) are the main genetic structural variations in the cancer genome. Detecting CNVs in the exome region is efficient and cost-effective for identifying cancer-associated genes. Many tools have been developed accordingly, yet they lack reliability because of high false-negative rates, which are intrinsically caused by exonic bias in the genome. To provide an alternative option, we report Anaconda, a comprehensive pipeline that allows flexible integration of multiple CNV-calling methods and systematic annotation of CNVs when analyzing WES data. With a single command, Anaconda can generate CNV detection results from up to four CNV-detection tools. Combined with comprehensive annotation analysis of genes involved in shared CNV regions, Anaconda delivers a more reliable and useful report in support of CNV-associated cancer research. The Anaconda package and manual can be freely accessed at http://mcg.ustc.edu.cn/bsc/ANACONDA/.
Cao, Mingshu; Fraser, Karl; Rasmussen, Susanne
2013-10-31
Mass spectrometry coupled with chromatography has become the major technical platform in metabolomics. Aided by peak detection algorithms, the detected signals are characterized by mass-over-charge ratio (m/z) and retention time. Chemical identities often remain elusive for the majority of the signals. Multi-stage mass spectrometry based on electrospray ionization (ESI) allows collision-induced dissociation (CID) fragmentation of selected precursor ions. These fragment ions can assist in structural inference for metabolites of low molecular weight. Computational investigations of fragmentation spectra have increasingly received attention in metabolomics, and various public databases house such data. We have developed an R package, "iontree", that can capture, store and analyze MS2 and MS3 mass spectral data from high-throughput metabolomics experiments. The package includes functions for ion tree construction, an algorithm (distMS2) for MS2 spectral comparison, and tools for building platform-independent ion tree (MS2/MS3) libraries. We have demonstrated the use of the package for the systematic analysis and annotation of fragmentation spectra collected on various metabolomics platforms, including direct infusion mass spectrometry and liquid chromatography coupled with either low-resolution or high-resolution mass spectrometry. Assisted by the developed computational tools, we have demonstrated that spectral trees can provide informative evidence complementary to retention time and accurate mass to aid in annotating unknown peaks. These experimental spectral trees, once subjected to a quality control process, can be used for querying public MS2 databases or for de novo interpretation. The putatively annotated spectral trees can be readily incorporated into reference libraries for routine identification of metabolites.
GenomeGraphs: integrated genomic data visualization with R.
Durinck, Steffen; Bullard, James; Spellman, Paul T; Dudoit, Sandrine
2009-01-06
Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. We developed GenomeGraphs, as an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.
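The package's documented workflow pairs a biomaRt query with grid-based plotting; a minimal sketch follows (the Ensembl gene identifier is an arbitrary example).

```r
# Plot an Ensembl gene model over a genomic axis with GenomeGraphs.
library(GenomeGraphs)
library(biomaRt)

mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")  # online annotation source
gene <- makeGene(id = "ENSG00000095203", type = "ensembl_gene_id",
                 biomart = mart)                               # gene structure from Ensembl

gdPlot(list(gene, makeGenomeAxis()))  # stack tracks on a shared coordinate system
```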
Lareau, Caleb A; Aryee, Martin J; Berger, Bonnie
2018-02-15
The 3D architecture of DNA within the nucleus is a key determinant of interactions between genes, regulatory elements, and transcriptional machinery. As a result, differences in DNA looping structure are associated with variation in gene expression and cell state. To systematically assess changes in DNA looping architecture between samples, we introduce diffloop, an R/Bioconductor package that provides a suite of functions for the quality control, statistical testing, annotation, and visualization of DNA loops. We demonstrate this functionality by detecting differences between ENCODE ChIA-PET samples and relate looping to variability in epigenetic state. Diffloop is implemented as an R/Bioconductor package available at https://bioconductor.org/packages/release/bioc/html/diffloop.html.
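A condensed sketch of a differential looping analysis is shown below. The function names follow the package vignette as best recalled and should be verified against the current Bioconductor documentation; the input directory is a placeholder, and sample group assignment is omitted for brevity.

```r
# Sketch: import ChIA-PET loops (mango output), filter, and test for
# differential looping. Names recalled from the diffloop vignette;
# treat the specifics as assumptions.
library(diffloop)

loops <- loopsMake.mango("mango_output_dir")  # placeholder path to mango output
loops <- filterLoops(loops, width = 5000)     # basic quality control
assoc <- quickAssoc(loops)                    # per-loop differential test
                                              # (assumes sample groups were set beforehand)
```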
Jaeger, Carsten; Méret, Michaël; Schmitt, Clemens A; Lisec, Jan
2017-08-15
A bottleneck in metabolic profiling of complex biological extracts is confident, non-supervised annotation of ideally all contained, chemically highly diverse small molecules. Recent computational strategies combining sum formula prediction with in silico fragmentation achieve confident de novo annotation once the correct neutral mass of a compound is known. Current software solutions for automated adduct ion assignment, however, are either publicly unavailable or have been validated against only a few experimental electrospray ionization (ESI) mass spectra. Here we present findMAIN (find Main Adduct IoN), a new heuristic approach for interpreting ESI mass spectra. findMAIN scores MS1 spectra based on explained intensity, mass accuracy and isotope charge agreement of adducts and related ionization products, and annotates peaks of the (de)protonated molecule and adduct ions. The approach was validated against 1141 ESI positive-mode spectra of chemically diverse standard compounds acquired on different high-resolution mass spectrometric instruments (Orbitrap and time-of-flight). Robustness against impure spectra was evaluated. Correct adduct ion assignment was achieved for up to 83% of the spectra. Performance was independent of compound class and mass spectrometric platform. The algorithm proved highly tolerant of spectral contamination, as demonstrated for co-eluting compounds as well as systematically by pairwise mixing of spectra. When used in conjunction with MS-FINDER, a state-of-the-art sum formula tool, correct sum formulas were obtained for 77% of spectra. It outperformed both 'brute force' approaches and current state-of-the-art annotation packages tested as potential alternatives. Limitations of the heuristic pertained to poorly ionizing compounds and cationic compounds forming [M]+ ions. A new, validated approach for interpreting ESI mass spectra is presented, filling a gap in the nontargeted metabolomics workflow. It is freely available in the latest version of the R package InterpretMSSpectrum.
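In the released package, the entry point is findMAIN(), which takes a two-column (m/z, intensity) spectrum; the toy peak values below are fabricated for illustration only.

```r
# Score an ESI-MS1 pseudo-spectrum with findMAIN (InterpretMSSpectrum).
# The three peaks are toy values standing in for a real pseudo-spectrum.
library(InterpretMSSpectrum)

spec <- matrix(c(263.0530, 100,   # hypothetical [M+H]+ candidate
                 285.0349,  40,   # possible [M+Na]+ of the same compound
                 301.0088,  15),  # possible [M+K]+
               ncol = 2, byrow = TRUE)

fm <- findMAIN(spec)   # ranked adduct/neutral-mass hypotheses
summary(fm)            # inspect the top-scoring assignments
```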
ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data
2010-01-01
Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data, such as a different ChIP preparation or a dataset from the literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration with the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
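Batch annotation follows the package's annotatePeakInBatch() workflow; in the sketch below, only the BED file name is a placeholder.

```r
# Annotate enriched regions against TSS annotation with ChIPpeakAnno.
library(ChIPpeakAnno)

peaks <- toGRanges("peaks.bed", format = "BED")  # placeholder peak file

data(TSS.human.GRCh37)                           # TSS annotation shipped with the package
annotated <- annotatePeakInBatch(peaks, AnnotationData = TSS.human.GRCh37)

head(as.data.frame(annotated))  # table of peaks with distances to nearest genes
```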
Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin
2011-01-01
The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database, which provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single-pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome datasets and contains 131,755 rDNA sequences from GenBank for 17,613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.
MiMiR – an integrated platform for microarray data sharing, mining and analysis.
Tomlinson, Chris; Thimma, Manjula; Alexandrakis, Stelios; Castillo, Tito; Dennis, Jayne L; Brooks, Anthony; Bradley, Thomas; Turnbull, Carly; Blaveri, Ekaterini; Barton, Geraint; Chiba, Norie; Maratou, Klio; Soutter, Pat; Aitman, Tim; Game, Laurence
2008-09-18
Despite considerable efforts within the microarray community for standardising data format, content and description, microarray technologies present major challenges in managing, sharing, analysing and re-using the large amount of data generated locally or internationally. Additionally, it is recognised that inconsistent and low-quality experimental annotation in public data repositories significantly compromises the re-use of microarray data for meta-analysis. MiMiR, the Microarray data Mining Resource, was designed to tackle some of these limitations and challenges. Here we present new software components and enhancements to the original infrastructure that increase accessibility, utility and opportunities for large-scale mining of experimental and clinical data. A user-friendly Online Annotation Tool allows researchers to submit detailed experimental information via the web at the time of data generation rather than at the time of publication. This ensures easy access to, and high accuracy of, the meta-data collected. Experiments are programmatically built in the MiMiR database from the submitted information, and details are systematically curated and further annotated by a team of trained annotators using a new Curation and Annotation Tool. Clinical information can be annotated and coded with a clinical Data Mapping Tool within an appropriate ethical framework. Users can visualise experimental annotation, assess data quality, and download and share data via a web-based experiment browser called MiMiR Online. All requests to access data in MiMiR are routed through a sophisticated middleware security layer, thereby allowing secure data access and sharing amongst MiMiR registered users prior to publication. Data in MiMiR can be mined and analysed using the integrated EMAAS open-source analysis web portal or via export of data and meta-data into the Rosetta Resolver data analysis package. The new MiMiR suite of software enables systematic and effective capture of extensive experimental and clinical information with the highest MIAME score, and secure data sharing prior to publication. MiMiR currently contains more than 150 experiments corresponding to over 3000 hybridisations and supports the Microarray Centre's large microarray user community and two international consortia. MiMiR's flexible and scalable hardware and software architecture enables secure warehousing of thousands of datasets, including clinical studies, from microarray and potentially other -omics technologies.
Uppal, Karan; Soltow, Quinlyn A; Strobel, Frederick H; Pittard, W Stephen; Gernert, Kim M; Yu, Tianwei; Jones, Dean P
2013-01-16
Detection of low-abundance metabolites is important for de novo mapping of metabolic pathways related to diet, the microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low-abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small-molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving the quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.
EST-PAC: a web package for EST annotation and protein sequence prediction
Strahm, Yvan; Powell, David; Lefèvre, Christophe
2006-01-01
With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC, a web-oriented multi-platform software package for EST annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequences from EST data, and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.
AnnoLnc: a web server for systematically annotating novel human lncRNAs.
Hou, Mei; Tang, Xing; Tian, Feng; Shi, Fangyuan; Liu, Fenglin; Gao, Ge
2016-11-16
Long noncoding RNAs (lncRNAs) have been shown to play essential roles in almost every important biological process through multiple mechanisms. Although the repertoire of human lncRNAs has rapidly expanded, their biological functions and regulation remain largely elusive, calling for a systematic and integrative annotation tool. Here we present AnnoLnc (http://annolnc.cbi.pku.edu.cn), a one-stop portal for systematically annotating novel human lncRNAs. Based on more than 700 data sources and various tool chains, AnnoLnc enables a systematic annotation covering genomic location, secondary structure, expression patterns, transcriptional regulation, miRNA interaction, protein interaction, genetic association and evolution. An intuitive web interface is available for interactive analysis through both desktops and mobile devices, and programmers can further integrate AnnoLnc into their pipelines through standard JSON-based Web Service APIs. To the best of our knowledge, AnnoLnc is the only web server to provide on-the-fly and systematic annotation for newly identified human lncRNAs. Compared with similar tools, the annotation generated by AnnoLnc covers a much wider spectrum with intuitive visualization. Case studies demonstrate the power of AnnoLnc in not only rediscovering known functions of human lncRNAs but also inspiring novel hypotheses.
EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome
Thibaud-Nissen, Françoise; Campbell, Matthew; Hamilton, John P; Zhu, Wei; Buell, C Robin
2007-01-01
Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort. We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CAs) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules, and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website, as well as in the Community Annotation track of the Genome Browser. We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families had been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available from the project website.
Multi-label literature classification based on the Gene Ontology graph.
Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua
2008-12-08
The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
Project for Global Education: Annotated Bibliography.
ERIC Educational Resources Information Center
Institute for World Order, New York, NY.
Over 260 books, textbooks, articles, pamphlets, periodicals, films, and multi-media packages appropriate for the analysis of global issues at the college level are briefly annotated. Entries include classic books and articles as well as a number of recent (1976-1981) publications. The purpose is to assist students and educators in developing a…
ERIC Educational Resources Information Center
Gaddy, C. Stephen
This annotated bibliography of commercially prepared training materials for management and leadership development programs offers 10 topical sections of references applicable to school principal training. Entries were selected by using the following criteria: (1) programs dealing too specifically with management in sales, manufacturing, finance,…
Baumler, David J.; Banta, Lois M.; Hung, Kai F.; Schwarz, Jodi A.; Cabot, Eric L.; Glasner, Jeremy D.; Perna, Nicole T.
2012-01-01
Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples related to bacterial pathogenesis. Students first examine alignments of genomes of Escherichia coli O157:H7 strains isolated from three food-poisoning outbreaks using the multiple-genome alignment tool Mauve. Students investigate conservation of virulence factors using the Mauve viewer and by browsing annotations available at the A Systematic Annotation Package for Community Analysis of Genomes database. In the second module, students use an alignment of five Yersinia pestis genomes to analyze single-nucleotide polymorphisms of three genes to classify strains into biovar groups. Students are then given sequences of bacterial DNA amplified from the teeth of corpses from the first and second pandemics of the bubonic plague and asked to classify these new samples. Learning-assessment results reveal student improvement in self-efficacy and content knowledge, as well as students' ability to use BLAST to identify genomic islands and conduct analyses of virulence factors from E. coli O157:H7 or Y. pestis. Each of these educational modules offers educators new ready-to-implement resources for integrating comparative genomic topics into their curricula.
Chado Controller: advanced annotation management with a community annotation system.
Guignon, Valentin; Droc, Gaëtan; Alaux, Michael; Baurens, Franc-Christophe; Garsmeur, Olivier; Poiron, Claire; Carver, Tim; Rouard, Mathieu; Bocs, Stéphanie
2012-04-01
We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado Controller uses PostgreSQL and Perl. The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system; documentation is available at http://www.gnpannot.org/content/chado-controller-doc. The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form.
RCAS: an RNA centric annotation system for transcriptome-wide regions of interest.
Uyar, Bora; Yusuf, Dilmurat; Wurmus, Ricardo; Rajewsky, Nikolaus; Ohler, Uwe; Akalin, Altuna
2017-06-02
In the field of RNA biology, technologies for studying the transcriptome have created tremendous potential for deciphering the puzzles of RNA biology. Along with the excitement, the unprecedented volume of RNA-related omics data is creating great challenges in bioinformatics analyses. Here, we present the RNA Centric Annotation System (RCAS), an R package designed to ease the process of creating gene-centric annotations and analyses for genomic regions of interest obtained from various RNA-based omics technologies. The design of RCAS is modular, which enables flexible usage and convenient integration with other bioinformatics workflows. RCAS is an R/Bioconductor package, but we also created graphical user interfaces including a Galaxy wrapper and a stand-alone web service. The application of RCAS on published datasets shows that RCAS is not only able to reproduce published findings but also helps generate novel knowledge and hypotheses. The meta-gene profiles, gene-centric annotation, motif analysis and gene-set analysis provided by RCAS supply the contextual knowledge necessary for understanding the functional aspects of different biological events that involve RNAs. In addition, the array of different interfaces and deployment options adds convenience for different levels of users. RCAS is available at http://bioconductor.org/packages/release/bioc/html/RCAS.html and http://rcas.mdc-berlin.de.
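The package's documented entry point for the self-contained HTML report is runReport(); the file names below are placeholders.

```r
# Generate an RCAS report for transcriptome-wide regions of interest.
library(RCAS)

runReport(queryFilePath = "clip_peaks.bed",    # placeholder: regions of interest (BED)
          gffFilePath   = "gencode.v19.gtf",   # placeholder: reference annotation (GTF)
          genomeVersion = "hg19")              # genome build used for motif analysis
```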
MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads.
Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas
2017-01-01
An increasing number of species and gene identification studies rely on the use of next-generation sequence analysis of either single-isolate or metagenomics samples. Several methods are available to perform taxonomic annotations, and a previous metagenomics benchmark study has shown that a vast number of false-positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next-generation sequence data and perform reference-based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species- and strain-level resolution. An in-vitro bacterial mock community sample comprising 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. Sensitivity and precision of 75% were obtained for strain-level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets.
SplicingTypesAnno: annotating and quantifying alternative splicing events for RNA-Seq data.
Sun, Xiaoyong; Zuo, Fenghua; Ru, Yuanbin; Guo, Jiqiang; Yan, Xiaoyan; Sablok, Gaurav
2015-04-01
Alternative splicing plays a key role in the regulation of the central dogma. Four major types of alternative splicing have been classified: intron retention, exon skipping, alternative 5′ splice sites (alternative donor sites), and alternative 3′ splice sites (alternative acceptor sites). A few algorithms have been developed to detect splice junctions from RNA-Seq reads. However, there are few tools targeting the major alternative splicing types at the exon/intron level. This type of analysis may reveal subtle, yet important events of alternative splicing, and thus help gain deeper understanding of the mechanism of alternative splicing. This paper describes SplicingTypesAnno, a user-friendly R package for extracting, annotating and analyzing alternative splicing types in sequence alignment files from RNA-Seq. SplicingTypesAnno can: (1) provide annotation for major alternative splicing at the exon/intron level and, by comparison against a GTF/GFF annotation file, identify novel alternative splicing sites; (2) offer a convenient two-level analysis: genome-scale annotation for users with a high-performance computing environment, and gene-scale annotation for users with personal computers; and (3) generate a user-friendly web report and additional BED files for IGV visualization. SplicingTypesAnno is a user-friendly R package for extracting, annotating and analyzing alternative splicing types at the exon/intron level in sequence alignment files from RNA-Seq. It is publicly available at https://sourceforge.net/projects/splicingtypes/files/ or http://genome.sdau.edu.cn/research/software/SplicingTypesAnno.html.
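A sketch of the two-level analysis described above follows; splicingGenome() and splicingGene() are the package's entry points as best recalled, and the parameter names and file paths are assumptions to verify against the documentation.

```r
# Two-level alternative splicing annotation with SplicingTypesAnno.
# Function and argument names recalled from the package docs; treat
# them, and the file paths, as assumptions.
library(SplicingTypesAnno)

# Genome-scale annotation of all four major splicing types
splicingGenome(bam.file = "sample.bam", gtf.file = "annotation.gtf")

# Gene-scale annotation of a single locus on a personal computer
splicingGene(gene.name = "TP53",
             bam.file  = "sample.bam",
             gtf.file  = "annotation.gtf")
```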
Learn by Yourself: The Self-Learning Tools for Qualitative Analysis Software Packages
ERIC Educational Resources Information Center
Freitas, Fábio; Ribeiro, Jaime; Brandão, Catarina; Reis, Luís Paulo; de Souza, Francislê Neri; Costa, António Pedro
2017-01-01
Computer Assisted Qualitative Data Analysis Software (CAQDAS) are tools that help researchers to develop qualitative research projects. These software packages help the users with tasks such as transcription analysis, coding and text interpretation, writing and annotation, content search and analysis, recursive abstraction, grounded theory…
The Biological Reference Repository (BioR): a rapid and flexible system for genomics annotation.
Kocher, Jean-Pierre A; Quest, Daniel J; Duffy, Patrick; Meiners, Michael A; Moore, Raymond M; Rider, David; Hossain, Asif; Hart, Steven N; Dinu, Valentin
2014-07-01
The Biological Reference Repository (BioR) is a toolkit for annotating variants. BioR stores public and user-specific annotation sources in indexed JSON-encoded flat files (catalogs). The BioR toolkit provides the functionality to combine and retrieve annotation from these catalogs via the command-line interface. Several catalogs from commonly used annotation sources and instructions for creating user-specific catalogs are provided. Commands from the toolkit can be combined with other UNIX commands for advanced annotation processing. We also provide instructions for the development of custom annotation pipelines. The package is implemented in Java and makes use of external tools written in Java and Perl. The toolkit can be executed on Mac OS X 10.5 and above or any Linux distribution. The BioR application, quickstart, and user guide documents and many biological examples are available at http://bioinformaticstools.mayo.edu.
Siew, Joyce Phui Yee; Khan, Asif M; Tan, Paul T J; Koh, Judice L Y; Seah, Seng Hong; Koo, Chuay Yeng; Chai, Siaw Ching; Armugam, Arunmozhiarasi; Brusic, Vladimir; Jeyaseelan, Kandiah
2004-12-12
Sequence annotations and functional and structural data on snake venom neurotoxins (svNTXs) are scattered across multiple databases and literature sources. Sequence annotations and structural data are available in the public molecular databases, while functional data are almost exclusively available in published articles. There is a need for a specialized svNTX database that contains NTX entries which are organized, well annotated and classified in a systematic manner. We have systematically analyzed svNTXs and classified them into structure-function groups based on their structural, functional and phylogenetic properties. Using conserved motifs in each phylogenetic group, we built an intelligent module for the prediction of structural and functional properties of unknown NTXs. We also developed an annotation tool to aid the functional prediction of newly identified NTXs as an additional resource for the venom research community. We created a searchable online database of NTX protein sequences (http://research.i2r.a-star.edu.sg/Templar/DB/snake_neurotoxin). This database can also be found under the Swiss-Prot Toxin Annotation Project website (http://www.expasy.org/sprot/).
Sharing and reusing multimedia multilingual educational resources in medicine.
Zdrahal, Zdenek; Knoth, Petr; Mulholland, Paul; Collins, Trevor
2013-01-01
The paper describes the Eurogene portal for sharing and reusing multilingual multimedia educational resources in human genetics. The content is annotated using concepts from two ontologies and a topic hierarchy. The ontology annotation is used to guide search and to calculate semantically similar content. Educational resources can be aggregated into learning packages. The system has been in routine use since 2009.
GANESH: software for customized annotation of genome regions.
Huntley, Derek; Hummerich, Holger; Smedley, Damian; Kittivoravitkul, Sasivimol; McCarthy, Mark; Little, Peter; Sergot, Marek
2003-09-01
GANESH is a software package designed to support the genetic analysis of regions of human and other genomes. It provides a set of components that may be assembled to construct a self-updating database of DNA sequence, mapping data, and annotations of possible genome features. Once one or more remote sources of data for the target region have been identified, all sequences for that region are downloaded, assimilated, and subjected to a (configurable) set of standard database-searching and genome-analysis packages. The results are stored in compressed form in a relational database, and are updated automatically on a regular schedule so that they are always immediately available in their most up-to-date versions. A Java front-end, executed as a stand-alone application or web applet, provides a graphical interface for navigating the database and for viewing the annotations. There are facilities for importing and exporting data in the format of the Distributed Annotation System (DAS), enabling a GANESH database to be used as a component of a DAS configuration. The system has been used to construct databases for about a dozen regions of human chromosomes and for three regions of mouse chromosomes.
ERIC Educational Resources Information Center
Gilbert, Leslie
Designed to disseminate information to the post-school sector of United Kingdom education, this directory provides information on 50 microcomputer software packages developed by the Microelectronics Education Program (MEP) and available through educational publishers. Subject areas represented include accountancy, biology, business education,…
ERIC Educational Resources Information Center
Rhein, Deborah; Alibrandi, Mary; Lyons, Mary; Sammons, Janice; Doyle, Luther
This bibliography, developed by Project RIMES (Reading Instructional Methods of Efficacy with Students) lists 80 software packages for teaching early reading and spelling to students at risk for reading and spelling failure. The software packages are presented alphabetically by title. Entries usually include a grade level indicator, a brief…
MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants.
Elshazly, Hatem; Souilmi, Yassine; Tonellato, Peter J; Wall, Dennis P; Abouelhoda, Mohamed
2017-01-20
Next-generation genome sequencing techniques have become affordable for massive sequencing efforts devoted to the clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools used, so as to reduce the overall computational processing time and, concomitantly, the processing cost. Furthermore, it is important to capitalize on recent developments in the cloud computing market, which has witnessed more providers competing in terms of products and prices. In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. It allows different scenarios of execution with different levels of sophistication, up to one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios that exploit the spot instance model of Amazon in combination with other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. MC-GenomeKey provides an efficient multi-cloud solution to detect and annotate mutations. The package can run on different commercial cloud platforms, which enables the user to seize the best offers. It also provides a reliable means to make use of the low-cost spot instance model of Amazon, offering an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web interface and is available free for academic use.
Altermann, Eric; Lu, Jingli; McCulloch, Alan
2017-01-01
Expert-curated annotation remains one of the critical steps in achieving a reliable, biologically relevant annotation. Here we announce the release of GAMOLA2, a user-friendly and comprehensive software package to process, annotate and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 represents a wrapping tool that combines gene model determination and functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions including detection of tRNAs, rRNA genes, non-coding RNAs, signal protein cleavage sites, transmembrane helices, CRISPR repeats and vector sequence contaminations. GAMOLA2 has already been validated in a wide range of bacterial and archaeal genomes, and its modular concept allows easy addition of further functionality in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotations, GAMOLA2 features, among others, supplemental modules that assist in the creation of custom Blast databases, annotation transfers between genome versions, and the preparation of GenBank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run under a Linux environment, whereas the subsequent visualization and manual curation in Artemis is mobile and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added upon user request, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use.
ERIC Educational Resources Information Center
National Council of Teachers of English, Urbana, IL.
This guide contains 550 annotations for English anthologies, textbooks, workbooks, multimedia packages, and other materials for grades 7-12. Works of literature, audiovisual materials, and professional publications are included only when integrally related to specific, listed instructional materials. Entries are grouped into the following subject…
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.
Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A
2017-05-15
Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open-source, open-development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as the cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html.
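A condensed sketch of the top-level GUIDEseqAnalysis() call is given below; the input file names are placeholders, and the more than 60 tunable parameters are left at their defaults.

```r
# Identify off-target cleavage sites from GUIDE-seq data.
library(GUIDEseq)
library(BSgenome.Hsapiens.UCSC.hg19)  # provides the Hsapiens genome object

results <- GUIDEseqAnalysis(
    alignment.inputfile = c("plus.bam", "minus.bam"),  # placeholder aligned reads
    umi.inputfile       = c("plus.umi", "minus.umi"),  # placeholder UMI files
    gRNA.file           = "gRNA.fa",                   # placeholder guide sequence
    BSgenomeName        = Hsapiens)                    # genome for off-target search
```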
The Systems Biology Markup Language (SBML) Level 3 Package: Flux Balance Constraints.
Olivier, Brett G; Bergmann, Frank T
2015-09-04
Constraint-based modelling is a well-established methodology used to analyse and study biological networks on both a medium and genome scale. Due to their large size, genome-scale models are typically analysed using constraint-based optimization techniques. One widely used method is Flux Balance Analysis (FBA), which requires a modelling description to include the definition of a stoichiometric matrix, an objective function, and bounds on the values that fluxes can take at steady state. The Flux Balance Constraints (FBC) package extends SBML Level 3 and provides a standardized format for the encoding, exchange and annotation of constraint-based models. It includes support for modelling concepts such as objective functions, flux bounds and model component annotation that facilitates reaction balancing. The FBC package establishes a base level for the unambiguous exchange of genome-scale, constraint-based models that can be built upon by the community to meet future needs (e.g. by extending it to cover dynamic FBC models).
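For orientation, the optimization problem that an FBC-encoded model is handed to is the standard FBA linear program, stated here in its textbook form rather than quoted from the paper:

```latex
\max_{v}\; c^{\top} v
\quad \text{subject to} \quad
S\,v = 0, \qquad l_i \le v_i \le u_i \;\; \text{for all } i,
```

where \(S\) is the stoichiometric matrix, \(v\) the vector of steady-state fluxes, \(c\) the objective coefficients, and \(l, u\) the lower and upper flux bounds; the FBC package standardizes how \(c\), \(l\) and \(u\) are attached to an SBML model.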
DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis.
Yu, Guangchuang; Wang, Li-Gen; Yan, Guang-Rong; He, Qing-Yu
2015-02-15
Disease Ontology (DO) annotates human genes in the context of disease. DO provides important annotation for translating molecular findings from high-throughput data to clinical relevance. DOSE is an R package providing semantic similarity computations among DO terms and genes, allowing biologists to explore the similarities of diseases and of gene functions from a disease perspective. Enrichment analyses, including a hypergeometric model and gene set enrichment analysis, are also implemented to support discovering disease associations in high-throughput biological data. This allows biologists to verify disease relevance in a biological experiment and identify unexpected disease associations. Comparison among gene clusters is also supported. DOSE is released under the Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/DOSE.html). Supplementary data are available at Bioinformatics online. Contact: gcyu@connect.hku.hk or tqyhe@jnu.edu.cn.
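A minimal sketch of the two analysis modes described above; the DO term IDs and Entrez gene IDs below are chosen purely for illustration.

```r
## Hedged sketch of DOSE usage; gene IDs are illustrative placeholders.
library(DOSE)

## semantic similarity between two Disease Ontology terms
doSim("DOID:4", "DOID:162", measure = "Wang")

## hypergeometric enrichment test for a gene list (Entrez IDs)
genes <- c("4312", "8318", "10874", "55143", "55388")
edo <- enrichDO(gene = genes, ont = "DO", pvalueCutoff = 0.05)
head(as.data.frame(edo))   # enriched DO terms with p-values
```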
HiTC: exploration of high-throughput ‘C’ experiments
Servant, Nicolas; Lajoie, Bryan R.; Nora, Elphège P.; Giorgetti, Luca; Chen, Chong-Jian; Heard, Edith; Dekker, Job; Barillot, Emmanuel
2012-01-01
Summary: The R/Bioconductor package HiTC facilitates the exploration of high-throughput 3C-based data. It allows users to import and export ‘C’ data, and to transform, normalize, annotate and visualize interaction maps. The package operates within the Bioconductor framework and thus offers new opportunities for future development in this field. Availability and implementation: The R package HiTC is available from the Bioconductor website. A detailed vignette provides additional documentation and help for using the package. Contact: nicolas.servant@curie.fr. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22923296
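A minimal sketch of the import/normalize/visualize cycle the summary describes; the file names and the chromosome key are hypothetical, and the exact signatures should be checked against the HiTC vignette.

```r
## Hedged sketch; inputs are hypothetical placeholders.
library(HiTC)

hic <- importC(con = "interactions.count",   # sparse interaction counts
               xgi.bed = "xgi.bed",          # genomic intervals (rows)
               ygi.bed = "ygi.bed")          # genomic intervals (columns)

intra <- hic[["chr1chr1"]]                   # one intra-chromosomal map
mapC(normPerExpected(intra))                 # observed/expected map, plotted
```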
CAMERA: An integrated strategy for compound spectra extraction and annotation of LC/MS data sets
Kuhl, Carsten; Tautenhahn, Ralf; Böttcher, Christoph; Larson, Tony R.; Neumann, Steffen
2013-01-01
Liquid chromatography coupled to mass spectrometry is routinely used for metabolomics experiments. In contrast to the fairly routine and automated data acquisition steps, subsequent compound annotation and identification require extensive manual analysis and thus form a major bottleneck in data interpretation. Here we present CAMERA, a Bioconductor package integrating algorithms to extract compound spectra, annotate isotope and adduct peaks, and propose the accurate compound mass even in highly complex data. To evaluate the algorithms, we compared the annotation of CAMERA against a manually defined annotation for a mixture of known compounds spiked into a complex matrix at different concentrations. CAMERA successfully extracted accurate masses for 89.7% and 90.3% of the annotatable compounds in positive and negative ion mode, respectively. Furthermore, we present a novel annotation approach that combines spectral information from data acquired in opposite ion modes to further improve the annotation rate. We demonstrate the utility of CAMERA in two different, easily adoptable plant metabolomics experiments, where the application of CAMERA drastically reduced the amount of manual analysis. PMID:22111785
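A minimal sketch of the typical CAMERA workflow downstream of xcms peak picking; the vector of input file paths is a hypothetical placeholder.

```r
## Annotate an xcms peak table with CAMERA; 'files' is a hypothetical
## vector of raw data file paths.
library(xcms)
library(CAMERA)

xset <- group(xcmsSet(files))   # peak picking and grouping across samples
an   <- xsAnnotate(xset)        # create the annotation object
an   <- groupFWHM(an)           # group co-eluting peaks into pseudospectra
an   <- findIsotopes(an)        # annotate isotope peaks
an   <- groupCorr(an)           # refine groups by peak-shape correlation
an   <- findAdducts(an, polarity = "positive")  # annotate adduct peaks
peaks <- getPeaklist(an)        # annotated peak list with proposed masses
```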
DynGO: a tool for visualizing and mining of Gene Ontology and its associations
Liu, Hongfang; Hu, Zhang-Zhi; Wu, Cathy H
2005-01-01
Background A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. Results We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options, such as sorting tree nodes or changing the orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. Conclusion We have presented a standalone package, DynGO, that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete documentation and software are freely available for download from the website. PMID:16091147
DOSim: an R package for similarity between diseases based on Disease Ontology.
Li, Jiang; Gong, Binsheng; Chen, Xi; Liu, Tao; Wu, Chao; Zhang, Fan; Li, Chunquan; Li, Xiang; Rao, Shaoqi; Li, Xia
2011-06-29
The construction of the Disease Ontology (DO) has helped promote the investigation of diseases and disease risk factors. DO enables researchers to analyse disease similarity by adopting semantic similarity measures, expanding our understanding of the relationships between different diseases and improving their classification. Simultaneously, similarities between genes can also be analysed by their associations with similar diseases. As a result, disease heterogeneity is better understood and insights into the molecular pathogenesis of similar diseases have been gained. However, bioinformatics tools that provide easy and straightforward ways to use DO to study disease and gene similarity simultaneously are required. We have developed an R-based software package (DOSim) to compute the similarity between diseases and to measure the similarity between human genes in terms of diseases. DOSim incorporates a DO-based enrichment analysis function that can be used to explore the disease features of an independent gene set. A multilayered enrichment analysis function (GO and KEGG annotation) that helps users explore the biological meaning implied in a newly detected gene module is also part of the DOSim package. We used the disease similarity application to demonstrate the relationships between 128 different DO cancer terms. The hierarchical clustering of these 128 different cancers showed modular characteristics. In another case study, we used the gene similarity application on 361 obesity-related genes. The results revealed the complex pathogenesis of obesity. In addition, the gene module detection and gene module multilayered annotation functions in DOSim, when applied to these 361 obesity-related genes, helped extend our understanding of the complex pathogenesis of obesity risk phenotypes and the heterogeneity of obesity-related diseases. DOSim can be used to detect disease-driven gene modules and to annotate the modules for functions and pathways. The DOSim package can also be used to visualise the DO structure. DOSim reflects the modular characteristics of disease-related genes and promotes our understanding of the complex pathogenesis of diseases. DOSim is available on the Comprehensive R Archive Network (CRAN) or http://bioinfo.hrbmu.edu.cn/dosim.
ERIC Educational Resources Information Center
National Council of Teachers of English, Urbana, IL.
This supplement to the "NCTE Guide to Teaching Materials for English, Grades 7-12" contains annotations for English anthologies, textbooks, workbooks, multimedia packages, and other materials for the junior high and high school levels. Works of literature, audiovisual materials, and professional publications are included when related to specific,…
Discovery and Characterization of Chromatin States for Systematic Annotation of the Human Genome
NASA Astrophysics Data System (ADS)
Ernst, Jason; Kellis, Manolis
A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal chromatin states in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.
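As a sketch of the modelling idea (the emission model standard in this line of work, stated from general knowledge rather than quoted from the text): each hidden chromatin state k emits a binary vector x of mark presence calls, with marks treated as conditionally independent given the state,

```latex
P(x \mid z = k) \;=\; \prod_{m=1}^{M} p_{km}^{\,x_m}\,(1 - p_{km})^{\,1 - x_m},
```

so that learning the state-specific Bernoulli parameters \(p_{km}\) recovers the recurrent, spatially coherent mark combinations that define each state.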
Edmands, William M B; Petrick, Lauren; Barupal, Dinesh K; Scalbert, Augustin; Wilson, Mark J; Wickliffe, Jeffrey K; Rappaport, Stephen M
2017-04-04
A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS) is the efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS2Miner (Comprehensive MS2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using peak-picker output and MS2 data files as inputs. The number of MS2 spectra that can be collected during a metabolomic profiling experiment far exceeds what painstaking manual interpretation can cover; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. CompMS2Miner integrates many useful tools in a single workflow for metabolite annotation and also provides a means to overview the MS2 data with a Web application GUI, compMS2Explorer (Comprehensive MS2 Explorer), that also facilitates data sharing and transparency. The automatable compMS2Miner workflow consists of the following steps: (i) matching unknown MS1 features to precursor MS2 scans, (ii) filtration of spectral noise (dynamic noise filter), (iii) generation of composite mass spectra by summation of multiple similar spectrum signals and removal of redundant/contaminant spectra, (iv) interpretation of possible fragment ion substructures using an internal database, (v) annotation of unknowns with chemical and spectral databases, with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest-neighbor chemical similarity scoring, random forest based retention time prediction, text-mining based false positive removal/true positive ranking, chemical taxonomic prediction and differential evolution based global annotation score optimization, and (vi) network graph visualizations, data curation, and sharing via the compMS2Explorer application. Metabolite identities and comments can also be recorded using an interactive table within compMS2Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet-induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knock down the gut microbiota. The results of fully autonomous and objective usage of compMS2Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publicly available compMS2Explorer applications for both positive and negative modes ( https://wmbedmands.shinyapps.io/compMS2_mouseSera_POS and https://wmbedmands.shinyapps.io/compMS2_mouseSera_NEG ). The workflow provided rapid annotation of a diversity of endogenous and gut-microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the MassBank of North America (MoNA) spectral repository. These experimental and virtual (lipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholine spectra) and were then used to calculate the ranking capability of 7 individual scoring metrics. It was found that an average of the 7 individual scoring metrics provided the most effective ranking, with a weighted average rank of 3 for the MoNA-matched spectra, in spite of the potential risk of false positive annotations emerging from automation. Minor structural differences, such as relative carbon-carbon double bond positions, were found in several cases to affect the correct rank of the MoNA-annotated metabolite. The latest release and an example workflow are available in the package vignette ( https://github.com/WMBEdmands/compMS2Miner ) and a version of the published application is available on the shinyapps.io site ( https://wmbedmands.shinyapps.io/compMS2Example ).
Quality of Computationally Inferred Gene Ontology Annotations
Škunca, Nives; Altenhoff, Adrian; Dessimoz, Christophe
2012-01-01
Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon—an important outcome given that >98% of all annotations are inferred without direct curation. PMID:22693439
heatmaply: an R package for creating interactive cluster heatmaps for online publishing.
Galili, Tal; O'Callaghan, Alan; Sidi, Jonathan; Sievert, Carson
2018-05-01
heatmaply is an R package for easily creating interactive cluster heatmaps that can be shared online as a stand-alone HTML file. Interactivity includes a tooltip display of values when hovering over cells, as well as the ability to zoom in to specific sections of the figure from the data matrix, the side dendrograms, or the annotated labels. Thanks to the synergistic relationship between heatmaply and other R packages, the user gains refined control over the statistical and visual aspects of the heatmap layout. The heatmaply package is available under the GPL-2 Open Source license. It comes with a detailed vignette and is freely available from http://cran.r-project.org/package=heatmaply. Contact: tal.galili@math.tau.ac.il. Supplementary data are available at Bioinformatics online.
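A minimal usage sketch, using the built-in mtcars data as a stand-in matrix; saving through htmlwidgets is one common way to produce the stand-alone HTML file mentioned above.

```r
## Interactive cluster heatmap saved as a stand-alone HTML file.
library(heatmaply)
library(htmlwidgets)

hm <- heatmaply(scale(mtcars),   # scale columns before clustering
                k_row = 3,       # color row dendrogram into 3 clusters
                k_col = 2)       # color column dendrogram into 2 clusters
saveWidget(hm, "heatmap.html")   # shareable, self-contained HTML output
```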
Haegerich, Tamara M.; David-Ferdon, Corinne; Noonan, Rita K.; Manns, Brian J.; Billie, Holly C.
2016-01-01
Injury and violence prevention strategies have greater potential for impact when they are based on scientific evidence. Systematic reviews of the scientific evidence can contribute key information about which policies and programs might have the greatest impact when implemented. However, systematic reviews have limitations, such as lack of implementation guidance and contextual information, that can limit the application of knowledge. “Technical packages,” developed by knowledge brokers such as the federal government, nonprofit agencies, and academic institutions, have the potential to be an efficient mechanism for making information from systematic reviews actionable. Technical packages provide information about specific evidence-based prevention strategies, along with the estimated costs and impacts, and include accompanying implementation and evaluation guidance to facilitate adoption, implementation, and performance measurement. We describe how systematic reviews can inform the development of technical packages for practitioners, provide examples of technical packages in injury and violence prevention, and explain how enhancing review methods and reporting could facilitate the use and applicability of scientific evidence. PMID:27604301
Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator.
Tchechmedjiev, Andon; Abdaoui, Amine; Emonet, Vincent; Melzi, Soumia; Jonnagaddala, Jitendra; Jonquet, Clement
2018-06-01
Secondary use of clinical data commonly involves annotating biomedical text with terminologies and ontologies. The National Center for Biomedical Ontology (NCBO) Annotator is a frequently used annotation service, originally designed for biomedical data but not well suited to clinical text annotation. In order to add new functionality to the NCBO Annotator without hosting or modifying the original Web service, we have designed a proxy architecture that enables seamless extensions by pre-processing the input text and parameters and post-processing the annotations. We have then implemented enhanced functionality for annotating and indexing free text, such as scoring, detection of context (negation, experiencer, temporality), new output formats and coarse-grained concept recognition (with UMLS Semantic Groups). In this paper, we present the NCBO Annotator+, a Web service that incorporates these new features, as well as a small set of evaluation results for concept recognition and clinical context detection on two standard evaluation tasks (CLEF eHealth 2017, SemEval 2014). The Annotator+ has been successfully integrated into the SIFR BioPortal platform (an implementation of NCBO BioPortal for French biomedical terminologies and ontologies) to annotate English text. A Web user interface is available for testing and ontology selection (http://bioportal.lirmm.fr/ncbo_annotatorplus); however, the Annotator+ is meant to be used through the Web service application programming interface (http://services.bioportal.lirmm.fr/ncbo_annotatorplus). The code is openly available, and we also provide a Docker packaging to enable easy local deployment for processing sensitive (e.g. clinical) data in-house (https://github.com/sifrproject). Contact: andon.tchechmedjiev@lirmm.fr. Supplementary data are available at Bioinformatics online.
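A hedged sketch of calling the REST endpoint from R; the query parameter names below (apikey, negation, score) are assumptions inferred from the feature list above, not a verified API reference, and should be checked against the service documentation.

```r
## Query the Annotator+ Web service; parameter names other than 'text'
## are assumptions, and the API key value is a hypothetical placeholder.
library(httr)

resp <- GET("http://services.bioportal.lirmm.fr/ncbo_annotatorplus",
            query = list(text     = "The patient denies any chest pain.",
                         apikey   = "YOUR-API-KEY",  # placeholder
                         negation = "true",          # clinical context detection
                         score    = "cvalue"))       # annotation scoring
str(content(resp))                                   # parsed annotations
```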
AIRS Maps from Space Processing Software
NASA Technical Reports Server (NTRS)
Thompson, Charles K.; Licata, Stephen J.
2012-01-01
This software package processes Atmospheric Infrared Sounder (AIRS) Level 2 swath standard product geophysical parameters, and generates global, colorized, annotated maps. It automatically generates daily and multi-day averaged colorized and annotated maps of various AIRS Level 2 swath geophysical parameters. It also generates AIRS input data sets for Eyes on Earth, Puffer-sphere, and Magic Planet. This program is tailored to AIRS Level 2 data products. It re-projects data into 1/4-degree grids that can be combined and averaged for any number of days. The software scales and colorizes global grids utilizing AIRS-specific color tables, and annotates images with title and color bar. This software can be tailored for use with other swath data products for the purposes of visualization.
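The gridding step lends itself to a compact illustration. The sketch below is not the AIRS software itself, just the quarter-degree bin-and-average idea described above, with hypothetical lon/lat/value vectors.

```r
## Illustrative only: project swath samples onto a 0.25-degree global grid.
grid_quarter_degree <- function(lon, lat, value) {
  ix <- floor((lon + 180) / 0.25) + 1   # 1440 longitude bins
  iy <- floor((lat +  90) / 0.25) + 1   # 720 latitude bins
  tapply(value, list(lat_bin = iy, lon_bin = ix), mean, na.rm = TRUE)
}

## Daily grids built this way can be averaged cell-wise over any number
## of days before colorizing and annotating the final map.
```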
Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome
Stolc, Viktor; Deng, Wei; He, Hang; Korbel, Jan; Chen, Xuewei; Tongprasit, Waraporn; Ronald, Pamela; Chen, Runsheng; Gerstein, Mark; Wang Deng, Xing
2007-01-01
Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the putative novel transcripts on the genome scale are still lacking. We report here the identification of 25,352 and 27,744 transcriptionally active regions (TARs) not encoded by annotated exons in the rice (Oryza sativa) subspecies japonica and indica, respectively. The non-exonic TARs account for approximately two-thirds of the total TARs detected by tiling arrays and represent transcripts likely conserved between japonica and indica. Transcription of 21,018 (83%) japonica non-exonic TARs was verified through expression profiling in 10 tissue types using a re-array in which annotated genes and TARs were each represented by five independent probes. Subsequent analyses indicate that about 80% of the japonica TARs that were not assigned to annotated exons can be assigned to various putatively functional or structural elements of the rice genome, including splice variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. These results provide a systematic characterization of non-exonic transcripts in rice and thus expand the current view of the complexity and dynamics of the rice transcriptome. PMID:17372628
IMG ER: a system for microbial genome annotation expert review and curation.
Markowitz, Victor M; Mavromatis, Konstantinos; Ivanova, Natalia N; Chen, I-Min A; Chu, Ken; Kyrpides, Nikos C
2009-09-01
A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.
In May 2012, a HESI-sponsored expert workshop yielded a proposed research strategy for systematically discovering, characterizing, and annotating fish early life-stage (FELS) adverse outcome pathways (AOPs) as well as prioritizing AOP development in light of current restrictions ...
Effective Writing Tasks and Feedback for the Internet Generation
ERIC Educational Resources Information Center
Buyse, Kris
2012-01-01
Teaching foreign language writing often lacks adjustments to the requirements of today's students of the "Internet Generation" (iGen): traditionally, teachers set a (not very inspiring) topic and a deadline, and then return a discouraging, manually underlined and/or annotated text without systematic labeling. The annotated document is then…
MIRA: An R package for DNA methylation-based inference of regulatory activity.
Lawson, John T; Tomazou, Eleni M; Bock, Christoph; Sheffield, Nathan C
2018-03-01
DNA methylation contains information about the regulatory state of the cell. MIRA aggregates genome-scale DNA methylation data into a DNA methylation profile for independent region sets with shared biological annotation. Using this profile, MIRA infers and scores the collective regulatory activity for each region set. MIRA facilitates regulatory analysis in situations where classical regulatory assays would be difficult and allows public sources of open chromatin and protein binding regions to be leveraged for novel insight into the regulatory state of DNA methylation datasets. R package available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/MIRA.html. nsheffield@virginia.edu.
Automated Gene Ontology annotation for anonymous sequence data.
Hennig, Steffen; Groth, Detlef; Lehrach, Hans
2003-07-01
Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation with GO terms is performed in most current genome projects, which, besides its generality, has the advantage of being very convenient for computer-based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet) that performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species-independent GO structure and vocabulary, together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations, using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.
The KIT Motion-Language Dataset.
Plappert, Matthias; Mandery, Christian; Asfour, Tamim
2016-12-01
Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, despite years of research in this area, no standardized and openly available data set exists to support the development and evaluation of such systems. We therefore propose the Karlsruhe Institute of Technology (KIT) Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our data set using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our data set or that have erroneous annotations. We show that our method mitigates these two problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting data set, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our data set an excellent choice for enabling more transparent and comparable research in this important area.
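For context, perplexity here presumably refers to the standard language-model quantity (stated from general knowledge; the paper's exact scoring may differ). For an annotation of N words under a language model p,

```latex
\mathrm{PP}(w_1,\dots,w_N) \;=\; \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \ln p\,(w_i \mid w_{1:i-1})\right),
```

so annotations the model finds surprising (high perplexity) flag motions whose labels are unusual or erroneous and therefore worth routing back to annotators.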
Analysis, annotation, and profiling of the oat seed transcriptome
USDA-ARS?s Scientific Manuscript database
Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...
ERIC Educational Resources Information Center
Aitken, James
This curriculum guide is intended to assist vocational instructors in preparing students for entry-level employment in the field of horticulture and getting them ready for advanced training in the workplace. The package contains a validated competency/skill and task list, an instructor's guide, and an annotated bibliography. The following…
Defining and Detecting Complex Peak Relationships in Mass Spectral Data: The Mz.unity Algorithm.
Mahieu, Nathaniel G; Spalding, Jonathan L; Gelman, Susan J; Patti, Gary J
2016-09-20
Analysis of a single analyte by mass spectrometry can result in the detection of more than 100 degenerate peaks. These degenerate peaks complicate spectral interpretation and are challenging to annotate. In mass spectrometry-based metabolomics, this degeneracy leads to inflated false discovery rates, data sets containing an order of magnitude more features than analytes, and an inefficient use of resources during data analysis. Although software has been introduced to annotate spectral degeneracy, current approaches are unable to represent several important classes of peak relationships. These include heterodimers and higher complex adducts, distal fragments, relationships between peaks in different polarities, and complex adducts between features and background peaks. Here we outline sources of peak degeneracy in mass spectra that are not annotated by current approaches and introduce a software package called mz.unity to detect these relationships in accurate mass data. Using mz.unity, we find that data sets contain many more complex relationships than we anticipated. Examples include the adduct of glutamate and nicotinamide adenine dinucleotide (NAD), fragments of NAD detected in the same or opposite polarities, and the adduct of glutamate and a background peak. Further, the complex relationships we identify show that several assumptions commonly made when interpreting mass spectral degeneracy do not hold in general. These contributions provide new tools and insight to aid in the annotation of complex spectral relationships and provide a foundation for improved data set identification. Mz.unity is an R package and is freely available at https://github.com/nathaniel-mahieu/mz.unity as well as our laboratory Web site http://pattilab.wustl.edu/software/ .
Technique and cue selection for graphical presentation of generic hyperdimensional data
NASA Astrophysics Data System (ADS)
Howard, Lee M.; Burton, Robert P.
2013-12-01
Several presentation techniques have been created for the visualization of data with more than three variables, and packages have been written that each implement a subset of these techniques. However, these packages generally fail to provide all the features needed by the user during the visualization process, and each typically supports only a few presentation techniques. A new package called Petrichor accommodates all necessary and useful features together in one system. Any presentation technique may be added easily through an extensible plugin system. Features are supported by a user interface that allows easy interaction with data. Annotations allow users to mark up visualizations and share information with others. By providing a hyperdimensional graphics package that easily accommodates presentation techniques and includes a complete set of features, including those that are rarely or never supported elsewhere, Petrichor gives the user a tool that facilitates improved interaction with multivariate data to extract and disseminate information.
ERIC Educational Resources Information Center
Baldwin, Harold; Whitney, Gregory
This curriculum guide is intended to assist vocational instructors in preparing students for entry-level employment as welders and preparing them for advanced training in the workplace. The package contains an overview of new and emerging welding technologies, a competency/skill and task list, an instructor's guide, and an annotated bibliography.…
Transcriptome analysis of PCOS arrested 2-cell embryos.
Lu, Cuiling; Chi, Hongbin; Wang, Yapeng; Feng, Xue; Wang, Lina; Huang, Shuo; Yan, Liying; Lin, Shengli; Liu, Ping; Qiao, Jie
2018-06-18
In an attempt to explore early developmental arrest in embryos from polycystic ovarian syndrome (PCOS) patients, we sequenced the transcriptome profiles of PCOS arrested 2-cell embryos, non-PCOS arrested 2-cell embryos and non-arrested 2-cell embryos using a single-cell RNA-seq technique. Differential expression analysis was performed using the DEGseq R package. Gene Ontology (GO) enrichment was analyzed using the goseq R package. The data revealed 62 differentially expressed genes between non-PCOS arrested and PCOS arrested embryos, and 2217 differentially expressed genes between PCOS arrested and non-arrested 2-cell embryos. A total of 49 differentially expressed genes (DEGs) among the up-regulated genes between PCOS arrested and non-PCOS arrested embryos were annotated with GO terms after GO enrichment. A total of 29 DEGs among the down-regulated genes between PCOS arrested and non-arrested 2-cell embryos were annotated with GO terms after GO enrichment. These data can provide a reference for screening specific genes involved in the arrest of PCOS embryos.
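A hedged sketch of the two analysis steps named above; the count matrices, gene ID vectors, and the genome/ID choices passed to goseq are hypothetical placeholders and should be replaced with the build and identifier system actually used.

```r
## Differential expression with DEGseq, then GO enrichment with goseq.
library(DEGseq)
library(goseq)

## pcos_counts / ctrl_counts: hypothetical matrices with gene IDs in
## column 1 and expression values in columns 2-4.
DEGexp(geneExpMatrix1 = pcos_counts, geneCol1 = 1, expCol1 = 2:4,
       geneExpMatrix2 = ctrl_counts, geneCol2 = 1, expCol2 = 2:4,
       method = "MARS", outputDir = "degseq_out")

## goseq corrects for transcript length bias before testing GO terms.
gene_vec <- as.integer(all_genes %in% de_genes)  # 1 = DE, 0 = not
names(gene_vec) <- all_genes
pwf <- nullp(gene_vec, "hg19", "ensGene")        # probability weighting
head(goseq(pwf, "hg19", "ensGene"))              # enriched GO categories
```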
GAMES identifies and annotates mutations in next-generation sequencing projects.
Sana, Maria Elena; Iascone, Maria; Marchetti, Daniela; Palatini, Jeff; Galasso, Marco; Volinia, Stefano
2011-01-01
Next-generation sequencing (NGS) methods have the potential to change the landscape of biomedical science, but at the same time they pose several problems in analysis and interpretation. Currently, there are many commercial and public software packages that analyze NGS data. However, these applications are limited by output that is insufficiently annotated and difficult for end users to interpret functionally. We developed GAMES (Genomic Analysis of Mutations Extracted by Sequencing), a pipeline that aims to serve as an efficient middleman between the data deluge and investigators. GAMES performs multiple levels of filtering and annotation, such as aligning the reads to a reference genome, performing quality control and mutational analysis, integrating results with genome annotations and sorting each mismatch/deletion according to a range of parameters. Variations are matched to known polymorphisms. The prediction of functional mutations is achieved using different approaches. Overall, GAMES enables an effective complexity reduction in large-scale DNA-sequencing projects. GAMES is available free of charge to academic users and may be obtained from http://aqua.unife.it/GAMES.
Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L
2018-01-01
Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
Microarray data mining using Bioconductor packages.
Nie, Haisheng; Neerincx, Pieter B T; van der Poel, Jan; Ferrari, Francesco; Bicciato, Silvio; Leunissen, Jack A M; Groenen, Martien A M
2009-07-16
This paper describes the results of a Gene Ontology (GO) term enrichment analysis of chicken microarray data using Bioconductor packages. By examining the enriched GO terms in three contrasts (MM8-PM8, MM8-MA8, and MM8-MM24) of the microarray data provided during this workshop, the analysis aimed to investigate the host reactions occurring in chickens shortly after a secondary challenge with either a homologous or heterologous species of Eimeria. The results of GO enrichment analysis using GO terms annotated to chicken genes were also compared with those using GO terms annotated to chicken-human orthologous genes. Furthermore, a locally adaptive statistical procedure (LAP) was performed to test differentially expressed chromosomal regions, rather than individual genes, in the chicken genome after Eimeria challenge. GO enrichment analysis identified significant (raw p-value < 0.05) GO terms for all three contrasts included in the analysis. Some of the enriched GO terms were linked to primary or secondary immune responses, indicating that GO enrichment analysis is a useful approach for analyzing microarray data. Comparing the GO enrichment results based on chicken gene information with those based on chicken-human orthologous gene information showed more refined GO terms related to immune responses for the latter, suggesting that using chicken-human orthologous gene information has higher power to detect significant GO terms with more refined functionality. Furthermore, three chromosomal regions were identified as significantly up-regulated in the contrast MM8-PM8 (q-value < 0.01). Overall, this paper describes a practical approach to analyzing microarray data in farm animals whose genome information is still incomplete. For farm animals, such as chicken, with currently limited gene annotation, borrowing annotation information from orthologous genes in well-annotated species, such as human, will substantially improve pathway analysis results. Furthermore, the LAP approach is relatively new and very useful for microarray analysis.
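A hedged sketch of a GO term enrichment test of the kind described, using the chicken annotation package; the choice of GOstats and the placeholder gene ID vectors are assumptions, since the workshop analysis may have used other Bioconductor packages.

```r
## Hypergeometric GO enrichment for a chicken DE gene list.
library(GOstats)
library(org.Gg.eg.db)   # chicken genome-wide annotation

params <- new("GOHyperGParams",
              geneIds         = de_entrez_ids,   # DE genes (placeholder vector)
              universeGeneIds = all_entrez_ids,  # all array genes (placeholder)
              annotation      = "org.Gg.eg.db",
              ontology        = "BP",            # biological process terms
              pvalueCutoff    = 0.05,
              testDirection   = "over")          # test for over-representation
summary(hyperGTest(params))
```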
Pérez-Pérez, Martín; Glez-Peña, Daniel; Fdez-Riverola, Florentino; Lourenço, Anália
2015-02-01
Document annotation is a key task in the development of Text Mining methods and applications. High-quality annotated corpora are invaluable, but their preparation requires a considerable amount of resources and time. Although existing annotation tools offer good user interaction interfaces to domain experts, their project management and quality control abilities are still limited. Therefore, the current work introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects and to evaluate annotation quality throughout the project life cycle. At the core, Marky is a Web application based on the open source CakePHP framework. The user interface relies on HTML5 and CSS3 technologies. The Rangy library assists in browser-independent implementation of common DOM range and selection tasks, and Ajax and jQuery technologies are used to enhance user-system interaction. Marky grants solid management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work over documents as usual, but all the annotations made are saved by the tracking system and may be further compared. The project administrator is thus able to evaluate annotation consistency among annotators and across rounds of annotation, while annotators are able to reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption. Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similarly visually intuitive annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky.
Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren
2016-11-01
Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available.
The Resource Directory: Designing Your Success.
ERIC Educational Resources Information Center
Bowers, Richard A.
1995-01-01
Discusses computer software and system design in the information industry and provides an annotated bibliography of 31 resources that address the issue of design. Highlights include competition, color use, hardware and presentation design, content and packaging, screen design, graphics, and interactive multimedia. A sidebar reviews and rates seven…
BEACON: automated tool for Bacterial GEnome Annotation ComparisON.
Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B
2015-08-18
Genome annotation is one way of summarizing the existing knowledge about the genomic characteristics of an organism. There has been increased interest over the last several decades in computer-based structural and functional genome annotation, and many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on the comparison of functional annotations of prokaryotic genomes. To the best of our knowledge, there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and the development of new ones introduce the need to: (a) compare different annotations for a single genome, and (b) generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. To illustrate BEACON's utility, we provide a comparative analysis of multiple annotations generated for four genomes and show with these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/.
TriAnnot: A Versatile and High Performance Pipeline for the Automated Annotation of Plant Genomes
Leroy, Philippe; Guilhot, Nicolas; Sakai, Hiroaki; Bernard, Aurélien; Choulet, Frédéric; Theil, Sébastien; Reboux, Sébastien; Amano, Naoki; Flutre, Timothée; Pelegrin, Céline; Ohyanagi, Hajime; Seidel, Michael; Giacomoni, Franck; Reichstadt, Mathieu; Alaux, Michael; Gicquello, Emmanuelle; Legeai, Fabrice; Cerutti, Lorenzo; Numa, Hisataka; Tanaka, Tsuyoshi; Mayer, Klaus; Itoh, Takeshi; Quesneville, Hadi; Feuillet, Catherine
2012-01-01
In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated annotation tool, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, among which 54% were identified in perfect accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene predictions to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not optimized for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future. PMID:22645565
A Guide to Instructional Resources for Consumers' Education.
ERIC Educational Resources Information Center
Johnston, William L.; Greenspan, Nancy B.
This annotated bibliography lists 295 selected instructional references, resources, and teaching aids for consumer education. It includes a variety of both print and nonprint materials, such as films, filmstrips, multimedia kits, games and learning packages for classroom and group instruction, textbooks for all age levels, and references for both…
The CTD2 Center at the University of Texas MD Anderson Cancer Center performed functional annotation of mutations and fusions found in human cancers using two cell models, Ba/F3 (murine pro-B suspension cells) and MCF10A (human non-tumorigenic mammary epithelial cells).
GenomeRNAi: a database for cell-based RNAi phenotypes.
Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael
2007-01-01
RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasingly large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes, and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays, from visible phenotypes to focused transcriptional readouts, and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.
Current and future trends in marine image annotation software
NASA Astrophysics Data System (ADS)
Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.
2016-12-01
Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools has been developed worldwide. Image annotation, the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interactions and computer-assisted solutions. Marine image annotation software (MIAS) has enabled over 500 publications to date. We review functioning, application trends and developments by comparing general and advanced features of 23 different tools utilized in underwater image analysis. MIAS requiring human input are essentially graphical user interfaces with a video player or image browser that recognizes a specific time code or image code, allowing users to log events in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software in their capability to integrate data associated with video collection, the simplest being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, annotating after collection, and interacting with a database. They range from simple annotation interfaces to full onboard data management systems with a variety of toolboxes. Advanced packages allow users to input and display data from multiple sensors or multiple annotators via intranet or internet. Tools for posterior human-mediated annotation often include data display and image analysis, e.g. length, area, image segmentation and point counts, and in a few cases allow browsing and editing of previous dive logs or analysis of the annotations. Interaction with a database allows automatic integration of annotations from different surveys, repeated and collaborative annotation of shared datasets, and browsing and querying of data. Progress in automated annotation is mostly in post-processing, for stable platforms or still images; integration into available MIAS is currently limited to semi-automated pixel recognition through computer-vision modules that compile expert-based knowledge. Important topics aiding the choice of a specific software are outlined, the ideal software is discussed and future trends are presented.
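The core MIAS pattern described above, logging time-stamped and optionally geo-referenced events against a video time code and querying them later, can be sketched in a few lines. All names below are illustrative and do not correspond to any specific tool's API.

```python
# A minimal sketch of the core MIAS pattern: logging time-stamped,
# optionally geo-referenced annotation events against a video time code.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class AnnotationEvent:
    timecode: float                 # seconds into the video
    label: str                      # object or event observed
    annotator: str
    lat: Optional[float] = None     # platform position, if a nav feed exists
    lon: Optional[float] = None

@dataclass
class DiveLog:
    events: List[AnnotationEvent] = field(default_factory=list)

    def log(self, event: AnnotationEvent) -> None:
        self.events.append(event)

    def query(self, label: str) -> List[AnnotationEvent]:
        return [e for e in self.events if e.label == label]

log = DiveLog()
log.log(AnnotationEvent(132.5, "sea pen", "observer1", lat=38.5, lon=-28.6))
print(log.query("sea pen"))
```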
Ong, Edison; He, Yongqun
2016-01-01
Hundreds of biological and biomedical ontologies have been developed to support data standardization, integration and analysis. Although ontologies are typically developed for community usage, community efforts in ontology development are limited. To support ontology visualization, distribution, and community-based annotation and development, we have developed Ontokiwi, an ontology extension to the MediaWiki software. Ontokiwi displays hierarchical classes and ontological axioms. Ontology classes and axioms can be edited and added using the Ontokiwi form or the MediaWiki source editor. Ontokiwi also inherits MediaWiki features such as Wikitext editing and version control. Based on the Ontokiwi/MediaWiki software package, we have developed Ontobedia, which aims to support community-based development and annotation of biological and biomedical ontologies. As demonstrations, we have loaded the Ontology of Adverse Events (OAE) and the Cell Line Ontology (CLO) into Ontobedia. Our studies showed that Ontobedia achieved the expected Ontokiwi features. PMID:27570653
Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas
2013-09-01
GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in the Systems Biology Markup Language (SBML). Through a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and the Minimum Information Required in the Annotation of Models (MIRIAM) registry. Additionally, we provide an R package that processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. GRN2SBML therefore closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.
Han, Mira V; Thomas, Gregg W C; Lugo-Martinez, Jose; Hahn, Matthew W
2013-08-01
Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.
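The effect CAFE 3 corrects for can be illustrated with a toy simulation; this is not CAFE's likelihood model, just a demonstration that random annotation errors in per-genome gene counts create apparent family-size differences between two genomes even when no gain or loss has occurred, inflating naive rate estimates.

```python
# A toy simulation (not CAFE's model) of annotation error creating
# spurious gene family-size differences between two genomes.
import random

random.seed(1)
n_families = 10_000
true_counts = [random.randint(1, 5) for _ in range(n_families)]

def observe(counts, error_rate=0.05):
    # each family's count is mis-annotated up or down with some probability
    return [c + random.choice([-1, 1]) if random.random() < error_rate else c
            for c in counts]

obs_a, obs_b = observe(true_counts), observe(true_counts)
apparent_changes = sum(a != b for a, b in zip(obs_a, obs_b))
print(f"families that truly changed: 0; apparently changed: {apparent_changes}")
```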
SBML Level 3 package: Groups, Version 1 Release 1
Hucka, Michael; Smith, Lucian P.
2017-01-01
Summary Biological models often contain components that have relationships with each other, or that modelers want to treat as belonging to groups with common characteristics or shared metadata. The SBML Level 3 Version 1 Core specification does not provide an explicit mechanism for expressing such relationships, but it does provide a mechanism for SBML packages to extend the Core specification and add additional syntactical constructs. The SBML Groups package for SBML Level 3 adds the necessary features to SBML to allow grouping of model components to be expressed. Such groups do not affect the mathematical interpretation of a model, but they do provide a way to add information that can be useful for modelers and software tools. The SBML Groups package enables a modeler to include definitions of groups and nested groups, each of which may be annotated to convey why that group was created, and what it represents. PMID:28187406
ERIC Educational Resources Information Center
Burger, H. Robert
1983-01-01
Part 1 (SE 533 635) presented programs for use in mineralogy, petrology, and geochemistry. This part presents an annotated list of 64 additional programs, focusing on introductory geology, mapping, and statistical packages for geological analyses. A brief description, source, suggested use(s), programing language, and other information are…
Phenex: ontological annotation of phenotypic diversity.
Balhoff, James P; Dahdul, Wasila M; Kothari, Cartik R; Lapp, Hilmar; Lundberg, John G; Mabee, Paula; Midford, Peter E; Westerfield, Monte; Vision, Todd J
2010-05-05
Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.
pyGeno: A Python package for precision medicine and proteogenomics.
Daouda, Tariq; Perreault, Claude; Lemieux, Sébastien
2016-01-01
pyGeno is a Python package mainly intended for precision medicine applications that revolve around genomics and proteomics. It integrates reference sequences and annotations from Ensembl, genomic polymorphisms from the dbSNP database and data from next-gen sequencing into an easy to use, memory-efficient and fast framework, therefore allowing the user to easily explore subject-specific genomes and proteomes. Compared to a standalone program, pyGeno gives the user access to the complete expressivity of Python, a general programming language. Its range of application therefore encompasses both short scripts and large scale genome-wide studies.
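As a rough illustration of the subject-specific-genome idea, the sketch below applies a subject's polymorphisms to a reference sequence. This is illustrative only and is not pyGeno's documented API; pyGeno performs this kind of integration against Ensembl references and dbSNP at scale.

```python
# A minimal sketch of building a subject-specific sequence by applying a
# subject's polymorphisms to a reference. Not pyGeno's actual API.
def personalize(reference: str, snps: dict) -> str:
    """snps maps 0-based positions to alternate bases."""
    seq = list(reference)
    for pos, alt in snps.items():
        seq[pos] = alt
    return "".join(seq)

reference = "ATGGCGTACGATCGA"
subject_snps = {3: "A", 10: "T"}   # hypothetical dbSNP-derived variants
print(personalize(reference, subject_snps))  # ATGACGTACGTTCGA
```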
RNAbrowse: RNA-Seq de novo assembly results browser.
Mariette, Jérôme; Noirot, Céline; Nabihoudine, Ibounyamine; Bardou, Philippe; Hoede, Claire; Djari, Anis; Cabau, Cédric; Klopp, Christophe
2014-01-01
Transcriptome analysis based on a de novo assembly of next-generation RNA sequences is now performed routinely in many laboratories. The generated results, including contig sequences, quantification figures, functional annotations and variation discovery outputs, are usually bulky and quite diverse. This article presents a user-oriented storage and visualisation environment that permits exploring the data in a top-down manner, going from general graphical views to all possible details. The software package is based on BioMart and is easy to install and populate with local data. The software package is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/RNAbrowse.
van den Broek, Evert; van Lieshout, Stef; Rausch, Christian; Ylstra, Bauke; van de Wiel, Mark A; Meijer, Gerrit A; Fijneman, Remond J A; Abeln, Sanne
2016-01-01
Development of cancer is driven by somatic alterations, including numerical and structural chromosomal aberrations. Currently, several computational methods are available and widely applied to detect numerical copy number aberrations (CNAs) of chromosomal segments in tumor genomes. However, there is a lack of computational methods that systematically detect structural chromosomal aberrations by virtue of the genomic location of CNA-associated chromosomal breaks and identify genes that appear non-randomly affected by chromosomal breakpoints across (large) series of tumor samples. 'GeneBreak' was developed to systematically identify genes recurrently affected by the genomic location of chromosomal CNA-associated breaks using a genome-wide approach, which can be applied to DNA copy number data obtained by array-Comparative Genomic Hybridization (CGH) or by (low-pass) whole-genome sequencing (WGS). First, 'GeneBreak' collects the genomic locations of chromosomal CNA-associated breaks that were previously pinpointed by the segmentation algorithm applied to obtain CNA profiles. Next, a tailored annotation approach for breakpoint-to-gene mapping is implemented. Finally, dedicated cohort-based statistics are incorporated, with correction for covariates that influence the probability of being a breakpoint gene. In addition, multiple-testing correction is integrated to reveal recurrent breakpoint events. This easy-to-use algorithm, 'GeneBreak', is implemented in R ( www.cran.r-project.org ) and is available from Bioconductor ( www.bioconductor.org/packages/release/bioc/html/GeneBreak.html ).
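The breakpoint-to-gene mapping step can be sketched as a simple interval-overlap test. The fragment below is illustrative only; GeneBreak's R implementation additionally applies the cohort statistics and covariate correction described above, and the gene coordinates here are stand-ins.

```python
# A minimal sketch of breakpoint-to-gene mapping: collect CNA segment
# boundaries and report genes whose coordinates overlap a breakpoint.
segments = [("chr8", 1, 30_000_000), ("chr8", 30_000_001, 146_000_000)]
genes = {"MYC": ("chr8", 127_735_434, 127_742_951),
         "FGFR1": ("chr8", 38_400_215, 38_468_834),
         "GENE_X": ("chr8", 29_990_000, 30_010_000)}  # spans the break

# breakpoints are the internal segment boundaries
breakpoints = [(chrom, end) for chrom, _, end in segments[:-1]]

for gene, (chrom, start, end) in genes.items():
    for bp_chrom, bp_pos in breakpoints:
        if chrom == bp_chrom and start <= bp_pos <= end:
            print(f"{gene} overlaps a CNA-associated breakpoint at {chrom}:{bp_pos}")
```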
Neerincx, Pieter BT; Casel, Pierrot; Prickett, Dennis; Nie, Haisheng; Watson, Michael; Leunissen, Jack AM; Groenen, Martien AM; Klopp, Christophe
2009-01-01
Background: Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies. Results: IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The three pipelines can assign oligos to target specificity categories, although with varying degrees of resolution. Target specificity is judged based on the amount and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation-potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms. Conclusion: In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability, allowing users to fine-tune the balance between reliability and coverage. This is important as it can have a significant effect on functional microarray analysis, as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation. PMID:19615109
ERIC Educational Resources Information Center
Kraus, Anne Marie
This companion volume to "Folktale Themes Volume 1: Pourquoi Tales" shows educators how to use folktales to provide meaningful, educational experiences for children. This book provides a complete package for using folktales in the classroom--activity pages, teaching ideas, story themes, and an annotated bibliography of further reading for a…
Kusber, W.-H.; Tschöpe, O.; Güntsch, A.; Berendsohn, W. G.
2017-01-01
Biological research collections holding billions of specimens world-wide provide the most important baseline information for systematic biodiversity research. Increasingly, specimen data records become available in virtual herbaria and data portals. The traditional (physical) annotation procedure fails here, so that an important pathway of research documentation and data quality control is broken. In order to create an online annotation system, we analysed, modeled and adapted traditional specimen annotation workflows. The AnnoSys system accesses collection data from either conventional web resources or the Biological Collection Access Service (BioCASe) and accepts XML-based data standards like ABCD or DarwinCore. It comprises a searchable annotation data repository, a user interface, and a subscription-based message system. We describe the main components of AnnoSys and its current and planned interoperability with biodiversity data portals and networks. Details are given on the underlying architectural model, which implements the W3C OpenAnnotation model and allows the adaptation of AnnoSys to different problem domains. Advantages and disadvantages of different digital annotation and feedback approaches are discussed. For the biodiversity domain, AnnoSys proposes best-practice procedures for digital annotation of complex records. Database URL: https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys PMID:28365735
GAP Final Technical Report 12-14-04
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrew J. Bordner, PhD, Senior Research Scientist
2004-12-14
The Genomics Annotation Platform (GAP) was designed to develop new tools for high-throughput functional annotation and characterization of protein sequences and structures resulting from genomics and structural proteomics, and for the benchmarking and application of those tools. Furthermore, this platform integrated genome-scale sequence and structural analysis and prediction tools with the advanced structure prediction and bioinformatics environment of ICM. The development of GAP was primarily oriented towards the annotation of new biomolecular structures using both structural and sequence data. Even though the amount of protein X-ray crystal data is growing exponentially, the volume of sequence data is growing even more rapidly. This trend was exploited by leveraging the wealth of sequence data to provide functional annotation for protein structures. The additional information provided by GAP is expected to assist the majority of the commercial users of ICM, who are involved in drug discovery, in identifying promising drug targets as well as in devising strategies for the rational design of therapeutics directed at the protein of interest. GAP also provided valuable tools for biochemistry education and structural genomics centers. In addition, GAP incorporates many novel prediction and analysis methods not available in other molecular modeling packages. This development led to the signing of the first Molsoft agreement in the structural genomics annotation area, with the University of Oxford Structural Genomics Center. This commercial agreement validated the Molsoft efforts under the GAP project and provided the basis for further development of the large-scale functional annotation platform.
The what, where, how and why of gene ontology—a primer for bioinformaticians
du Plessis, Louis; Škunca, Nives
2011-01-01
With high-throughput technologies providing vast amounts of data, it has become more important to provide systematic, quality annotations. The Gene Ontology (GO) project is the largest resource for cataloguing gene function. Nonetheless, its use is not yet ubiquitous and is still fraught with pitfalls. In this review, we provide a short primer to the GO for bioinformaticians. We summarize important aspects of the structure of the ontology, describe sources and types of functional annotations, survey measures of GO annotation similarity, review typical uses of GO and discuss other important considerations pertaining to the use of GO in bioinformatics applications. PMID:21330331
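As a concrete example of the most common GO use surveyed above, the sketch below runs a one-term over-representation test with the hypergeometric distribution; the counts are hypothetical.

```python
# A minimal sketch of testing a gene list for over-representation of one
# GO term with a hypergeometric test.
from scipy.stats import hypergeom

N = 20_000   # annotated genes in the genome (background)
K = 300      # background genes annotated with the GO term
n = 150      # genes in the study list
k = 12       # study genes carrying the term

# P(X >= k) under random sampling without replacement
p_value = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p = {p_value:.3g}")
```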
Chen, Wenan; McDonnell, Shannon K; Thibodeau, Stephen N; Tillmans, Lori S; Schaid, Daniel J
2016-11-01
Functional annotations have been shown to improve both the discovery power and fine-mapping accuracy in genome-wide association studies. However, the optimal strategy to incorporate the large number of existing annotations is still not clear. In this study, we propose a Bayesian framework to incorporate functional annotations in a systematic manner. We compute the maximum a posteriori solution and use cross validation to find the optimal penalty parameters. By extending our previous fine-mapping method CAVIARBF into this framework, we require only summary statistics as input. We also derived an exact calculation of Bayes factors using summary statistics for quantitative traits, which is necessary when a large proportion of trait variance is explained by the variants of interest, such as in fine mapping expression quantitative trait loci (eQTL). We compared the proposed method with PAINTOR using different strategies to combine annotations. Simulation results show that the proposed method achieves the best accuracy in identifying causal variants among the different strategies and methods compared. We also find that for annotations with moderate effects from a large annotation pool, screening annotations individually and then combining the top annotations can produce overly optimistic results. We applied these methods on two real data sets: a meta-analysis result of lipid traits and a cis-eQTL study of normal prostate tissues. For the eQTL data, incorporating annotations significantly increased the number of potential causal variants with high probabilities. Copyright © 2016 by the Genetics Society of America.
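For intuition about Bayes factors computed from summary statistics, the sketch below uses Wakefield's well-known approximation rather than the exact small-sample calculation derived in the paper; the prior effect variance w is an assumed value.

```python
# A sketch of an approximate Bayes factor from summary statistics.
# This is Wakefield's approximation, not the paper's exact calculation.
import math

def approx_log_bf(beta: float, se: float, w: float = 0.1**2) -> float:
    """log10 Bayes factor for one variant; w is the prior effect variance."""
    v = se ** 2
    z2 = (beta / se) ** 2
    log_bf = 0.5 * math.log(v / (v + w)) + z2 * w / (2 * (v + w))
    return log_bf / math.log(10)

print(approx_log_bf(beta=0.25, se=0.05))
```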
GeneTools--application for functional annotation and statistical hypothesis testing.
Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid
2006-10-24
Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set or cluster testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get the most recently available data. Data submitted by the user are stored in the database, where they can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) the NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch-search mode; ii) the GO Annotator Tool, with which users can add new gene ontology (GO) annotations to genes of interest; these user-defined GO annotations can be used in further analysis or exported for public distribution; and iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool to do so, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis. GeneTools is the first "all in one" annotation tool, providing users with rapid extraction of highly relevant gene annotation data for, e.g., thousands of genes or clones at once. It allows a user to define and archive new GO annotations, and it supports hypothesis testing related to GO category representations. GeneTools is freely available at www.genetools.no
Cario, Clinton L; Witte, John S
2018-03-15
As whole-genome tumor sequence and biological annotation datasets grow in size, number and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments and machine learning algorithms, there is also a need for the integration of functionality across frameworks. We present orchid, a Python-based software package for the management, annotation and machine learning of cancer mutations. Building on technologies of parallel workflow execution, in-memory database storage and machine learning analytics, orchid efficiently handles millions of mutations and hundreds of features in an easy-to-use manner. We describe the implementation of orchid and demonstrate its ability to distinguish tissue of origin in 12 tumor types based on 339 features using a random forest classifier. Orchid and our annotated tumor mutation database are freely available at https://github.com/wittelab/orchid. Software is implemented in Python 2.7 and makes use of MySQL or MemSQL databases. Groovy 2.4.5 is optionally required for parallel workflow execution. JWitte@ucsf.edu. Supplementary data are available at Bioinformatics online.
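The classification task orchid is demonstrated on can be sketched with scikit-learn; the feature matrix below is synthetic stand-in data, not orchid's database or feature set.

```python
# A minimal sketch of a random forest distinguishing tumor types from
# mutation-level features, with synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 339))          # 339 features per mutation
y = rng.integers(0, 12, size=600)        # 12 tissue-of-origin labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```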
Chimera: a Bioconductor package for secondary analysis of fusion products.
Beccuti, Marco; Carrara, Matteo; Cordero, Francesca; Lazzarato, Fulvio; Donatelli, Susanna; Nadalin, Francesca; Policriti, Alberto; Calogero, Raffaele A
2014-12-15
Chimera is a Bioconductor package that organizes, annotates, analyses and validates fusions reported by different fusion detection tools; current implementation can deal with output from bellerophontes, chimeraScan, deFuse, fusionCatcher, FusionFinder, FusionHunter, FusionMap, mapSplice, Rsubread, tophat-fusion and STAR. The core of Chimera is a fusion data structure that can store fusion events detected with any of the aforementioned tools. Fusions are then easily manipulated with standard R functions or through the set of functionalities specifically developed in Chimera with the aim of supporting the user in managing fusions and discriminating false-positive results. © The Author 2014. Published by Oxford University Press.
Correcting systematic bias and instrument measurement drift with mzRefinery
Gibbons, Bryson C.; Chambers, Matthew C.; Monroe, Matthew E.; ...
2015-08-04
Systematic bias in mass measurement adversely affects data quality and negates the advantages of high-precision instruments. We introduce the mzRefinery tool, part of the ProteoWizard package, for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen. After calibration, systematic bias is removed and the mass measurement errors are centered at zero ppm. Because it is part of the ProteoWizard package, mzRefinery can read and write a wide variety of file formats. The mzRefinery tool is part of msConvert, available with the ProteoWizard open source package at http://proteowizard.sourceforge.net/
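The recalibration idea can be sketched with the simplest candidate transform, a global ppm shift estimated from confident identifications; mzRefinery also considers more flexible transform functions, and the masses below are made up.

```python
# A toy sketch of recalibration: estimate the systematic mass-measurement
# bias from confident identifications and subtract it, so that errors are
# centered at zero ppm.
import numpy as np

observed = np.array([500.2513, 750.3790, 1000.5061])    # measured m/z
theoretical = np.array([500.2500, 750.3771, 1000.5036]) # from matched peptides

ppm_error = (observed - theoretical) / theoretical * 1e6
shift = np.median(ppm_error)                 # robust global bias estimate
calibrated = observed / (1 + shift * 1e-6)   # remove the bias

print(f"bias: {shift:.2f} ppm")
print((calibrated - theoretical) / theoretical * 1e6)  # now centered near 0
```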
Verdant: automated annotation, alignment and phylogenetic analysis of whole chloroplast genomes.
McKain, Michael R; Hartsock, Ryan H; Wohl, Molly M; Kellogg, Elizabeth A
2017-01-01
Chloroplast genomes are now produced in the hundreds for angiosperm phylogenetics projects, but current methods for annotation, alignment and tree estimation still require some manual intervention, reducing throughput and increasing analysis time for large chloroplast systematics projects. Verdant is a web-based software suite and database built to take advantage of a novel annotation program, annoBTD. Using annoBTD, Verdant provides accurate annotation of chloroplast genomes without manual intervention. Subsequent alignment and tree estimation can incorporate newly annotated and publicly available plastomes and can accommodate a large number of taxa. Verdant sharply reduces the time required for analysis of assembled chloroplast genomes and removes the need for pipelines and software on personal hardware. Verdant is available at: http://verdant.iplantcollaborative.org/plastidDB/ It is implemented in PHP, Perl, MySQL, Javascript, HTML and CSS with all major browsers supported. mrmckain@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Fast gene ontology based clustering for microarray experiments.
Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa
2008-11-21
Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R-based open-source semantic similarity package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram, genes sharing a GO term can be identified, and their differences in gene expression patterns can be seen in the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
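A toy version of the semantic-similarity computation at the heart of such packages is sketched below: Resnik's measure, the information content of the most informative common ancestor of two GO terms, over a three-term toy ontology. Real implementations traverse the full GO DAG.

```python
# A minimal sketch of Resnik semantic similarity between GO terms over a
# toy ontology: the information content of the most informative common
# ancestor.
import math

# toy DAG: term -> set of ancestors (including itself)
ancestors = {
    "GO:A": {"GO:root", "GO:A"},
    "GO:B": {"GO:root", "GO:A", "GO:B"},
    "GO:C": {"GO:root", "GO:A", "GO:C"},
}
# annotation frequency of each term (proportion of genes annotated below it)
freq = {"GO:root": 1.0, "GO:A": 0.1, "GO:B": 0.01, "GO:C": 0.02}
ic = {t: -math.log(p) for t, p in freq.items()}

def resnik(t1: str, t2: str) -> float:
    common = ancestors[t1] & ancestors[t2]
    return max(ic[t] for t in common)

print(resnik("GO:B", "GO:C"))  # IC of GO:A, their most informative ancestor
```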
NASA Astrophysics Data System (ADS)
van Noort, Thomas; Achten, Peter; Plasmeijer, Rinus
We present a typical synergy between dynamic types (dynamics) and generalised algebraic datatypes (GADTs). The former provides a clean approach to integrating dynamic typing in a statically typed language. It allows values to be wrapped together with their type in a uniform package, deferring type unification until run time using a pattern match annotated with the desired type. The latter allows for the explicit specification of constructor types, so as to enforce their structural validity. In contrast to ADTs, GADTs are heterogeneous structures, since each constructor type is implicitly universally quantified. Unfortunately, pattern matching only enforces structural validity and does not provide instantiation information on polymorphic types. Consequently, functions that manipulate such values, such as a type-safe update function, are cumbersome due to boilerplate type representation administration. In this paper we focus on improving such functions by providing a new GADT annotation via a natural synergy with dynamics. We formally define the semantics of the annotation and touch on other novel applications of this technique, such as type dispatching and enforcing type equality invariants on GADT values.
The Role of Packaging in Solid Waste Management 1966 to 1976.
ERIC Educational Resources Information Center
Darnay, Arsen; Franklin, William E.
The goals of waste processors and packagers obviously differ: the packaging industry seeks durable container material that will be unimpaired by external factors. Until recently, no systematic analysis of the relationship between packaging and solid waste disposal had been undertaken. This three-part document defines these interactions, and the…
obitools: a unix-inspired software package for DNA metabarcoding.
Boyer, Frédéric; Mercier, Céline; Bonin, Aurélie; Le Bras, Yvan; Taberlet, Pierre; Coissac, Eric
2016-01-01
DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next-generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as an open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org. © 2015 John Wiley & Sons Ltd.
Ralph, Duncan K; Matsen, Frederick A
2016-01-01
VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In our paper, we find that indeed large modern data sets suggest a model using parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM "factorization" strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
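The per-base annotation task partis solves can be illustrated with a minimal Viterbi decoder over a two-state HMM. The probability tables below are made up, whereas partis learns rich per-allele categorical tables from data.

```python
# A minimal Viterbi sketch for per-base annotation: labeling each base with
# a hidden state (here just "V" gene vs "N" insertion), using categorical
# transition and emission tables. Probabilities are illustrative.
import math

states = ["V", "N"]
start = {"V": 0.9, "N": 0.1}
trans = {"V": {"V": 0.9, "N": 0.1}, "N": {"V": 0.1, "N": 0.9}}
emit = {"V": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
        "N": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}

def viterbi(seq: str) -> str:
    v = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}]
    back = []
    for base in seq[1:]:
        scores, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            scores[s] = v[-1][best] + math.log(trans[best][s]) + math.log(emit[s][base])
            ptr[s] = best
        v.append(scores)
        back.append(ptr)
    path = [max(states, key=lambda s: v[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

print(viterbi("ACGGGTTTT"))
```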
The graphics and data acquisition software package
NASA Technical Reports Server (NTRS)
Crosier, W. G.
1981-01-01
A software package was developed for use with micro- and minicomputers, particularly the LSI-11/PDP-11 series. The package has a number of Fortran-callable subroutines which perform a variety of frequently needed tasks for biomedical applications. All routines are well documented, flexible, easy to use and modify, and require minimal programmer knowledge of peripheral hardware. The package is also economical of memory and CPU time. A single subroutine call can perform any one of the following functions: (1) plot an array of integer values from sampled A/D data; (2) plot an array of Y values versus an array of X values; (3) draw horizontal and/or vertical grid lines of selectable type; (4) annotate grid lines with user units; (5) get coordinates of user-controlled crosshairs from the terminal for interactive graphics; (6) sample any analog channel with program-selectable gain; (7) wait a specified time interval; and (8) perform random access I/O of one or more blocks of a sequential disk file. Several miscellaneous functions are also provided.
LS-SNP/PDB: annotated non-synonymous SNPs mapped to Protein Data Bank structures.
Ryan, Michael; Diekhans, Mark; Lien, Stephanie; Liu, Yun; Karchin, Rachel
2009-06-01
LS-SNP/PDB is a new WWW resource for genome-wide annotation of human non-synonymous (amino acid changing) SNPs. It serves high-quality protein graphics rendered with UCSF Chimera molecular visualization software. The system is kept up-to-date by an automated, high-throughput build pipeline that systematically maps human nsSNPs onto Protein Data Bank structures and annotates several biologically relevant features. LS-SNP/PDB is available at (http://ls-snp.icm.jhu.edu/ls-snp-pdb) and via links from protein data bank (PDB) biology and chemistry tabs, UCSC Genome Browser Gene Details and SNP Details pages and PharmGKB Gene Variants Downloads/Cross-References pages.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanfilippo, Antonio P.; Franklin, Lyndsey; Tratz, Stephen C.
2008-04-01
Frame Analysis has come to play an increasingly strong role in the study of social movements in Sociology and Political Science. While significant steps have been made in providing a theory of frames and framing, a systematic characterization of the frame concept is still largely lacking, and there are no recognized criteria and methods that can be used to identify and marshal frame evidence reliably and in a time- and cost-effective manner. Consequently, current Frame Analysis work is still too reliant on manual annotation and subjective interpretation. The goal of this paper is to present an approach to the representation, acquisition and analysis of frame evidence that leverages Content Analysis, Information Extraction and Semantic Search methods to provide a systematic treatment of Frame Analysis and automate frame annotation.
Social networks to biological networks: systems biology of Mycobacterium tuberculosis.
Vashisht, Rohit; Bhardwaj, Anshu; Osdd Consortium; Brahmachari, Samir K
2013-07-01
Contextualizing relevant information to construct a network that represents a given biological process presents a fundamental challenge in the network science of biology. The quality of a network for the organism of interest is critically dependent on the extent of functional annotation of its genome. Most automated annotation pipelines do not account for the unstructured information present in volumes of literature, and hence a large fraction of the genome remains poorly annotated. If used, however, this information could substantially enhance the functional annotation of a genome, aiding the development of a more comprehensive network. Mining unstructured information buried in volumes of literature often requires manual intervention to a great extent and thus becomes a bottleneck for most automated pipelines. In this review, we discuss the potential of scientific social networking as a solution for systematic manual mining of data. Focusing on Mycobacterium tuberculosis as a case study, we discuss our open innovative approach to the functional annotation of its genome. Furthermore, we highlight the strength of such collated structured data in the context of drug target prediction based on systems-level analysis of the pathogen.
Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E
2016-03-11
Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
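One way to picture GOexpress-style scoring is sketched below, under assumed synthetic data and simplified logic (not the package's R implementation): train a random forest on expression profiles, then average per-gene importances within each GO term's gene set.

```python
# A minimal sketch of summarising per-gene random forest importances into a
# score per GO term. Data and GO annotations are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
genes = [f"gene{i}" for i in range(50)]
X = rng.normal(size=(40, 50))            # 40 samples x 50 genes
y = rng.integers(0, 2, size=40)          # e.g. infected vs control

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importance = dict(zip(genes, clf.feature_importances_))

go_sets = {"GO:0006955": genes[:10], "GO:0008152": genes[10:30]}  # toy sets
for term, members in go_sets.items():
    score = np.mean([importance[g] for g in members])
    print(term, round(float(score), 4))
```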
MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.
Liu, Ke; Peng, Shengwen; Wu, Junqiu; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng
2015-06-15
Medical Subject Headings (MeSH) are used by the National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as predictions by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple sources of evidence for MeSH annotation. We propose a novel framework, MeSHLabeler, to integrate multiple sources of evidence for accurate MeSH annotation by using 'learning to rank'. Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI, the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are incomparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy. MeSHLabeler won first place in Task 2A of the 2014 BioASQ challenge, achieving a Micro F-measure of 0.6248 for the 9,040 citations provided by the challenge. Note that this accuracy is around 9.15% higher than the 0.5724 obtained by MTI. The software is available upon request. © The Author 2015. Published by Oxford University Press.
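The integration problem described above, combining incomparable scores from independent predictors, can be sketched as rank-based normalization followed by a weighted combination. MeSHLabeler learns the combination by learning-to-rank, whereas the weights below are fixed for illustration.

```python
# A minimal sketch of normalising incomparable predictor scores to a common
# scale and ranking candidate MeSH terms by a weighted combination.
def rank_normalize(scores: dict) -> dict:
    """Map raw scores to [0, 1] by rank so different classifiers compare."""
    ordered = sorted(scores, key=scores.get)
    return {t: i / (len(ordered) - 1) for i, t in enumerate(ordered)}

knn = {"Humans": 2.1, "Neoplasms": 0.3, "Genomics": 1.2}
classifier = {"Humans": 0.9, "Neoplasms": 0.7, "Genomics": 0.2}
sources = [(rank_normalize(knn), 0.4), (rank_normalize(classifier), 0.6)]

combined = {t: sum(src[t] * w for src, w in sources) for t in knn}
for term, score in sorted(combined.items(), key=lambda x: -x[1]):
    print(term, round(score, 2))
```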
Hatchard, Jenny L; Fooks, Gary J; Evans-Reeves, Karen A; Ulucanlar, Selda; Gilmore, Anna B
2014-02-12
To examine the volume, relevance and quality of transnational tobacco corporations' (TTCs) evidence that standardised packaging of tobacco products 'won't work', following the UK government's decision to 'wait and see' until further evidence is available. Using content analysis, we analysed the evidence cited in submissions by the UK's four largest TTCs to the UK Department of Health consultation on standardised packaging in 2012. The volume, relevance (subject matter) and quality (as measured by independence from industry and peer review) of evidence cited by TTCs was compared with evidence from a systematic review of standardised packaging. Fisher's exact test was used to assess differences in the quality of TTC and systematic review evidence. 100% of the data were second-coded to validate the findings (94.7% intercoder reliability); all differences were resolved. 77/143 pieces of TTC-cited evidence were used to promote their claim that standardised packaging 'won't work'. Of these, just 17/77 addressed standardised packaging: 14 were industry connected and none were published in peer-reviewed journals. Comparison of TTC and systematic review evidence on standardised packaging showed that the industry evidence was of significantly lower quality in terms of tobacco industry connections and peer review (p<0.0001). The most relevant TTC evidence (on standardised packaging or packaging generally, n=26) was of significantly lower quality (p<0.0001) than the least relevant (on other topics, n=51). Across the dataset, TTC-connected evidence was significantly less likely to be published in a peer-reviewed journal (p=0.0045). With few exceptions, evidence cited by TTCs to promote their claim that standardised packaging 'won't work' lacks either policy relevance or key indicators of quality. Policymakers could use these three criteria (subject matter, independence and peer-review status) to critically assess evidence submitted to them by corporate interests via Better Regulation processes.
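The statistical comparison reported above rests on Fisher's exact test over 2x2 tables; the sketch below shows the computation with hypothetical counts, not the study's data.

```python
# A sketch of Fisher's exact test on a 2x2 table of peer-reviewed vs not,
# for industry-cited vs systematic-review evidence. Counts are illustrative.
from scipy.stats import fisher_exact

#                 peer-reviewed, not peer-reviewed
table = [[2, 75],    # TTC-cited evidence (hypothetical counts)
         [30, 7]]    # systematic review evidence (hypothetical counts)

odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.3f}, p = {p:.2g}")
```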
MIPS: analysis and annotation of proteins from whole genomes
Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.
2004-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
Dugas, Martin; Dugas-Breit, Susanne
2014-01-01
Design, execution and analysis of clinical studies involve several stakeholders with different professional backgrounds. Typically, principal investigators are familiar with standard office tools, data managers apply electronic data capture (EDC) systems and statisticians work with statistics software. Case report forms (CRFs) specify the data model of study subjects, evolve over time and consist of hundreds to thousands of data items per study. To avoid erroneous manual transformation work, a conversion tool for different representations of study data models was designed. It can convert between office formats, EDC and statistics formats. In addition, it supports semantic annotations, which enable precise definitions for data items. A reference implementation is available as the open-source package ODMconverter at http://cran.r-project.org.
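ODMconverter itself is an R package; the following Python sketch is meant only to make the underlying idea concrete, parsing item definitions from a toy, heavily simplified ODM-style study data model. Real CDISC ODM files use XML namespaces and a far richer structure than assumed here.

```python
# Illustrative only: a toy ODM-like study data model parsed with the
# standard library. Element and attribute names are simplified.
import xml.etree.ElementTree as ET

odm_xml = """
<ODM>
  <ItemDef OID="I.AGE" Name="Age" DataType="integer"/>
  <ItemDef OID="I.SBP" Name="Systolic blood pressure" DataType="float"/>
</ODM>
"""

root = ET.fromstring(odm_xml)
for item in root.iter("ItemDef"):
    print(item.get("OID"), item.get("Name"), item.get("DataType"))
```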
MicroScope: a platform for microbial genome annotation and comparative genomics
Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.
2009-01-01
The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc PMID:20157493
DWARF – a data warehouse system for analyzing protein families
Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen
2006-01-01
Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus makes it possible to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
Sharma, Virag; Hiller, Michael
2017-08-21
Genome alignments provide a powerful basis to transfer gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that parameters with a higher sensitivity allow the detection of thousands of novel alignments between orthologous exons that have been missed before. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, like human and other non-placental vertebrates, benefit from increased sensitivity. To systematically test if increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations by adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest a more sensitive alignment strategy that should generally be used for genome alignments between distantly-related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics.
ERIC Educational Resources Information Center
Logan, Robert S.
The authoring process and authoring aids which facilitate development of instructional materials have recently emerged as an area of concern in the field of instructional systems development (ISD). This process includes information gathering, its conversion to learning packages, its revision, and its formal publication. The purpose of this…
FOUNTAIN: A JAVA open-source package to assist large sequencing projects
Buerstedde, Jean-Marie; Prill, Florian
2001-01-01
Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. FOUNTAIN is open source and largely platform- and database-independent, and we hope it will be improved and extended as a community effort. PMID:11591214
Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary
2016-08-01
Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.
Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example, systems biology-oriented genome-scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop properly functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (>15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and, importantly, detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.
Discourse Analysis in Stylistics and Literature Instruction.
ERIC Educational Resources Information Center
Short, Mick
1990-01-01
A review of research regarding discourse analysis in stylistics and literature instruction covers studies of text, systematic analysis, meaning, style, literature pedagogy, and applied linguistics. A 10-citation annotated bibliography and a larger unannotated bibliography are included. (CB)
Assessing Strength of Evidence of Appropriate Use Criteria for Diagnostic Imaging Examinations.
Lacson, Ronilda; Raja, Ali S; Osterbur, David; Ip, Ivan; Schneider, Louise; Bain, Paul; Mita, Carol; Whelan, Julia; Silveira, Patricia; Dement, David; Khorasani, Ramin
2016-05-01
For health information technology tools to fully inform evidence-based decisions, recommendations must be reliably assessed for quality and strength of evidence. We aimed to create an annotation framework for grading recommendations regarding appropriate use of diagnostic imaging examinations. The annotation framework was created by an expert panel (clinicians in three medical specialties, medical librarians, and biomedical scientists) who developed a process for achieving consensus in assessing recommendations, and evaluated by measuring agreement in grading the strength of evidence for 120 empirically selected recommendations using the Oxford Levels of Evidence. Eighty-two percent of recommendations were assigned to Level 5 (expert opinion). Inter-annotator agreement was 0.70 on initial grading (κ = 0.35, 95% CI, 0.23-0.48). After systematic discussion utilizing the annotation framework, agreement increased significantly to 0.97 (κ = 0.88, 95% CI, 0.77-0.99). A novel annotation framework was effective for grading the strength of evidence supporting appropriate use criteria for diagnostic imaging exams.
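The agreement statistics reported above can be reproduced mechanically. A small sketch with made-up labels (the study's actual annotations are not shown here), computing raw agreement and Cohen's kappa:

```python
# Two annotators assign evidence levels (1-5) to the same recommendations.
# The labels below are invented; the study reports kappa = 0.35 before
# and 0.88 after consensus discussion.
from sklearn.metrics import cohen_kappa_score

annotator_a = [5, 5, 5, 4, 5, 2, 5, 5, 3, 5]
annotator_b = [5, 5, 4, 4, 5, 2, 5, 3, 3, 5]

raw = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"raw agreement = {raw:.2f}, Cohen's kappa = {kappa:.2f}")
```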
Readability of medicinal package leaflets: a systematic review
Pires, Carla; Vigário, Marina; Cavaco, Afonso
2015-01-01
OBJECTIVE To review studies on the readability of package leaflets of medicinal products for human use. METHODS We conducted a systematic literature review between 2008 and 2013 using the keywords “Readability and Package Leaflet” and “Readability and Package Insert” in the academic search engine Biblioteca do Conhecimento Online, comprising different bibliographic resources/databases. The preferred reporting items for systematic reviews and meta-analyses criteria were applied to prepare the draft of the report. Quantitative and qualitative original studies were included. Opinion or review studies not written in English, Portuguese, Italian, French, or Spanish were excluded. RESULTS We identified 202 studies, of which 180 were excluded and 22 were enrolled [two enrolling healthcare professionals, 10 enrolling other types of participants (including patients), three focused on adverse reactions, and seven descriptive studies]. The package leaflets presented various readability problems, such as complex and difficult-to-understand texts, small font size, or few illustrations. The main methods to assess the readability of the package leaflet were usability tests or legibility formulae. Limitations of these methods included the reduced number of participants; the lack of readability formulas specifically validated for particular languages (e.g., Portuguese); and the absence of an assessment of patient literacy, health knowledge, cognitive skills, levels of satisfaction, and opinions. CONCLUSIONS Overall, the package leaflets presented various readability problems. In this review, some methodological limitations were identified, including the participation of a limited number of patients and healthcare professionals, the absence of prior assessments of participant literacy, mood or sense of satisfaction, and the predominance of studies not based on role-plays about the use of medicines. These limitations should be avoided in future studies and be considered when interpreting the results. PMID:25741660
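As an illustration of the legibility formulae the review mentions, here is a sketch of one classic measure, the Flesch Reading Ease score, with a deliberately naive English syllable counter. Per the review's own point that validated formulas are lacking for some languages, the constants below apply to English only.

```python
# Flesch Reading Ease: 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word).
# The syllable counter (runs of vowels) is crude but keeps the sketch short.
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(flesch_reading_ease("Take one tablet twice daily. Do not exceed the stated dose."))
```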
NMRbox: A Resource for Biomolecular NMR Computation.
Maciejewski, Mark W; Schuyler, Adam D; Gryk, Michael R; Moraru, Ion I; Romero, Pedro R; Ulrich, Eldon L; Eghbalnia, Hamid R; Livny, Miron; Delaglio, Frank; Hoch, Jeffrey C
2017-04-25
Advances in computation have enabled many recent advances in biomolecular applications of NMR. Due to the wide diversity of applications of NMR, the number and variety of software packages for processing and analyzing NMR data are quite large, with labs relying on dozens, if not hundreds, of software packages. Discovery, acquisition, installation, and maintenance of all these packages is a burdensome task. Because the majority of software packages originate in academic labs, persistence of the software is compromised when developers graduate, funding ceases, or investigators turn to other projects. To simplify access to and use of biomolecular NMR software, foster persistence, and enhance reproducibility of computational workflows, we have developed NMRbox, a shared resource for NMR software and computation. NMRbox employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud. Ongoing development includes a metadata harvester to regularize, annotate, and preserve workflows and facilitate and enhance data depositions to BioMagResBank, and tools for Bayesian inference to enhance the robustness and extensibility of computational analyses. In addition to facilitating use and preservation of the rich and dynamic software environment for biomolecular NMR, NMRbox fosters the development and deployment of a new class of metasoftware packages. NMRbox is freely available to not-for-profit users.
MIPS: a database for genomes and protein sequences
Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
msgbsR: An R package for analysing methylation-sensitive restriction enzyme sequencing data.
Mayne, Benjamin T; Leemaqz, Shalem Y; Buckberry, Sam; Rodriguez Lopez, Carlos M; Roberts, Claire T; Bianco-Miotto, Tina; Breen, James
2018-02-01
Genotyping-by-sequencing (GBS) or restriction-site associated DNA marker sequencing (RAD-seq) is a practical and cost-effective method for analysing large genomes from high-diversity species. This method of sequencing, coupled with methylation-sensitive enzymes (often referred to as methylation-sensitive restriction enzyme sequencing or MRE-seq), is an effective tool to study DNA methylation in parts of the genome that are inaccessible to other sequencing techniques or are not annotated in microarray technologies. No current software tool supports the full range of methylation-sensitive restriction sequencing assays for determining differences in DNA methylation between samples. To fill this computational need, we present msgbsR, an R package that contains tools for the analysis of methylation-sensitive restriction enzyme sequencing experiments. msgbsR can be used to identify and quantify read counts at methylated sites directly from alignment files (BAM files) and enables verification of restriction enzyme cut sites with the correct recognition sequence of the individual enzyme. In addition, msgbsR assesses DNA methylation based on read coverage, similar to RNA sequencing experiments, rather than methylation proportion, and is a useful tool for analysing differential methylation in large populations. The package is fully documented and freely available online as a Bioconductor package (https://bioconductor.org/packages/release/bioc/html/msgbsR.html).
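msgbsR's own interface is R/Bioconductor; purely to illustrate the core idea of counting reads at enzyme cut sites straight from a BAM file, here is a Python sketch using pysam. The file name and cut-site coordinates are hypothetical, and the BAM is assumed to be coordinate-sorted and indexed.

```python
# Concept sketch only (not msgbsR's API): count reads overlapping
# candidate cut positions directly from an indexed BAM file.
import pysam

cut_sites = [("chr1", 10_450), ("chr1", 84_120), ("chr2", 5_003)]  # 0-based

with pysam.AlignmentFile("sample.bam", "rb") as bam:
    for contig, pos in cut_sites:
        n = bam.count(contig, pos, pos + 1)  # reads overlapping the position
        print(f"{contig}:{pos}\t{n}")
```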
2014-01-01
Background This was a systematic review of the literature in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. Evidence mapping was used to reveal the effect of drug reminder packaging on medication adherence, to identify research gaps and to make suggestions for future research. Methods PubMed, Embase, CINAHL and PsycINFO were searched with an end date of September 2013 using the Medical Subject Headings (MeSH) term ‘medication adherence’ and 20 different search terms for ‘drug reminder packaging’, limited to the English and German languages. Additional references were identified through cross-referencing. All prospective controlled trials with an intervention using drug reminder packaging for patients taking at least one medication without the assistance of a health-care professional were included in the evidence mapping of the effect of drug reminder packaging on adherence and outcomes according to the Economic, Clinical and Humanistic Outcomes (ECHO) model. Results A total of 30 studies met the inclusion criteria: 10 randomized controlled trials, 19 controlled clinical trials and 1 cohort study. Drug reminder packaging had a significant effect on at least one adherence parameter in 17 studies (57%). The methodological quality was strong in five studies. Two studies provided complete information. Clear research gaps emerged. Conclusions Overall, the studies showed a positive effect of drug reminder packaging on adherence and clinical outcomes. However, poor reporting and important gaps like missing humanistic and economic outcomes and neglected safety issues limit the drawing of firm conclusions. Suggestions are made for future research. PMID:24661495
Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.
Powell, Bradford C; Hutchison, Clyde A
2006-01-19
Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288
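The first step of the method, enumerating ORFs of at least 30 codons, is straightforward to sketch. The following Python fragment scans the three forward-strand reading frames for start-to-stop ORFs under the standard genetic code; a full implementation would also scan the reverse complement.

```python
# Enumerate forward-strand ORFs (ATG .. stop) of at least min_codons codons.
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 30):
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                     # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                  # close it at the stop codon
    return orfs

print(find_orfs("ATG" + "GCT" * 40 + "TAA"))  # one 42-codon ORF at (0, 126)
```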
Efficient population-scale variant analysis and prioritization with VAPr.
Birmingham, Amanda; Mark, Adam M; Mazzaferro, Carlo; Xu, Guorong; Fisch, Kathleen M
2018-04-06
With the growing availability of population-scale whole-exome and whole-genome sequencing, demand for reproducible, scalable variant analysis has spread within genomic research communities. To address this need, we introduce the Python package VAPr (Variant Analysis and Prioritization). VAPr leverages existing annotation tools ANNOVAR and MyVariant.info with MongoDB-based flexible storage and filtering functionality. It offers biologists and bioinformatics generalists easy-to-use and scalable analysis and prioritization of genomic variants from large cohort studies. VAPr is developed in Python and is available for free use and extension under the MIT License. An install package is available on PyPi at https://pypi.python.org/pypi/VAPr, while source code and extensive documentation are on GitHub at https://github.com/ucsd-ccbb/VAPr. kfisch@ucsd.edu.
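VAPr's own API is documented on GitHub; the sketch below only illustrates the general pattern of filtering MongoDB-stored variant annotations with pymongo. The collection and field names are hypothetical, not VAPr's actual schema, and a local MongoDB with annotated variants already loaded is assumed.

```python
# Illustrative MongoDB filter in the spirit of VAPr's storage model:
# rare, predicted-deleterious candidates. All names are made up.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
variants = client["annotated_cohort"]["variants"]

cursor = variants.find(
    {"cadd_phred": {"$gte": 20}, "gnomad_af": {"$lt": 0.001}},
    projection={"_id": 0, "chrom": 1, "pos": 1, "gene": 1},
).limit(10)
for doc in cursor:
    print(doc)
```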
cyvcf2: fast, flexible variant analysis with Python.
Pedersen, Brent S; Quinlan, Aaron R
2017-06-15
Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files. We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files, and illustrate its speed, simplicity and utility. bpederse@gmail.com or aaronquinlan@gmail.com. cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/.
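A minimal usage sketch for cyvcf2 (the input file name is hypothetical), iterating a VCF and pulling coordinates, an INFO field and per-sample genotype codes:

```python
from cyvcf2 import VCF

vcf = VCF("cohort.vcf.gz")
print(vcf.samples)                      # sample names from the header
for variant in vcf:
    af = variant.INFO.get("AF")         # None if the INFO key is absent
    # gt_types codes genotypes as 0=HOM_REF, 1=HET, 2=UNKNOWN, 3=HOM_ALT
    n_het = sum(1 for g in variant.gt_types if g == 1)
    print(variant.CHROM, variant.POS, variant.REF, variant.ALT, af, n_het)
vcf.close()
```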
Yang, Qian; Wang, Shuyuan; Dai, Enyu; Zhou, Shunheng; Liu, Dianming; Liu, Haizhou; Meng, Qianqian; Jiang, Bin; Jiang, Wei
2017-08-16
Pathway enrichment analysis has been widely used to identify cancer risk pathways, and contributes to elucidating the mechanism of tumorigenesis. However, most existing approaches use outdated pathway information and neglect the complex gene interactions within pathways. Here, we first briefly reviewed the existing widely used pathway enrichment analysis approaches, and then we proposed a novel topology-based pathway enrichment analysis (TPEA) method, which integrates topological properties and global upstream/downstream positions of genes in pathways. We compared TPEA with four widely used pathway enrichment analysis tools, including database for annotation, visualization and integrated discovery (DAVID), gene set enrichment analysis (GSEA), centrality-based pathway enrichment (CePa) and signaling pathway impact analysis (SPIA), through analyzing six gene expression profiles of three tumor types (colorectal cancer, thyroid cancer and endometrial cancer). As a result, we identified several well-known cancer risk pathways that could not be obtained by the existing tools, and the results of TPEA were more stable than those of the other tools in analyzing different data sets of the same cancer. Ultimately, we developed an R package to implement TPEA, which can update KEGG pathway information online and is available at the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/TPEA/.
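For contrast with TPEA's topology-aware scoring, the classic over-representation test used by several of the tools compared above is a hypergeometric upper tail, which is easy to state concretely. The counts below are made up.

```python
# Given N annotated genes, K in a pathway, n differentially expressed
# genes and k of those in the pathway, the over-representation p-value
# is P(X >= k) under a hypergeometric null.
from scipy.stats import hypergeom

N, K, n, k = 20000, 150, 600, 12
p_value = hypergeom.sf(k - 1, N, K, n)   # upper tail, P(X >= k)
print(f"enrichment p = {p_value:.3g}")
```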
GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations
Paila, Umadevi; Chapman, Brad A.; Kirchner, Rory; Quinlan, Aaron R.
2013-01-01
Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics. PMID:23874191
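GEMINI's unified database is stored in SQLite, so besides the package's own query tools it can be inspected directly from Python. A sketch under that assumption, using column names that follow GEMINI's documented variants table (treat them as assumptions here); the database file is hypothetical and would normally come from loading an annotated VCF with GEMINI.

```python
# Query a GEMINI-style SQLite database for high-impact variants.
import sqlite3

conn = sqlite3.connect("study.gemini.db")
rows = conn.execute(
    """SELECT chrom, start, "end", ref, alt, gene, impact
       FROM variants
       WHERE impact_severity = 'HIGH'
       LIMIT 10"""
)
for chrom, start, end, ref, alt, gene, impact in rows:
    print(chrom, start, end, ref, alt, gene, impact)
conn.close()
```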
Annotation: an effective device for student feedback: a critical review of the literature.
Ball, Elaine C
2010-05-01
The paper examines hand-written annotation, its many features, difficulties and strengths as a feedback tool. It extends and clarifies what modest evidence is in the public domain and offers an evaluation of how to use annotation effectively in the support of student feedback [Marshall, C.M., 1998a. The Future of Annotation in a Digital (paper) World. Presented at the 35th Annual GLSLIS Clinic: Successes and Failures of Digital Libraries, June 20-24, University of Illinois at Urbana-Champaign, March 24, pp. 1-20; Marshall, C.M., 1998b. Toward an ecology of hypertext annotation. Hypertext. In: Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia, June 20-24, Pittsburgh Pennsylvania, US, pp. 40-49; Wolfe, J.L., Nuewirth, C.M., 2001. From the margins to the centre: the future of annotation. Journal of Business and Technical Communication, 15(3), 333-371; Diyanni, R., 2002. One Hundred Great Essays. Addison-Wesley, New York; Wolfe, J.L., 2002. Marginal pedagogy: how annotated texts affect writing-from-source texts. Written Communication, 19(2), 297-333; Liu, K., 2006. Annotation as an index to critical writing. Urban Education, 41, 192-207; Feito, A., Donahue, P., 2008. Minding the gap annotation as preparation for discussion. Arts and Humanities in Higher Education, 7(3), 295-307; Ball, E., 2009. A participatory action research study on handwritten annotation feedback and its impact on staff and students. Systemic Practice and Action Research, 22(2), 111-124; Ball, E., Franks, H., McGrath, M., Leigh, J., 2009. Annotation is a valuable tool to enhance learning and assessment in student essays. Nurse Education Today, 29(3), 284-291]. Although a significant number of studies examine annotation, this is largely related to on-line tools and computer mediated communication and not hand-written annotation as comment, phrase or sign written on the student essay to provide critique. Little systematic research has been conducted to consider how this latter form of annotation influences student learning and assessment or, indeed, helps tutors to employ better annotative practices [Juwah, C., Macfarlane-Dick, D., Matthew, B., Nicol, D., Ross, D., Smith, B., 2004. Enhancing student learning through effective formative feedback. The Higher Education Academy, 1-40; Jewitt, C., Kress, G., 2005. English in classrooms: only write down what you need to know: annotation for what? English in Education, 39(1), 5-18]. There is little evidence on ways to heighten students' self-awareness when their essays are returned with annotated feedback [Storch, N., Tapper, J., 1997. Student annotations: what NNS and NS university students say about their own writing. Journal of Second Language Writing, 6(3), 245-265]. The literature review clarifies forms of annotation as feedback practice and offers a summary of the challenges and usefulness of annotation.
Irizarry, Kristopher J L; Rutllant, Josep
2016-01-01
Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value. PMID:27200191
He, Hongjuan; Xiu, Youcheng; Guo, Jing; Liu, Hui; Liu, Qi; Zeng, Tiebo; Chen, Yan; Zhang, Yan; Wu, Qiong
2013-01-01
Long non-coding RNAs (lncRNAs), a key group of non-coding RNAs, have gained wide attention. Although lncRNAs have been functionally annotated and systematically explored in higher mammals, few have been systematically identified and annotated. Owing to their expression specificity, the known lncRNAs expressed in embryonic brain tissues remain limited. Because a large number of lncRNAs are transcribed only in brain tissues, studies of lncRNAs in the developing brain are of special interest. Here, publicly available RNA-sequencing (RNA-seq) data from embryonic brain are integrated to identify thousands of embryonic brain lncRNAs with a customized pipeline. A significant proportion of the novel transcripts are not annotated in available genomic resources. The putative embryonic brain lncRNAs are shorter, less spliced and less conserved than known genes. Their expression is on average one tenth that of known coding genes, but comparable with known lncRNAs. Chromatin data indicate that putative embryonic brain lncRNAs are associated with active chromatin marks, comparable with known lncRNAs. lncRNAs expressed in embryonic brain also show expression, though less evident, in adult brain. Gene Ontology analysis of putative embryonic brain lncRNAs suggests that they are associated with brain development. The putative lncRNAs are shown to have possible cis-regulatory roles in imprinting, even when they themselves are deemed to be imprinted lncRNAs. Re-analysis of one knockdown dataset suggests that four regulators are associated with lncRNAs. Taken together, the identification and systematic analysis of putative lncRNAs provide novel insights into uncharacterized mouse non-coding regions and their relationships with mammalian embryonic brain development. PMID:23967161
Toward automated biochemotype annotation for large compound libraries.
Chen, Xian; Liang, Yizeng; Xu, Jun
2006-08-01
Combinatorial chemistry allows scientists to probe a large synthetically accessible chemical space. However, identifying the sub-space that is selectively associated with a biological target of interest is crucial to drug discovery and the life sciences. This paper describes a process to automatically annotate the biochemotypes of compounds in a library and thus to identify bioactivity-related chemotypes (biochemotypes) in a large compound library. The process consists of two steps: (1) predicting all possible bioactivities for each compound in a library, and (2) deriving possible biochemotypes from those predictions. The Prediction of Activity Spectra for Substances (PASS) program was used in the first step. In the second step, structural similarity and scaffold-hopping technologies are employed to derive biochemotypes from the bioactivity predictions and the corresponding annotated biochemotypes from the MDL Drug Data Report (MDDR) database. A library of nearly one million (982,889) commercially available compounds (CACL) was tested using this process. This paper demonstrates the feasibility of automatically annotating biochemotypes for large compound libraries. Nevertheless, some issues need to be considered in order to improve the process. First, the prediction accuracy of the PASS program shows no significant correlation with the number of compounds in a training set: larger training sets do not necessarily increase the maximal error of prediction (MEP) or the hit structural diversity, and smaller training sets do not necessarily decrease them. Second, the success of systematic bioactivity prediction relies on modeling, training data, and the definition of bioactivities (a biochemotype ontology). Unfortunately, the biochemotype ontology was not well developed in the PASS program; consequently, "ill-defined" bioactivities can reduce the quality of predictions. This paper suggests ways in which systematic bioactivity prediction should be improved.
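The two-step process lends itself to a schematic sketch. Everything below is a stand-in: predict_activities() plays the role of a predictor such as PASS, and scaffold_of() the role of a real scaffold extractor (e.g. Murcko scaffolds from a cheminformatics toolkit). Neither toy function is chemically meaningful; they only make the grouping logic concrete.

```python
# Group compounds by (scaffold, predicted activity); recurrent groups
# are candidate biochemotypes.
from collections import defaultdict

def predict_activities(smiles: str) -> set:
    # Step 1 stand-in: a real predictor returns an activity spectrum.
    return {"kinase inhibitor"} if "N" in smiles else {"antibacterial"}

def scaffold_of(smiles: str) -> str:
    # Step 2 stand-in: a real implementation returns a canonical scaffold.
    return "".join(ch for ch in smiles if ch.isalpha())[:4]

def annotate_biochemotypes(library):
    groups = defaultdict(list)
    for smiles in library:
        for activity in predict_activities(smiles):
            groups[(scaffold_of(smiles), activity)].append(smiles)
    return groups

print(annotate_biochemotypes(["c1ccccc1N", "c1ccccc1NC", "CCO"]))
```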
Situation Model for Situation-Aware Assistance of Dementia Patients in Outdoor Mobility
Yordanova, Kristina; Koldrack, Philipp; Heine, Christina; Henkel, Ron; Martin, Mike; Teipel, Stefan; Kirste, Thomas
2017-01-01
Background: Dementia impairs spatial orientation and route planning, thus often affecting the patient’s ability to move outdoors and maintain social activities. Situation-aware deliberative assistive technology devices (ATD) can substitute impaired cognitive function in order to maintain one’s level of social activity. To build such a system, one needs domain knowledge about the patient’s situation and needs. We call this collection of knowledge situation model. Objective: To construct a situation model for the outdoor mobility of people with dementia (PwD). The model serves two purposes: 1) as a knowledge base from which to build an ATD describing the mobility of PwD; and 2) as a codebook for the annotation of the recorded behavior. Methods: We perform systematic knowledge elicitation to obtain the relevant knowledge. The OBO Edit tool is used for implementing and validating the situation model. The model is evaluated by using it as a codebook for annotating the behavior of PwD during a mobility study and interrater agreement is computed. In addition, clinical experts perform manual evaluation and curation of the model. Results: The situation model consists of 101 concepts with 11 relation types between them. The results from the annotation showed substantial overlapping between two annotators (Cohen’s kappa of 0.61). Conclusion: The situation model is a first attempt to systematically collect and organize information related to the outdoor mobility of PwD for the purposes of situation-aware assistance. The model is the base for building an ATD able to provide situation-aware assistance and to potentially improve the quality of life of PwD. PMID:29060937
Williams, Ashley R.; Bain, Robert E. S.; Fisher, Michael B.; Cronk, Ryan; Kelly, Emma R.; Bartram, Jamie
2015-01-01
Background Packaged water products provide an increasingly important source of water for consumption. However, recent studies raise concerns over their safety. Objectives To assess the microbial safety of packaged water, examine differences between regions, country incomes, packaged water types, and compare packaged water with other water sources. Methods We performed a systematic review and meta-analysis. Articles published in English, French, Portuguese, Spanish and Turkish, with no date restrictions were identified from online databases and two previous reviews. Studies published before April 2014 that assessed packaged water for the presence of Escherichia coli, thermotolerant or total coliforms were included provided they tested at least ten samples or brands. Results A total of 170 studies were included in the review. The majority of studies did not detect fecal indicator bacteria in packaged water (78/141). Compared to packaged water from upper-middle and high-income countries, packaged water from low and lower-middle-income countries was 4.6 (95% CI: 2.6–8.1) and 13.6 (95% CI: 6.9–26.7) times more likely to contain fecal indicator bacteria and total coliforms, respectively. Compared to all other packaged water types, water from small bottles was less likely to be contaminated with fecal indicator bacteria (OR = 0.32, 95%CI: 0.17–0.58) and total coliforms (OR = 0.10, 95%CI: 0.05, 0.22). Packaged water was less likely to contain fecal indicator bacteria (OR = 0.35, 95%CI: 0.20, 0.62) compared to other water sources used for consumption. Conclusions Policymakers and regulators should recognize the potential benefits of packaged water in providing safer water for consumption at and away from home, especially for those who are otherwise unlikely to gain access to a reliable, safe water supply in the near future. To improve the quality of packaged water products they should be integrated into regulatory and monitoring frameworks. PMID:26505745
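The effect measures above are odds ratios with 95% confidence intervals, which follow from a 2x2 table via the standard log-OR normal approximation. A worked sketch with hypothetical counts chosen so the OR matches the reported 4.6 (these are not the study's actual counts):

```python
import math

# Rows: income group; columns: contaminated / not contaminated.
a, b = 46, 100   # low and lower-middle income (hypothetical)
c, d = 10, 100   # upper-middle and high income (hypothetical)

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
low, high = (math.exp(math.log(odds_ratio) + z * se_log_or) for z in (-1.96, 1.96))
print(f"OR = {odds_ratio:.1f} (95% CI {low:.1f}-{high:.1f})")
```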
The translation factors of Drosophila melanogaster.
Marygold, Steven J; Attrill, Helen; Lasko, Paul
2017-01-02
Synthesis of polypeptides from mRNA (translation) is a fundamental cellular process that is coordinated and catalyzed by a set of canonical 'translation factors'. Surprisingly, the translation factors of Drosophila melanogaster have not yet been systematically identified, leading to inconsistencies in their nomenclature and shortcomings in functional (Gene Ontology, GO) annotations. Here, we describe the complete set of translation factors in D. melanogaster, applying nomenclature already in widespread use in other species, and revising their functional annotation. The collection comprises 43 initiation factors, 12 elongation factors, 3 release factors and 6 recycling factors, totaling 64, of which 55 are cytoplasmic and 9 are mitochondrial. We also provide an overview of notable findings and particular insights derived from Drosophila about these factors. This catalog, together with the incorporation of the improved nomenclature and GO annotation into FlyBase, will greatly facilitate access to information about the functional roles of these important proteins.
methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data.
Kishore, Kamal; de Pretis, Stefano; Lister, Ryan; Morelli, Marco J; Bianchi, Valerio; Amati, Bruno; Ecker, Joseph R; Pelizzola, Mattia
2015-09-29
Numerous methods are available to profile several epigenetic marks, providing data with different genome coverage and resolution. Large epigenomic datasets are then generated, and often combined with other high-throughput data, including RNA-seq, ChIP-seq for transcription factor (TF) binding and DNase-seq experiments. Despite the numerous computational tools covering specific steps in the analysis of large-scale epigenomics data, comprehensive software solutions for their integrative analysis are still missing. Multiple tools must be identified and combined to jointly analyze histone marks, TF binding and other -omics data together with DNA methylation data, complicating the analysis of these data and their integration with publicly available datasets. To overcome the burden of integrating various data types with multiple tools, we developed two companion R/Bioconductor packages. The former, methylPipe, is tailored to the analysis of high- or low-resolution DNA methylomes in several species, accommodating (hydroxy-)methyl-cytosines in both CpG and non-CpG sequence context. The analysis of multiple whole-genome bisulfite sequencing experiments is supported, while maintaining the ability to integrate targeted genomic data. The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. It provides a number of methods to score these data in regions of interest, leading to the identification of enhancers, lncRNAs, and RNAPII stalling/elongation dynamics. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant Gene Ontology terms. Finally, the package includes a flexible method based on heatmaps for the integration of various data types, combining annotation tracks with continuous or categorical data tracks. methylPipe and compEpiTools provide a comprehensive Bioconductor-compliant solution for the integrative analysis of heterogeneous epigenomics data. These packages give biologists who have minimal R skills a complete toolkit for analyzing their own data, and accelerate the analyses performed by more experienced bioinformaticians.
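As a generic illustration of what "scoring methylation data in regions of interest" involves (a minimal sketch, not the methylPipe API), the following computes a coverage-weighted methylation level per region from hypothetical per-cytosine calls.

    from bisect import bisect_left, bisect_right

    # Hypothetical per-cytosine calls: (position, methylated_reads, total_reads),
    # sorted by position; regions are (start, end) intervals on one chromosome.
    cytosines = [(100, 8, 10), (150, 1, 12), (420, 9, 9), (480, 5, 10)]
    regions = [(50, 200), (400, 500)]

    positions = [p for p, _, _ in cytosines]

    def region_methylation(start, end):
        """Coverage-weighted mean methylation of cytosines in [start, end)."""
        i, j = bisect_left(positions, start), bisect_right(positions, end - 1)
        meth = sum(m for _, m, _ in cytosines[i:j])
        total = sum(t for _, _, t in cytosines[i:j])
        return meth / total if total else float("nan")

    for start, end in regions:
        print(f"{start}-{end}: {region_methylation(start, end):.2f}")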
CPLA 1.0: an integrated database of protein lysine acetylation
Liu, Zexian; Cao, Jun; Gao, Xinjiao; Zhou, Yanhong; Wen, Longping; Yang, Xiangjiao; Yao, Xuebiao; Ren, Jian; Xue, Yu
2011-01-01
As a reversible post-translational modification (PTM) discovered decades ago, protein lysine acetylation was known for its regulation of transcription through the modification of histones. Recent studies discovered that lysine acetylation targets a broad range of substrates and especially plays an essential role in cellular metabolic regulation. Although acetylation is comparable with other major PTMs such as phosphorylation, an integrated resource still remains to be developed. In this work, we present the compendium of protein lysine acetylation (CPLA) database for lysine acetylated substrates with their sites. From the scientific literature, we manually collected 7151 experimentally identified acetylation sites in 3311 targets. We statistically studied the regulatory roles of lysine acetylation by analyzing the Gene Ontology (GO) and InterPro annotations. Combined with protein–protein interaction information, we systematically discovered a potential human lysine acetylation network (HLAN) among histone acetyltransferases (HATs), substrates and histone deacetylases (HDACs). In particular, 1862 HAT-substrate-HDAC triplet relationships were retrieved from the HLAN, at least 13 of which were previously experimentally verified. The online services of the CPLA database were implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). The CPLA database is freely available for all users at: http://cpla.biocuckoo.org. PMID:21059677
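The HAT-substrate-HDAC triplets described above can be pictured as a simple graph query: acetylated substrates that interact with both an acetyltransferase and a deacetylase. A minimal sketch with hypothetical interaction data (the enzyme and substrate names are real proteins, but the edge list is invented for illustration):

    # Hypothetical enzyme sets and protein-protein interactions.
    hats = {"EP300", "KAT2B"}
    hdacs = {"HDAC1", "SIRT1"}
    interactions = {("EP300", "TP53"), ("TP53", "HDAC1"),
                    ("KAT2B", "TP53"), ("TP53", "SIRT1"), ("EP300", "MYC")}

    # Treat interactions as undirected.
    neighbors = {}
    for a, b in interactions:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    acetylated_substrates = {"TP53", "MYC"}  # substrates with curated sites

    triplets = [
        (hat, sub, hdac)
        for sub in acetylated_substrates
        for hat in neighbors.get(sub, set()) & hats
        for hdac in neighbors.get(sub, set()) & hdacs
    ]
    print(triplets)  # e.g. ('EP300', 'TP53', 'HDAC1'), ...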
Computer Aided Design Parameters for Forward Basing
1988-12-01
21 meters. Systematic errors within limits stated for absolute accuracy are tolerated at this level. DEM data acquired photogrammetrically using manual ... This is a professional drawing package, capable of the manipulation required for this project. With the AutoLISP programming language (a variation on ... Table 2). Data Conversion Package II: GWN System's Digital Terrain Modeling (DTM) package was used. This AutoLISP-based third-party software is
CHROMA: consensus-based colouring of multiple alignments for publication.
Goodstadt, L; Ponting, C P
2001-09-01
CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to CHROMA@lg.ndirect.co.uk.
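The core of consensus-based colouring is easy to state in code. The sketch below marks alignment columns where a residue reaches a frequency threshold, assuming an ungapped block; CHROMA's actual consensus rules (physico-chemical residue groupings, gap handling) are richer than this.

    from collections import Counter

    alignment = ["MKVLLA", "MKILLG", "MRVLLA", "MKVFLA"]  # toy alignment block
    THRESHOLD = 0.75  # fraction of sequences that must agree

    def consensus(columns, threshold):
        out = []
        for col in columns:
            residue, count = Counter(col).most_common(1)[0]
            out.append(residue if count / len(col) >= threshold else ".")
        return "".join(out)

    cons = consensus(list(zip(*alignment)), THRESHOLD)
    print(cons)  # '.' where no residue reaches the threshold
    for seq in alignment:
        # Uppercase residues that match the consensus, lowercase the rest.
        print("".join(c.upper() if c == k else c.lower()
                      for c, k in zip(seq, cons)))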
Towards Semantic Modelling of Business Processes for Networked Enterprises
NASA Astrophysics Data System (ADS)
Furdík, Karol; Mach, Marián; Sabol, Tomáš
The paper presents an approach to the semantic modelling and annotation of business processes and information resources, as it was designed within the FP7 ICT EU project SPIKE to support creation and maintenance of short-term business alliances and networked enterprises. A methodology for the development of the resource ontology, as a shareable knowledge model for semantic description of business processes, is proposed. Systematically collected user requirements, conceptual models implied by the selected implementation platform as well as available ontology resources and standards are employed in the ontology creation. The process of semantic annotation is described and illustrated using an example taken from a real application case.
The CHEMDNER corpus of chemicals and drugs and its annotation principles.
Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso
2015-01-01
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/. PMID:25810773
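The reported inter-annotator figure is a percentage agreement; computed over exact matches of (start, end, class) mention tuples it reduces to a set overlap, as in this toy illustration (a sketch of the metric, not the CHEMDNER agreement protocol itself):

    # Two annotators' mentions as (start_offset, end_offset, SACEM_class);
    # the spans and classes below are invented.
    ann_a = {(0, 7, "trivial"), (12, 20, "formula"), (33, 40, "family")}
    ann_b = {(0, 7, "trivial"), (12, 20, "systematic"), (33, 40, "family")}

    agreed = len(ann_a & ann_b)
    total = len(ann_a | ann_b)
    print(f"percentage agreement: {100 * agreed / total:.1f}%")  # 2 of 4 -> 50.0%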
Analysis of Variance: What Is Your Statistical Software Actually Doing?
ERIC Educational Resources Information Center
Li, Jian; Lomax, Richard G.
2011-01-01
Users assume statistical software packages produce accurate results. In this article, the authors systematically examined Statistical Package for the Social Sciences (SPSS) and Statistical Analysis System (SAS) for 3 analysis of variance (ANOVA) designs: mixed-effects ANOVA, fixed-effects analysis of covariance (ANCOVA), and nested ANOVA. For each…
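Results of the kind the article examines can be cross-checked by hand. As an illustration in Python rather than SPSS or SAS, the snippet below computes a one-way fixed-effects ANOVA F statistic manually and compares it with scipy's implementation; the data are made up.

    from scipy.stats import f_oneway

    groups = [[4.1, 5.0, 5.9], [6.2, 7.1, 6.8], [5.5, 5.0, 6.1]]

    grand = sum(sum(g) for g in groups) / sum(len(g) for g in groups)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = sum(len(g) for g in groups) - len(groups)
    F = (ss_between / df_b) / (ss_within / df_w)

    print(F)
    print(f_oneway(*groups).statistic)  # should agree with the manual value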
Systematic Interviewing Skills. Typescript Manual.
ERIC Educational Resources Information Center
Farley, Roy C.; Rubin, Stanford E.
Part of a five-part package (see note) of training materials to teach interviewing skills to human services personnel, this typescript manual is intended for use as a visual reference to aid in understanding the taped dialogues of the package's tape/slide demonstrations of interview interaction, and for referral in class discussions. The typescript…
2011-01-01
Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, with a genome size of 4.02 Gb, 90% of which is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at http://avena.pw.usda.gov/wheatD/agsnp.shtml. PMID:21266061
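Stripped of the platform-specific details, the SNP-calling core of such a pipeline is a coverage and allele-frequency filter over mapped bases. A toy sketch with hypothetical pileup data and illustrative thresholds (not the ones used by AGSNP):

    MIN_COVERAGE = 8
    MIN_ALT_FRACTION = 0.9

    # position -> bases observed in reads of the second genotype, mapped onto
    # the annotated reference reads of the first genotype (hypothetical data).
    pileup = {
        101: "GGGGGGGGGG",   # clean alternative allele -> candidate SNP
        202: "TTTATTTTT",    # mixed signal, likely paralog or sequencing error
        303: "CC",           # too little coverage to call
    }
    reference = {101: "A", 202: "A", 303: "A"}

    for pos, bases in sorted(pileup.items()):
        if len(bases) < MIN_COVERAGE:
            continue
        alt = max(set(bases), key=bases.count)
        if alt != reference[pos] and bases.count(alt) / len(bases) >= MIN_ALT_FRACTION:
            print(f"putative SNP at {pos}: {reference[pos]} -> {alt}")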
T3SEdb: data warehousing of virulence effectors secreted by the bacterial Type III Secretion System.
Tay, Daniel Ming Ming; Govindarajan, Kunde Ramamoorthy; Khan, Asif M; Ong, Terenze Yao Rui; Samad, Hanif M; Soh, Wei Wei; Tong, Minyan; Zhang, Fan; Tan, Tin Wee
2010-10-15
Effectors of the Type III Secretion System (T3SS) play a pivotal role in establishing and maintaining pathogenicity in the host, and therefore the identification of these effectors is important in understanding virulence. However, the effectors display a high level of sequence diversity, making their identification a difficult process. There is a need to collate and annotate existing effector sequences in public databases to enable systematic analyses of these sequences for development of models for screening and selection of putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments. Herein, we present T3SEdb http://effectors.bic.nus.edu.sg/T3SEdb, a specialized database of annotated T3SS effector (T3SE) sequences containing 1089 records from 46 bacterial species compiled from the literature and public protein databases. Procedures have been defined for i) comprehensive annotation of the experimental status of effectors, ii) submission and curation review of records by users of the database, and iii) the regular update of existing and new T3SEdb records. Fielded keyword and sequence searches (BLAST, regular expression) are supported for both experimentally verified and hypothetical T3SEs. More than 171 clusters of T3SEs were detected based on sequence identity comparisons (intra-cluster difference up to ~60%). Given this high level of sequence diversity of T3SEs, the T3SEdb provides a large number of experimentally known effector sequences with wide species representation for creation of effector predictors. We created a reliable effector prediction tool, integrated into the database, to demonstrate the application of the database for such endeavours. T3SEdb is the first specialised database reported for T3SS effectors, enriched with manual annotations that facilitated systematic construction of a reliable prediction model for identification of novel effectors. The T3SEdb represents a platform for inclusion of additional annotations of metadata for future developments of sophisticated effector prediction models for screening and selection of putative novel effectors from bacterial genomes/proteomes that can be validated by a small number of key experiments.
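As a sketch of the kind of predictor such a database is meant to support (not the tool actually integrated into T3SEdb), one can featurize sequences by amino-acid composition and fit a linear classifier. The sequences and labels below are invented.

    from sklearn.linear_model import LogisticRegression

    AMINO = "ACDEFGHIKLMNPQRSTVWY"

    def composition(seq):
        """Fraction of each standard amino acid in the sequence."""
        return [seq.count(a) / len(seq) for a in AMINO]

    train_seqs = ["MSKITLLSAG", "MKKLLPTAAG", "MDEEDFFWYH", "MCWYHHDDEE"]
    labels = [1, 1, 0, 0]  # 1 = effector, 0 = non-effector (hypothetical)

    clf = LogisticRegression().fit([composition(s) for s in train_seqs], labels)
    print(clf.predict_proba([composition("MSKLLTTAAG")])[0][1])  # P(effector)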
Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David
2017-09-12
The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way that provides practical guidance and help to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original method developed recently (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to help maintain accurate resources.
Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster
Samsonova, Anastasia A; Niranjan, Mahesan; Russell, Steven; Brazma, Alvis
2007-01-01
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms. PMID:17658945
VRML and Collaborative Environments: New Tools for Networked Visualization
NASA Astrophysics Data System (ADS)
Crutcher, R. M.; Plante, R. L.; Rajlich, P.
We present two new applications that engage the network as a tool for astronomical research and/or education. The first is a VRML server which allows users over the Web to interactively create three-dimensional visualizations of FITS images contained in the NCSA Astronomy Digital Image Library (ADIL). The server's Web interface allows users to select images from the ADIL, fill in processing parameters, and create renderings featuring isosurfaces, slices, contours, and annotations; the often extensive computations are carried out on an NCSA SGI supercomputer server without the user having an individual account on the system. The user can then download the 3D visualizations as VRML files, which may be rotated and manipulated locally on virtually any class of computer. The second application is the ADILBrowser, a part of the NCSA Horizon Image Data Browser Java package. ADILBrowser allows a group of participants to browse images from the ADIL within a collaborative session. The collaborative environment is provided by the NCSA Habanero package which includes text and audio chat tools and a white board. The ADILBrowser is just an example of a collaborative tool that can be built with the Horizon and Habanero packages. The classes provided by these packages can be assembled to create custom collaborative applications that visualize data either from local disk or from anywhere on the network.
Semantic similarity analysis of protein data: assessment with biological features and issues.
Guzzi, Pietro H; Mina, Marco; Guerra, Concettina; Cannataro, Mario
2012-09-01
The integration of proteomics data with biological knowledge is a recent trend in bioinformatics. A large amount of biological information is available, spread across different sources and encoded in different ontologies (e.g. Gene Ontology). Annotating existing protein data with biological information may enable the use (and the development) of algorithms that use biological ontologies as a framework to mine annotated data. Recently, many methodologies and algorithms that use ontologies to extract knowledge from data, as well as to analyse ontologies themselves, have been proposed and applied to other fields. Conversely, the use of such annotations for the analysis of protein data is a relatively novel research area that is currently becoming more and more central in research. Existing approaches span from the definition of the similarity among genes and proteins on the basis of the annotating terms, to the definition of novel algorithms that use such similarities for mining protein data on a proteome-wide scale. After defining the main concepts of such analyses, this work presents a systematic discussion and comparison of the main approaches. Finally, remaining challenges, as well as possible future directions of research, are presented.
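The simplest member of the family of measures surveyed here is term-set overlap. Below is a minimal sketch using the Jaccard index over hypothetical GO annotation sets; information-content-based measures such as Resnik's refine this idea by weighting terms by their frequency and position in the ontology graph.

    def jaccard(terms_a, terms_b):
        """Jaccard similarity between two sets of annotation terms."""
        if not terms_a and not terms_b:
            return 0.0
        return len(terms_a & terms_b) / len(terms_a | terms_b)

    # Hypothetical annotation sets for two proteins.
    protein_1 = {"GO:0006915", "GO:0006974", "GO:0003677"}
    protein_2 = {"GO:0006915", "GO:0016567"}
    print(jaccard(protein_1, protein_2))  # 1 shared term of 4 distinct -> 0.25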
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
2017-01-01
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. PMID:27899624
Education, Education and Indoctrination! Packaging Politics and the Three "Rs"
ERIC Educational Resources Information Center
Franklin, Bob
2004-01-01
This paper explores how recent Labour governments have tried systematically to package educational and other social policies for media presentation and public consumption. This concern has resulted in the criticism that Labour is concerned with policy presentation above content: strong on policy spin but weak on policy delivery. The first section…
Kim, Sun; Chatr-aryamontri, Andrew; Chang, Christie S.; Oughtred, Rose; Rust, Jennifer; Wilbur, W. John; Comeau, Donald C.; Dolinski, Kara; Tyers, Mike
2017-01-01
A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html PMID:28077563
esATAC: An Easy-to-use Systematic pipeline for ATAC-seq data analysis.
Wei, Zheng; Zhang, Wei; Fang, Huan; Li, Yanda; Wang, Xiaowo
2018-03-07
ATAC-seq is rapidly emerging as one of the major experimental approaches to probe chromatin accessibility genome-wide. Here, we present "esATAC", a highly integrated, easy-to-use R/Bioconductor package for systematic ATAC-seq data analysis. It covers the essential steps of the full analysis procedure, including raw data processing, quality control and downstream statistical analysis such as peak calling, enrichment analysis and transcription factor footprinting. esATAC supports one-command execution of preset pipelines, and provides flexible interfaces for building customized pipelines. The esATAC package is open source under the GPL-3.0 license. It is implemented in R and C++. Source code and binaries for Linux, Mac OS X and Windows are available through Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/esATAC.html). Contact: xwwang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.
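The "preset pipeline with customizable steps" design can be sketched in a few lines of plain Python; this illustrates the idea only, not the esATAC interface. Each step is a function from state to state, and a pipeline is just an ordered list of steps.

    # Each stage takes the analysis state and returns an enriched copy
    # (stage names and outputs are placeholders, not esATAC functions).
    def trim_adapters(data): return {**data, "trimmed": True}
    def map_reads(data):     return {**data, "bam": "sample.bam"}
    def call_peaks(data):    return {**data, "peaks": ["chr1:100-600"]}
    def footprint(data):     return {**data, "footprints": []}

    PRESET = [trim_adapters, map_reads, call_peaks, footprint]

    def run(steps, data):
        for step in steps:
            data = step(data)
        return data

    print(run(PRESET, {"fastq": "sample.fq"}))           # one-call preset pipeline
    print(run([map_reads, call_peaks], {"fastq": "x"}))  # custom sub-pipeline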
i-ADHoRe 2.0: an improved tool to detect degenerated genomic homology using genomic profiles.
Simillion, Cedric; Janssens, Koen; Sterck, Lieven; Van de Peer, Yves
2008-01-01
i-ADHoRe is a software tool that combines gene content and gene order information of homologous genomic segments into profiles to detect highly degenerated homology relations within and between genomes. The new version offers, besides a significant increase in performance, several optimizations to the algorithm, most importantly to the profile alignment routine. As a result, the annotations of multiple genomes, or parts thereof, can be fed simultaneously into the program, after which it will report all regions of homology, both within and between genomes. The i-ADHoRe 2.0 package contains the C++ source code for the main program as well as various Perl scripts and a fully documented Perl API to facilitate post-processing. The software runs on any Linux- or UNIX-based platform. The package is freely available for academic users and can be downloaded from http://bioinformatics.psb.ugent.be/
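At its simplest, collinearity detection looks for runs of homologous gene pairs whose positions advance together in two genomes. The sketch below finds such runs in hypothetical gene lists; i-ADHoRe's profile alignment additionally handles gaps, inversions and multi-genome profiles.

    genome_a = ["g1", "g2", "g3", "g4", "g5"]
    genome_b = ["x", "g2h", "g3h", "g4h", "y"]
    homolog = {"g2": "g2h", "g3": "g3h", "g4": "g4h"}  # hypothetical pairs

    pos_b = {g: i for i, g in enumerate(genome_b)}
    pairs = [(i, pos_b[homolog[g]]) for i, g in enumerate(genome_a)
             if g in homolog and homolog[g] in pos_b]

    # Group consecutive pairs that advance by one in both genomes.
    segments, run = [], [pairs[0]]
    for prev, cur in zip(pairs, pairs[1:]):
        if cur[0] == prev[0] + 1 and cur[1] == prev[1] + 1:
            run.append(cur)
        else:
            segments.append(run)
            run = [cur]
    segments.append(run)
    print([s for s in segments if len(s) >= 2])  # collinear runs of >= 2 pairs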
GST-PRIME: an algorithm for genome-wide primer design.
Leister, Dario; Varotto, Claudio
2007-01-01
The profiling of mRNA expression based on DNA arrays has become a powerful tool to study genome-wide transcription of genes in a number of organisms. GST-PRIME is a software package created to facilitate large-scale primer design for the amplification of probes to be immobilized on arrays for transcriptome analyses, although it can also be applied in low-throughput approaches. GST-PRIME allows highly efficient, direct amplification of gene-sequence tags (GSTs) from genomic DNA (gDNA), starting from annotated genome or transcript sequences. GST-PRIME provides a user-friendly platform for automatic primer design, and despite the relative simplicity of the algorithm, experimental tests in the model plant species Arabidopsis thaliana confirmed the reliability of the software. This chapter describes the algorithm used for primer design, its input and output files, and the installation of the standalone package and its use.
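Two quantities sit at the bottom of almost any primer-design heuristic: GC content and an estimated melting temperature. A minimal sketch using the Wallace rule (GST-PRIME's own scoring is more involved than this):

    def gc_content(primer):
        return (primer.count("G") + primer.count("C")) / len(primer)

    def wallace_tm(primer):
        # Tm = 2(A+T) + 4(G+C), a standard rule of thumb for short oligos
        at = primer.count("A") + primer.count("T")
        gc = primer.count("G") + primer.count("C")
        return 2 * at + 4 * gc

    primer = "ATGCGTACGTTAGC"
    print(f"GC = {gc_content(primer):.0%}, Tm ~ {wallace_tm(primer)} C")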
A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages
Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young
2017-01-01
Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945
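The peak-annotation step performed by tools like ChIPseeker boils down to assigning each peak to the gene with the nearest transcription start site. A toy version with hypothetical coordinates:

    from bisect import bisect

    tss = [(1_000, "GENE_A"), (50_000, "GENE_B"), (120_000, "GENE_C")]  # sorted
    peaks = [(900, 1_400), (48_000, 49_000), (200_000, 200_500)]

    positions = [p for p, _ in tss]

    for start, end in peaks:
        mid = (start + end) // 2
        i = bisect(positions, mid)
        # Compare the flanking TSSs and keep the closer one.
        best = min(
            (abs(positions[j] - mid), tss[j][1])
            for j in (i - 1, i) if 0 <= j < len(tss)
        )
        print(f"peak {start}-{end} -> {best[1]} (distance {best[0]})")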
Hughes, Nicole; Arora, Monika; Grills, Nathan
2016-01-01
Objective To review the current literature around the potential impact, effectiveness and perceptions of plain packaging in low income settings. Method A systematic review of the literature. Data sources 9 databases (PubMed, Global Health, Social Policy and Practice, Applied Social Sciences Index and Abstracts (ASSIA), CINAHL, PsycINFO, British Library for Development Studies (BLDS), Global Health Library and Scopus) were searched. The terms used for searching combined terms for smoking and tobacco use with terms for plain packaging. Study selection Studies investigating the impact of plain packaging on the determinants of tobacco use, such as smoking behaviour, appeal, prominence, effectiveness of health warnings, response to plain packs, attitudes towards quitting or likelihood of smoking in low-income settings, were identified. Studies must have been published in English and be original research of any level of rigour. Data extraction Two independent reviewers assessed studies for inclusion and extracted data. Data synthesis The results were synthesised qualitatively, with themes grouped under four key headings: appeal and attractiveness; salience of health warnings and perceptions of harm; enjoyment and perceived taste ratings; and perceptions of the impact on tobacco usage behaviour. Results This review has identified four articles that met the inclusion criteria. Studies identified that tobacco products in plain packaging had less appeal than in branded packaging in low-income settings. Conclusions This review indicates that plain packaging appears to be successful in reducing appeal of smoking and packets, and supports the call for plain packaging to be widely implemented in conjunction with other tobacco control policies. However, there are considerable gaps in the amount of research conducted outside high-income countries. PMID:27000787
Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.
Ernst, Jason; Kellis, Manolis
2015-04-01
With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.
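The spirit of the approach, predicting a missing signal track from correlated observed tracks with an ensemble of trees, can be sketched with scikit-learn on synthetic data; ChromImpute's actual feature construction across marks and samples is more elaborate.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    observed_marks = rng.random((5_000, 4))  # 4 observed signal tracks (synthetic)
    # Synthetic "target mark" correlated with the observed ones, plus noise.
    target = observed_marks @ [0.5, 0.3, 0.1, 0.1] + 0.05 * rng.random(5_000)

    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(observed_marks[:4_000], target[:4_000])  # train on part of the genome
    imputed = model.predict(observed_marks[4_000:])    # impute the held-out rest
    corr = np.corrcoef(imputed, target[4_000:])[0, 1]
    print(f"correlation of imputed vs. observed signal: {corr:.2f}")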
TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages.
Silva, Tiago C; Colaprico, Antonio; Olsen, Catharina; D'Angelo, Fulvio; Bontempi, Gianluca; Ceccarelli, Michele; Noushmehr, Houtan
2016-01-01
Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and, by harnessing several key Bioconductor packages, how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, TCGAbiolinks. PMID:28232861
Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella
2014-01-01
Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: The MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726
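Allele-specific heteroplasmy ultimately reduces to the fraction of reads supporting the minor allele at each mitochondrial position. A toy computation over a hypothetical pileup (the positions echo well-known mtDNA variant sites, but the counts are invented):

    # position -> read counts per allele (hypothetical data)
    pileup = {3243: {"A": 180, "G": 20}, 8993: {"T": 95, "G": 5}, 12345: {"C": 100}}

    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        major = max(counts, key=counts.get)
        heteroplasmy = 1 - counts[major] / depth  # minor-allele fraction
        print(f"m.{pos}: depth={depth}, major={major}, "
              f"heteroplasmy={heteroplasmy:.2%}")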
ERIC Educational Resources Information Center
Pickett, Anna Lou, Ed.
This annotated bibliography serves as a resource for program administrators and staff developers to enable them to strengthen and develop systematic training programs for paraprofessional staff. "Training and Instructional Materials: Publications" lists approximately 30 items developed to prepare paraprofessional personnel to work in educational,…
An Annotated Bibliography of the Mosquitoes and Mosquito-Borne Diseases of Guam (Diptera: Culicidae)
1976-01-01
of elephantiasis, with 83 Americans and 28 natives admitted during the year with dengue fever. No cases of malaria were known to have originated on ... group, p. 109. Mosquito Systematics Vol. 8(4) 1976. South Pacific Commission. 1951. Conference of experts on filariasis and elephantiasis. So
Marine Flora and Fauna of the Northeastern United States. Sipuncula.
ERIC Educational Resources Information Center
Cutler, Edward B.
This report is part of a subseries entitled "Marine Flora and Fauna of the Northeastern United States" which is designed for use by biology students, biologists, biological oceanographers and informed laymen. Contents of this report include: (1) Introduction; (2) Key to Sipuncula (Peanut Worms); (3) Annotated Systematic List of Species;…
Student and Teacher Perspectives on a Close Reading Protocol
ERIC Educational Resources Information Center
Fisher, Douglas; Frey, Nancy
2014-01-01
Close reading is an instructional practice that has gained attention of late. It involves reading a complex text, annotation, and repeatedly reading to answer text-dependent questions. Although there are a number of recommendations for the use of close reading, there has not been a systematic analysis of student or teacher perceptions of this…
MIPS: a database for genomes and protein sequences.
Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D
1999-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome-oriented databases. The amount of available sequence data increases rapidly, but the capacity for qualified manual annotation at the sequence databases does not. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including those mentioned above as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
A Brief Report on the Effects of a Self-Management Treatment Package on Stereotypic Behavior
ERIC Educational Resources Information Center
Moore, Timothy R.
2009-01-01
We evaluated the effects of a self-management treatment package (SMTP) on the stereotypic behavior of an adolescent with Pervasive Developmental Disorder-Not Otherwise Specified (PDD-NOS). Latency to stereotypy was systematically increased in the training setting (academic) and the effectiveness of the SMTP was evaluated within a multiple-probe…
Performance assessment of small-package-class nonintrusive inspection systems
NASA Astrophysics Data System (ADS)
Spradling, Michael L.; Hyatt, Roger
1997-02-01
The DoD Counterdrug Technology Development Program has addressed the development and demonstration of technology to enhance nonintrusive inspection of small packages such as passenger baggage, commercially delivered parcels, and breakbulk cargo items. Within the past year it has supported several small-package-class nonintrusive inspection system performance assessment activities. All performance assessment programs involved the use of a red/blue team concept and were conducted in accordance with approved assessment protocols. This paper presents a discussion of the systematic performance assessment of small-package-class nonintrusive inspection technologies, including transmission, backscatter and computed tomography x-ray imaging, and protocol-related considerations for the assessment of these systems.
Islam, Mohammad Tawhidul; Mohamedali, Abidali; Ahn, Seong Beom; Nawar, Ishmam; Baker, Mark S; Ranganathan, Shoba
2017-01-01
In the past decade, proteomics and mass spectrometry have taken tremendous strides forward, particularly in the life sciences, spurred on by rapid advances in technology resulting in generation and conglomeration of vast amounts of data. Though this has led to tremendous advancements in biology, the interpretation of the data poses serious challenges for many practitioners due to the immense size and complexity of the data. Furthermore, the lack of annotation means that a potential gold mine of relevant biological information may be hiding within this data. We present here a simple and intuitive workflow for the research community to investigate and mine this data, not only to extract relevant data but also to segregate usable, quality data to develop hypotheses for investigation and validation. We apply an MS evidence workflow for verifying peptides of proteins from one's own data as well as publicly available databases. We then integrate a suite of freely available bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology and biochemical pathways. We also provide an example of the functional annotation of missing proteins in human chromosome 7 data from the NeXtProt database, where no evidence is available at the proteomic, antibody, or structural levels. We give examples of protocols, tools and detailed flowcharts that can be extended or tailored to interpret and annotate the proteome of any novel organism.
Kuorwel, Kuorwel K; Cran, Marlene J; Sonneveld, Kees; Miltz, Joseph; Bigger, Stephen W
2011-04-01
Significant interest has emerged in the introduction of food packaging materials manufactured from biodegradable polymers that have the potential to reduce the environmental impacts associated with conventional packaging materials. Current technologies in active packaging enable effective antimicrobial (AM) packaging films to be prepared from biodegradable materials that have been modified and/or blended with different compatible materials and/or plasticisers. A wide range of AM films prepared from modified biodegradable materials have the potential to be used for packaging of various food products. This review examines biodegradable polymers derived from polysaccharides and protein-based materials for their potential use in packaging systems designed for the protection of food products from microbial contamination. A comprehensive table that systematically analyses and categorizes much of the current literature in this area is included in the review.
Stead, Martine; Moodie, Crawford; Angus, Kathryn; Bauld, Linda; McNeill, Ann; Thomas, James; Hastings, Gerard; Hinds, Kate; O'Mara-Eves, Alison; Kwan, Irene; Purves, Richard I.; Bryce, Stuart L.
2013-01-01
Background and Objectives Standardised or ‘plain’ tobacco packaging was introduced in Australia in December 2012 and is currently being considered in other countries. The primary objective of this systematic review was to locate, assess and synthesise published and grey literature relating to the potential impacts of standardised tobacco packaging as proposed by the guidelines for the international Framework Convention on Tobacco Control: reduced appeal, increased salience and effectiveness of health warnings, and more accurate perceptions of product strength and harm. Methods Electronic databases were searched and researchers in the field were contacted to identify studies. Eligible studies were published or unpublished primary research of any design, issued since 1980 and concerning tobacco packaging. Twenty-five quantitative studies reported relevant outcomes and met the inclusion criteria. A narrative synthesis was conducted. Results Studies that explored the impact of package design on appeal consistently found that standardised packaging reduced the appeal of cigarettes and smoking, and was associated with perceived lower quality, poorer taste and less desirable smoker identities. Although findings were mixed, standardised packs tended to increase the salience and effectiveness of health warnings in terms of recall, attention, believability and seriousness, with effects being mediated by the warning size, type and position on pack. Pack colour was found to influence perceptions of product harm and strength, with darker coloured standardised packs generally perceived as containing stronger tasting and more harmful cigarettes than fully branded packs; lighter coloured standardised packs suggested weaker and less harmful cigarettes. Findings were largely consistent, irrespective of location and sample. Conclusions The evidence strongly suggests that standardised packaging will reduce the appeal of packaging and of smoking in general; that it will go some way to reduce consumer misperceptions regarding product harm based upon package design; and will help make the legally required on-pack health warnings more salient. PMID:24146791
Topic detection using paragraph vectors to support active learning in systematic reviews.
Hashimoto, Kazuma; Kontonatsios, Georgios; Miwa, Makoto; Ananiadou, Sophia
2016-08-01
Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all articles relevant to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies, to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent documents within the vector space and cluster them into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). The results demonstrate that our method achieves high sensitivity in identifying eligible studies at a significantly reduced manual annotation cost compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
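The method can be sketched as follows: embed citations with a paragraph-vector model, cluster the embeddings, and re-express each document as a mixture over the cluster centroids (the latent topics). A minimal Python sketch using gensim and scikit-learn; the corpus and all parameter values are illustrative, not those of the paper:

    import numpy as np
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.cluster import KMeans

    docs = ["randomised trial of nicotine patches",
            "screening citations for systematic reviews",
            "machine learning for text classification"]
    tagged = [TaggedDocument(d.split(), [i]) for i, d in enumerate(docs)]

    # Learn paragraph vectors (vector_size/epochs are illustrative).
    model = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=50)
    vecs = np.array([model.dv[i] for i in range(len(docs))])

    # Cluster centroids act as latent topics.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vecs)

    # Represent each document as a softmax mixture over topic centroids.
    sims = vecs @ km.cluster_centers_.T
    mixtures = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
    print(mixtures)  # one row per document, one column per latent topic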
Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision
Wallace, Byron C.; Kuiper, Joël; Sharma, Aakash; Zhu, Mingxi (Brian); Marshall, Iain J.
2016-01-01
Systematic reviews underpin Evidence Based Medicine (EBM) by addressing precise clinical questions via comprehensive synthesis of all relevant published evidence. Authors of systematic reviews typically define a Population/Problem, Intervention, Comparator, and Outcome (the PICO criteria) of interest, and then retrieve, appraise and synthesize results from all reports of clinical trials that meet these criteria. Identifying PICO elements in the full texts of trial reports is thus a critical yet time-consuming step in the systematic review process. We seek to expedite evidence synthesis by developing machine learning models to automatically extract sentences from articles relevant to PICO elements. Collecting a large corpus of training data for this task would be prohibitively expensive. Therefore, we derive distant supervision (DS) with which to train models using previously conducted reviews. DS entails heuristically deriving ‘soft’ labels from an available structured resource. However, we have access only to unstructured, free-text summaries of PICO elements for corresponding articles; we must derive from these the desired sentence-level annotations. To this end, we propose a novel method, supervised distant supervision (SDS), that uses a small amount of direct supervision to better exploit a large corpus of distantly labeled instances by learning to pseudo-annotate articles using the available DS. We show that this approach tends to outperform existing methods with respect to automated PICO extraction. PMID:27746703
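The distant-supervision idea can be illustrated with a toy heuristic: score each candidate sentence by token overlap with the free-text PICO summary and pseudo-label the best match as positive (SDS goes further by learning how to pseudo-annotate from a small directly supervised set). A sketch under those assumptions:

    def overlap_score(sentence, summary):
        """Jaccard overlap between token sets: a crude DS labeling heuristic."""
        s, t = set(sentence.lower().split()), set(summary.lower().split())
        return len(s & t) / len(s | t) if s | t else 0.0

    population_summary = "adults aged 18-65 with type 2 diabetes"
    sentences = [
        "We recruited adults aged 18-65 with type 2 diabetes.",
        "The trial was funded by a national research council.",
    ]

    # Pseudo-label the top-scoring sentence as 'population' evidence.
    best = max(sentences, key=lambda s: overlap_score(s, population_summary))
    print(best)  # the recruitment sentence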
Haunsberger, Stefan J; Connolly, Niamh M C; Prehn, Jochen H M
2017-02-15
The miRBase database is the central and official repository for miRNAs, and the current release is miRBase version 21.0. Name changes between miRBase releases cause inconsistencies in miRNA names from version to version. When working with only a small number of miRNAs, the translation can be done manually. However, with large sets of miRNAs, the necessary correction of such inconsistencies becomes burdensome and error-prone. We developed miRNAmeConverter, available as a Bioconductor R package and web interface, which addresses the challenges associated with mature miRNA name inconsistencies. The main algorithm implemented enables high-throughput automatic translation of species-independent mature miRNA names to user-selected miRBase versions. The web interface enables users less familiar with R to translate miRNA names given in the form of a list or embedded in text and to download the results. The miRNAmeConverter R package is open source under the Artistic-2.0 license. It is freely available from Bioconductor (http://bioconductor.org/packages/miRNAmeConverter). The web interface is based on R Shiny and can be accessed under the URL http://www.systemsmedicineireland.ie/tools/mirna-name-converter/. The database that miRNAmeConverter depends on is provided by the annotation package miRBaseVersions.db and can be downloaded from Bioconductor (http://bioconductor.org/packages/miRBaseVersions.db). Minimum R version 3.3.0 is required. stefanhaunsberger@rcsi.ie. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
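The underlying translation problem reduces to joining names through version-stable MIMAT accessions, which remain fixed while mature names change between releases. A toy Python illustration of that idea (the lookup tables here are hypothetical miniatures; real use should go through the package's R API and the miRBaseVersions.db tables):

    # Mature miRNA names change between miRBase releases, but MIMAT
    # accessions are stable; translation is a join through accessions.
    name_to_acc_v17 = {"hsa-miR-29a": "MIMAT0000086"}     # hypothetical v17 table
    acc_to_name_v21 = {"MIMAT0000086": "hsa-miR-29a-3p"}  # hypothetical v21 table

    def translate(name):
        acc = name_to_acc_v17.get(name)
        return acc_to_name_v21.get(acc) if acc else None

    print(translate("hsa-miR-29a"))  # hsa-miR-29a-3p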
NASA Astrophysics Data System (ADS)
Horsburgh, J. S.; Jones, A. S.
2016-12-01
Data and models used within the hydrologic science community are diverse. New research data and model repositories have succeeded in making data and models more accessible, but have, in most cases, been limited to particular types or classes of data or models and have lacked the collaborative and iterative functionality needed to enable shared data collection and modeling workflows. File sharing systems currently used within many scientific communities for private sharing of preliminary and intermediate data and modeling products do not support collaborative data capture, description, visualization, and annotation. More recently, hydrologic datasets and models have been cast as "social objects" that can be published, collaborated around, annotated, discovered, and accessed. Yet it can be difficult using existing software tools to achieve the kind of collaborative workflows and data/model reuse that many envision. HydroShare is a new, web-based system for sharing hydrologic data and models with specific functionality aimed at making collaboration easier and achieving new levels of interactive functionality and interoperability. Within HydroShare, we have developed new functionality for creating datasets, describing them with metadata, and sharing them with collaborators. HydroShare is enabled by a generic data model and content packaging scheme that supports describing and sharing diverse hydrologic datasets and models. Interoperability among the diverse types of data and models used by hydrologic scientists is achieved through consistent storage, management, sharing, publication, and annotation within HydroShare. In this presentation, we highlight and demonstrate how the flexibility of HydroShare's data model and packaging scheme, its access control and sharing functionality, and its versioning and publication capabilities have enabled the sharing and publication of research datasets for a large, interdisciplinary water research project called iUTAH (innovative Urban Transitions and Aridregion Hydro-sustainability). We discuss the experiences of iUTAH researchers now using HydroShare to collaboratively create, curate, and publish datasets and models in a way that encourages collaboration, promotes reuse, and meets funding agency requirements.
WhopGenome: high-speed access to whole-genome variation and sequence data in R.
Wittelsbürger, Ulrich; Pfeifer, Bastian; Lercher, Martin J
2015-02-01
The statistical programming language R has become a de facto standard for the analysis of many types of biological data, and is well suited for the rapid development of new algorithms. However, variant call data from population-scale resequencing projects are typically too large to be read and processed efficiently with R's built-in I/O capabilities. WhopGenome can efficiently read whole-genome variation data stored in the widely used variant call format (VCF) file format into several R data types. VCF files can be accessed either on local hard drives or on remote servers. WhopGenome can associate variants with annotations such as those available from the UCSC genome browser, and can accelerate the reading process by filtering loci according to user-defined criteria. WhopGenome can also read other Tabix-indexed files and create indices to allow fast selective access to FASTA-formatted sequence files. The WhopGenome R package is available on CRAN at http://cran.r-project.org/web/packages/WhopGenome/. A Bioconductor package has been submitted. lercher@cs.uni-duesseldorf.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
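For readers outside R, the same Tabix-indexed, region-selective access pattern can be sketched in Python with pysam; this is an equivalent operation, not WhopGenome's API, and the file path is a placeholder:

    import pysam

    # A bgzipped VCF with a Tabix (.tbi) index permits selective region
    # access without reading the whole file.
    vcf = pysam.VariantFile("calls.vcf.gz")  # placeholder path

    # Iterate only over variants in one genomic window.
    for rec in vcf.fetch("chr1", 1000000, 1010000):
        print(rec.chrom, rec.pos, rec.ref, rec.alts)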
Lee, Donald W; Khavrutskii, Ilja V; Wallqvist, Anders; Bavari, Sina; Cooper, Christopher L; Chaudhury, Sidhartha
2016-01-01
The somatic diversity of antigen-recognizing B-cell receptors (BCRs) arises from Variable (V), Diversity (D), and Joining (J) (VDJ) recombination and somatic hypermutation (SHM) during B-cell development and affinity maturation. The VDJ junction of the BCR heavy chain forms the highly variable complementarity determining region 3 (CDR3), which plays a critical role in antigen specificity and binding affinity. Tracking the selection and mutation of the CDR3 can be useful in characterizing humoral responses to infection and vaccination. Although tens to hundreds of thousands of unique BCR genes within an expressed B-cell repertoire can now be resolved with high-throughput sequencing, tracking SHMs is still challenging because existing annotation methods are often limited by poor annotation coverage, inconsistent SHM identification across the VDJ junction, or lack of B-cell lineage data. Here, we present B-cell repertoire inductive lineage and immunosequence annotator (BRILIA), an algorithm that leverages repertoire-wide sequencing data to globally improve the VDJ annotation coverage, lineage tree assembly, and SHM identification. On benchmark tests against simulated human and mouse BCR repertoires, BRILIA correctly annotated germline and clonally expanded sequences with 94 and 70% accuracy, respectively, and it has a 90% SHM-positive prediction rate in the CDR3 of heavily mutated sequences; these are substantial improvements over existing methods. We used BRILIA to process BCR sequences obtained from splenic germinal center B cells extracted from C57BL/6 mice. BRILIA returned robust B-cell lineage trees and yielded SHM patterns that are consistent across the VDJ junction and agree with known biological mechanisms of SHM. By contrast, existing BCR annotation tools, which do not account for repertoire-wide clonal relationships, systematically underestimated both the size of clonally related B-cell clusters and yielded inconsistent SHM frequencies. We demonstrate BRILIA's utility in B-cell repertoire studies related to VDJ gene usage, mechanisms for adenosine mutations, and SHM hot spot motifs. Furthermore, we show that the complete gene usage annotation and SHM identification across the entire CDR3 are essential for studying the B-cell affinity maturation process through immunosequencing methods.
ERIC Educational Resources Information Center
Mims, Pamela J.; Lee, Angel; Browder, Diane M.; Zakas, Tracie-Lynn; Flynn, Susan
2012-01-01
This pilot study sought to develop and evaluate the use of a treatment package that included systematic and direct instruction on acquisition of literacy skills aligned with middle school English/Language Arts standards for students with moderate to severe disabilities, including autism. Participants included five teachers and 15 middle school…
NASA Technical Reports Server (NTRS)
1979-01-01
This specification establishes the natural and induced environments to which the power extension package may be exposed during ground operations and space operations with the shuttle system. Space induced environments are applicable at the Orbiter attach point interface location. All probable environments are systematically listed according to each ground and mission phase.
Parallel Climate Data Assimilation PSAS Package
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Chan, Clara; Gennery, Donald B.; Ferraro, Robert D.
1996-01-01
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to a 512-node Intel Paragon. The equation solver achieves a sustained 18 Gflops performance. As a result, we achieved an unprecedented 100-fold solution-time reduction on the Intel Paragon parallel platform over the Cray C90. This not only meets and exceeds the DAO time requirements, but also significantly enlarges the window of exploration in climate data assimilation.
Heavner, Benjamin D.; Smallbone, Kieran; Price, Nathan D.; Walker, Larry P.
2013-01-01
Updates to maintain a state-of-the-art reconstruction of the yeast metabolic network are essential to reflect our understanding of yeast metabolism and functional organization, to eliminate any inaccuracies identified in earlier iterations, to improve predictive accuracy and to continue to expand into novel subsystems to extend the comprehensiveness of the model. Here, we present version 6 of the consensus yeast metabolic network (Yeast 6) as an update to the community effort to computationally reconstruct the genome-scale metabolic network of Saccharomyces cerevisiae S288c. Yeast 6 comprises 1458 metabolites participating in 1888 reactions, which are annotated with 900 yeast genes encoding the catalyzing enzymes. Compared with Yeast 5, Yeast 6 demonstrates improved sensitivity, specificity and positive and negative predictive values for predicting gene essentiality in glucose-limited aerobic conditions when analyzed with flux balance analysis. Additionally, Yeast 6 improves the accuracy of predicting the likelihood that a mutation will cause auxotrophy. The network reconstruction is available as a Systems Biology Markup Language (SBML) file enriched with Minimum Information Requested in the Annotation of Biochemical Models (MIRIAM)-compliant annotations. Small molecules and macromolecules in the network are referenced to authoritative databases such as UniProt or ChEBI. Molecules and reactions are also annotated with appropriate publications that contain supporting evidence. Yeast 6 is freely available at http://yeast.sf.net/ as three separate SBML files: a model using the SBML level 3 Flux Balance Constraints package, a model compatible with the MATLAB® COBRA Toolbox for backward compatibility and a reconstruction containing only reactions for which there is experimental evidence (without the non-biological reactions necessary for simulating growth). Database URL: http://yeast.sf.net/ PMID:23935056
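The gene-essentiality predictions described can be reproduced from the published SBML with any flux balance analysis toolchain. A minimal sketch using COBRApy, assuming the Yeast 6 SBML file has been downloaded locally (the file name is a placeholder):

    import cobra
    from cobra.flux_analysis import single_gene_deletion

    model = cobra.io.read_sbml_model("yeast_6.xml")  # placeholder path

    # Flux balance analysis of wild-type growth.
    solution = model.optimize()
    print("wild-type growth rate:", solution.objective_value)

    # Genes whose deletion abolishes growth are predicted essential.
    results = single_gene_deletion(model)
    essential = results[results["growth"] < 1e-6]
    print(len(essential), "genes predicted essential")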
NASA Technical Reports Server (NTRS)
Heck, M. L.; Findlay, J. T.; Compton, H. R.
1983-01-01
The Aerodynamic Coefficient Identification Package (ACIP) is an instrument consisting of body mounted linear accelerometers, rate gyros, and angular accelerometers for measuring the Space Shuttle vehicular dynamics. The high rate recorded data are utilized for postflight aerodynamic coefficient extraction studies. Although consistent with pre-mission accuracies specified by the manufacturer, the ACIP data were found to contain detectable levels of systematic error, primarily bias, as well as scale factor, static misalignment, and temperature dependent errors. This paper summarizes the technique whereby the systematic ACIP error sources were detected, identified, and calibrated with the use of recorded dynamic data from the low rate, highly accurate Inertial Measurement Units.
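The error model described here (bias plus scale factor against an accurate reference) is the standard linear sensor-calibration problem: each measurement m is approximately s*x + b, with the reference x supplied by the IMUs. A small least-squares sketch on synthetic data, illustrative rather than the paper's estimation procedure:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic reference signal (the IMU role) and measurements corrupted
    # by scale-factor and bias errors plus noise.
    x_true = rng.uniform(-1.0, 1.0, 200)
    scale_true, bias_true = 1.02, 0.015
    measured = scale_true * x_true + bias_true + rng.normal(0, 0.002, 200)

    # Fit measured = scale * x + bias by linear least squares.
    A = np.column_stack([x_true, np.ones_like(x_true)])
    (scale_est, bias_est), *_ = np.linalg.lstsq(A, measured, rcond=None)

    # Calibrated output removes the estimated systematic error.
    calibrated = (measured - bias_est) / scale_est
    print(round(scale_est, 4), round(bias_est, 4))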
The effect of the labelled serving size on consumption: A systematic review.
Bucher, Tamara; Murawski, Beatrice; Duncanson, Kerith; Labbe, David; Van der Horst, Klazine
2018-06-01
Guidance for food consumption and portion control plays an important role in the global management of overweight and obesity. Carefully conceptualised serving size labelling can contribute to this guidance. However, little is known about the relationship between the serving size information provided on food packages and levels of actual food consumption. The aim of this systematic review was to investigate how serving size information on food packages influences food consumption. We conducted a systematic review of the evidence published between 1980 and March 2018. Two reviewers screened titles and abstracts for relevance and assessed relevant articles for eligibility in full text. Five studies were considered eligible for the systematic review. In three of the included studies, changes in serving size labelling resulted in positive health implications for consumers, whereby fewer discretionary foods were consumed if serving sizes were smaller or if serving size information was provided alongside contextual information referring to the entire package. One study did not find significant differences between the conditions tested, and one study suggested a potentially negative impact if the serving size was reduced. The influence of labelled serving size on consumption of non-discretionary foods remains unclear, partially owing to the absence of studies specifically focusing on non-discretionary food groups. Studies that investigate the impact of serving size labels within the home environment and across a broad demographic cross-section are required. Copyright © 2018. Published by Elsevier Ltd.
Spooled packaging of shape memory alloy actuators
NASA Astrophysics Data System (ADS)
Redmond, John A.
A vast cross-section of transportation, manufacturing, consumer product, and medical technologies rely heavily on actuation. Accordingly, progress in these industries is often strongly coupled to the advancement of actuation technologies. As the field of actuation continues to evolve, smart materials show significant promise for satisfying the growing needs of industry. In particular, shape memory alloy (SMA) wire actuators present an opportunity for low-cost, high performance actuation, but until now, they have been limited or restricted from use in many otherwise suitable applications by the difficulty in packaging the SMA wires within tight or unusually shaped form constraints. To address this packaging problem, SMA wires can be spool-packaged by wrapping around mandrels to make the actuator more compact or by redirecting around multiple mandrels to customize SMA wire pathways to unusual form factors. The goal of this dissertation is to develop the scientific knowledge base for spooled packaging of low-cost SMA wire actuators that enables high, predictable performance within compact, customizable form factors. In developing the scientific knowledge base, this dissertation defines a systematic general representation of single and multiple mandrel spool-packaged SMA actuators and provides tools for their analysis, understanding, and synthesis. A quasi-static analytical model distills the underlying mechanics down to the three effects of friction, bending, and binding, which enables prediction of the behavior of generic spool-packaged SMA actuators with specifiable geometric, loading, frictional, and SMA material parameters. An extensive experimental and simulation-based parameter study establishes the necessary understanding of how primary design tradeoffs between performance, packaging, and cost are governed by the underlying mechanics of spooled actuators. A design methodology outlines a systematic approach to synthesizing high performance SMA wire actuators with mitigated material, power, and packaging costs and compact, customizable form factors. By examining the multi-faceted connections between performance, packaging, and cost, this dissertation builds a knowledge base that goes beyond implementing SMA actuators for particular applications. Rather, it provides a well-developed strategy for realizing the advantages of SMA actuation for a broadened range of applications, thereby enabling opportunities for new functionality and capabilities in industry.
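Of the three effects named, friction on a wrapped mandrel has a familiar closed form: transmitted tension decays exponentially with wrap angle, T_out = T_in * exp(-mu * theta), the capstan relation. A Python sketch of that single effect only (the dissertation's quasi-static model couples friction with bending and binding):

    import math

    def transmitted_tension(t_in, mu, wrap_angle_rad):
        """Capstan relation: tension remaining after sliding over a mandrel."""
        return t_in * math.exp(-mu * wrap_angle_rad)

    # SMA wire at 10 N wrapped three full turns, friction coefficient 0.15.
    t_out = transmitted_tension(10.0, 0.15, 3 * 2 * math.pi)
    print(round(t_out, 2), "N reaches the far end")  # ~0.59 N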
A Systematic Approach for Understanding Slater-Gaussian Functions in Computational Chemistry
ERIC Educational Resources Information Center
Stewart, Brianna; Hylton, Derrick J.; Ravi, Natarajan
2013-01-01
A systematic way to understand the intricacies of quantum mechanical computations done by a software package known as "Gaussian" is undertaken via an undergraduate research project. These computations involve the evaluation of key parameters in a fitting procedure to express a Slater-type orbital (STO) function in terms of the linear…
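The fit at the heart of the STO-nG construction approximates a Slater function exp(-zeta*r) by a linear combination of Gaussians exp(-alpha_i*r**2), with coefficients and exponents chosen by least squares. A small SciPy sketch (the grid, starting values, and unweighted residual are illustrative simplifications):

    import numpy as np
    from scipy.optimize import least_squares

    r = np.linspace(0.0, 6.0, 300)
    sto = np.exp(-r)  # Slater-type orbital, zeta = 1

    def residual(p):
        c, a = p[:3], p[3:]  # 3 coefficients, 3 Gaussian exponents
        fit = sum(ci * np.exp(-ai * r**2) for ci, ai in zip(c, a))
        return fit - sto

    p0 = [0.4, 0.4, 0.2, 0.1, 0.5, 2.5]  # illustrative starting guess
    sol = least_squares(residual, p0)
    print("coefficients:", sol.x[:3])
    print("exponents:   ", sol.x[3:])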
A systematic review of substance misuse assessment packages.
Sweetman, Jennifer; Raistrick, Duncan; Mdege, Noreen D; Crosby, Helen
2013-07-01
Health-care systems globally are moving away from process measures of performance to payments for outcomes achieved. It follows that there is a need for a selection of proven quality tools that are suitable for undertaking comprehensive assessments and measuring outcomes. This review aimed to identify and evaluate existing comprehensive assessment packages. The work is part of a national UK programme, Collaborations for Leadership in Applied Health Research and Care. Systematic searches were carried out across major databases to identify instruments designed to assess substance misuse. For those instruments identified, searches were carried out using the Cochrane Library, Embase, Ovid MEDLINE(®) and PsychINFO to identify articles reporting psychometric data. From 595 instruments, six met the inclusion criteria: Addiction Severity Index; Chemical Use, Abuse and Dependence Scale; Form 90; Maudsley Addiction Profile; Measurements in the Addictions for Triage and Evaluation; and Substance Abuse Outcomes Module. The most common reasons for exclusion were that instruments were: (i) designed for a specific substance (239); (ii) not designed for use in addiction settings (136); (iii) not providing comprehensive assessment (89); and (iv) not suitable as an outcome measure (20). The six packages are very different and suited to different uses. No package had an adequate evaluation of its properties, and so the emphasis should be on refining a small number of tools with very general application rather than creating new ones. An alternative to using 'off-the-shelf' packages is to create bespoke packages from well-validated, single-construct scales. © 2013 Australasian Professional Society on Alcohol and other Drugs.
Jo, Catherine L.; Ambs, Anita; Dresler, Carolyn M.; Backinger, Cathy L.
2017-01-01
Objective We aimed to investigate the effects of special packaging (child-resistant, adult-friendly) and tamper-resistant packaging on health and behavioral outcomes in order to identify research gaps and implications for packaging standards for tobacco products. Methods We searched seven databases for keywords related to special and tamper-resistant packaging, consulted experts, and reviewed citations of potentially relevant studies. 733 unique papers were identified. Two coders independently screened each title and abstract for eligibility. They then reviewed the full text of the remaining papers for a second round of eligibility screening. Included studies investigated a causal relationship between type of packaging or packaging regulation and behavioral or health outcomes and had a study population composed of consumers. Studies were excluded on the basis of publication type, if they were not peer-reviewed, and if they had low external validity. Two reviewers independently coded each paper for study and methodological characteristics and limitations. Discrepancies were discussed and resolved. Results The review included eight studies: four assessing people’s ability to access the contents of different packaging types and four evaluating the impact of packaging requirements on health-related outcomes. Child-resistant packaging was generally more difficult to open than non-child-resistant packaging. Child-resistant packaging requirements have been associated with reductions in child mortality. Conclusions Child-resistant packaging can therefore be expected to reduce tobacco product poisonings among children under six. PMID:27939602
A database of annotated tentative orthologs from crop abiotic stress transcripts.
Balaji, Jayashree; Crouch, Jonathan H; Petite, Prasad V N S; Hoisington, David A
2006-10-07
A minimal requirement for initiating a comparative genomics study on plant responses to abiotic stresses is a dataset of orthologous sequences. The availability of a large amount of sequence information, including sequences derived from stress cDNA libraries, allows for the identification of stress-related genes and orthologs associated with the stress response. Orthologous sequences serve as tools to explore genes and their relationships across species. For this purpose, ESTs from stress cDNA libraries across 16 crop species, including 6 important cereal crops and 10 dicots, were systematically collated and subjected to bioinformatics analysis such as clustering, grouping into tentative orthologous sets, identification of protein motifs/patterns in the predicted protein sequences, and annotation with stress conditions, tissue/library source and putative function. All data are available to the scientific community at http://intranet.icrisat.org/gt1/tog/homepage.htm. We believe that the availability of annotated plant abiotic stress ortholog sets will be a valuable resource for researchers studying the biology of environmental stresses in plant systems, molecular evolution and genomics.
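A common way to nominate tentative ortholog pairs between two species' EST sets is the reciprocal-best-hit criterion: two sequences are paired only if each is the other's top similarity hit. A toy Python sketch of that test (hypothetical hit tables; the database's own grouping procedure may differ):

    # Best similarity hits in each direction (hypothetical BLAST-style results).
    best_a_to_b = {"riceEST1": "maizeEST7", "riceEST2": "maizeEST3"}
    best_b_to_a = {"maizeEST7": "riceEST1", "maizeEST3": "riceEST9"}

    # Reciprocal best hits become tentative ortholog pairs.
    orthologs = [(a, b) for a, b in best_a_to_b.items()
                 if best_b_to_a.get(b) == a]
    print(orthologs)  # [('riceEST1', 'maizeEST7')]; riceEST2 fails the test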
Using ontology-based annotation to profile disease research
Coulet, Adrien; LePendu, Paea; Shah, Nigam H
2012-01-01
Background Profiling the allocation and trend of research activity is of interest to funding agencies, administrators, and researchers. However, the lack of a common classification system hinders the comprehensive and systematic profiling of research activities. This study introduces ontology-based annotation as a method to overcome this difficulty. Analyzing over a decade of funding data and publication data, the trends of disease research are profiled across topics, across institutions, and over time. Results This study introduces and explores the notions of research sponsorship and allocation and shows that leaders of research activity can be identified within specific disease areas of interest, such as those with high mortality or high sponsorship. The funding profiles of disease topics readily cluster themselves in agreement with the ontology hierarchy and closely mirror the funding agency priorities. Finally, four temporal trends are identified among research topics. Conclusions This work utilizes disease ontology (DO)-based annotation to profile effectively the landscape of biomedical research activity. By using DO in this manner a use-case driven mechanism is also proposed to evaluate the utility of classification hierarchies. PMID:22494789
SATPdb: a database of structurally annotated therapeutic peptides
Singh, Sandeep; Chaudhary, Kumardeep; Dhanda, Sandeep Kumar; Bhalla, Sherry; Usmani, Salman Sadullah; Gautam, Ankur; Tuknait, Abhishek; Agrawal, Piyush; Mathur, Deepika; Raghava, Gajendra P.S.
2016-01-01
SATPdb (http://crdd.osdd.net/raghava/satpdb/) is a database of structurally annotated therapeutic peptides, curated from 22 public domain peptide databases/datasets, including 9 of our own. The current version holds 19192 unique experimentally validated therapeutic peptide sequences with lengths between 2 and 50 amino acids. It covers peptides having natural, non-natural and modified residues. These peptides were systematically grouped into 10 categories based on their major function or therapeutic property, such as 1099 anticancer, 10585 antimicrobial, 1642 drug delivery and 1698 antihypertensive peptides. We assigned or annotated the structures of these therapeutic peptides using structural databases (Protein Data Bank) and state-of-the-art structure prediction methods such as I-TASSER, HHsearch and PEPstrMOD. In addition, SATPdb facilitates users in performing various tasks that include: (i) structure and sequence similarity search, (ii) peptide browsing based on function and properties, (iii) identification of moonlighting peptides and (iv) searching for peptides having desired structure and therapeutic activities. We hope this database will be useful for researchers working in the field of peptide-based therapeutics. PMID:26527728
Hughes, Nicole; Arora, Monika; Grills, Nathan
2016-03-21
To review the current literature on the potential impact, effectiveness and perceptions of plain packaging in low-income settings. A systematic review of the literature. 9 databases (PubMed, Global Health, Social Policy and Practice, Applied Social Sciences Index and Abstracts (ASSIA), CINAHL, PsycINFO, British Library for Development Studies (BLDS), Global Health Library and Scopus) were searched. The terms used for searching combined terms for smoking and tobacco use with terms for plain packaging. Studies investigating the impact of plain packaging on the determinants of tobacco use, such as smoking behaviour, appeal, prominence, effectiveness of health warnings, response to plain packs, attitudes towards quitting or likelihood of smoking in low-income settings, were identified. Studies had to be published in English and to be original research of any level of rigour. Two independent reviewers assessed studies for inclusion and extracted data. The results were synthesised qualitatively, with themes grouped under four key headings: appeal and attractiveness; salience of health warnings and perceptions of harm; enjoyment and perceived taste ratings; and perceptions of the impact on tobacco usage behaviour. This review identified four articles that met the inclusion criteria. The identified studies found that tobacco products in plain packaging had less appeal than those in branded packaging in low-income settings. This review indicates that plain packaging appears to be successful in reducing the appeal of smoking and of packets, and supports the call for plain packaging to be widely implemented in conjunction with other tobacco control policies. However, there are considerable gaps in the amount of research conducted outside high-income countries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
ERIC Educational Resources Information Center
Houston Univ., TX. Energy Inst.
This publication is a systematic listing of energy education materials and reference sources suitable for use in elementary and secondary schools. Items in this volume, located through computer searches, were still available in May, 1978. This inventory of energy resource materials consists of three indexes: media, grade level, and subject. Each…
Manijak, Mieszko P; Nielsen, Henrik B
2011-06-11
Although systematic analysis of gene annotation is a powerful tool for interpreting gene expression data, it is sometimes hampered by incomplete gene annotation, missing expression responses of key genes and secondary gene expression responses. These shortcomings may be partially circumvented by instead matching gene expression signatures to signatures of other experiments. To facilitate this, we present the Functional Association Response by Overlap (FARO) server, which matches input signatures to a compendium of 242 gene expression signatures extracted from more than 1700 Arabidopsis microarray experiments. We hereby present a publicly available tool for robust characterization of Arabidopsis gene expression experiments, which can point to similar experimental factors in other experiments. The server is available at http://www.cbs.dtu.dk/services/faro/.
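Signature-to-signature matching by overlap is commonly scored with a Fisher's exact test on the 2x2 table of shared versus unshared genes. A minimal SciPy sketch with toy gene sets (FARO's own compendium and scoring are described on the server):

    from scipy.stats import fisher_exact

    genome_size = 20000  # genes represented on the array
    sig_query = {"AT1G01060", "AT2G46830", "AT5G15850", "AT3G09600"}
    sig_db = {"AT1G01060", "AT2G46830", "AT1G22770", "AT5G61380"}

    overlap = len(sig_query & sig_db)
    only_q = len(sig_query) - overlap
    only_d = len(sig_db) - overlap
    neither = genome_size - overlap - only_q - only_d

    odds, p = fisher_exact([[overlap, only_q], [only_d, neither]],
                           alternative="greater")
    print(overlap, p)  # a small p suggests the signatures are associated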
Investigation of package sealing using organic adhesives
NASA Technical Reports Server (NTRS)
Perkins, K. L.; Licari, J. J.
1977-01-01
A systematic study was performed to evaluate the suitability of adhesives for sealing hybrid packages. Selected adhesives were screened on the basis of their ability to seal gold-plated Kovar butterfly-type packages that retain their seal integrity after individual exposures to increasingly severe temperature-humidity environments. Tests were also run using thermal shock, temperature cycling, mechanical shock and temperature aging. The four best adhesives were determined and further tested in a 60 C/98% RH environment and continuously monitored for moisture content. Results are given; however, none of the tested adhesives passed all the tests.
Zedler, Barbara K; Kakad, Priyanka; Colilla, Susan; Murrelle, Lenn; Shah, Nirav R
2011-01-01
The therapeutic benefit of self-administered medications for long-term use is limited by an average 50% nonadherence rate. Patient forgetfulness is a common factor in unintentional nonadherence. Unit-of-use packaging that incorporates a simple day-and-date feature (calendar packaging) is designed to improve adherence by prompting patients to maintain the prescribed dosing schedule. To review systematically, in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, randomized controlled trial evidence of the adherence benefits and harms of calendar blister packaging (CBP) and calendar pill organizers (CPO) for self-administered, long-term medication use. Data sources included the MEDLINE, Web of Science and Cochrane Library databases from their inception to September 2010 and communication with researchers in the field. Key search terms included blister-calendar pack, blister pack, drug packaging, medication adherence, medication compliance, medication compliance devices, medication containers, medication organizers, multicompartment compliance aid, persistence, pill-box organizers, prescription refill, randomized controlled trials, and refill compliance. Selected studies had an English-language title; a randomized controlled design; medication packaged in CBP or CPO; a requirement of solid, oral medication self-administered daily for longer than 1 month in community-dwelling adults; and at least 1 quantitative outcome measure of adherence. Two reviewers independently extracted data on study design, sample size, type of intervention and control, and outcomes. Ten trials with a total of 1045 subjects met the inclusion criteria, and 9 also examined clinical outcomes (seizures, blood pressure, psychiatric symptoms) or health care resource utilization. Substantial heterogeneity among trials precluded meta-analysis. In 3 studies, calendar packaging was part of a multicomponent adherence intervention. Six of 10 trials reported higher adherence, but it was associated with clinically significant improvement in only 1 study: 50% decreased seizure frequency with a CPO-based, multicomponent intervention. No study reported sufficient information to conclusively examine potential harms related to calendar packaging. All trials had significant methodological limitations, such as inadequate randomization or blinding, or reported insufficient information regarding enrolled subjects and attrition, resulting in a moderate-to-high risk of bias and, in 2 studies, unevaluable outcome data. Trials were generally short and sample sizes small, with heterogeneous adherence outcome measures. Calendar packaging, especially in combination with education and reminder strategies, may improve medication adherence. Methodological limitations preclude definitive conclusions about the size of the adherence effect and the clinical benefits or harms associated with CBP and CPO. High-quality trials of adequate size and duration are needed to assess the clinical effectiveness of such interventions. Copyright © 2011 Elsevier HS Journals, Inc. All rights reserved.
Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.
2012-01-01
Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, allows to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and wide-spread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921
ssbio: a Python framework for structural systems biology.
Mih, Nathan; Brunk, Elizabeth; Chen, Ke; Catoiu, Edward; Sastry, Anand; Kavvas, Erol; Monk, Jonathan M; Zhang, Zhen; Palsson, Bernhard O
2018-06-15
Working with protein structures at the genome scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high-quality genome-scale models with protein structures (GEM-PROs), wrappers to popular third-party programs to compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks, thus lowering the barrier to linking 3D structural data with established systems workflows. ssbio is implemented in Python and available to download under the MIT license at http://github.com/SBRG/ssbio. Documentation and Jupyter notebook tutorials are available at http://ssbio.readthedocs.io/en/latest/. Interactive notebooks can be launched using Binder at https://mybinder.org/v2/gh/SBRG/ssbio/master?filepath=Binder.ipynb. Supplementary data are available at Bioinformatics online.
MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go.
Muth, Thilo; Kohrs, Fabian; Heyer, Robert; Benndorf, Dirk; Rapp, Erdmann; Reichl, Udo; Martens, Lennart; Renard, Bernhard Y
2018-01-02
Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples, faces severe challenges concerning data analysis and results interpretation. To overcome these shortcomings, we here introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is now independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer.
ERIC Educational Resources Information Center
Blount, Yvette; Abedin, Babak; Vatanasakdakul, Savanid; Erfani, Seyedezahra
2016-01-01
This study investigates how an enterprise resource planning (ERP) software package SAP was integrated into the curriculum of an accounting information systems (AIS) course in an Australian university. Furthermore, the paper provides a systematic literature review of articles published between 1990 and 2013 to understand how ERP systems were…
Packaging Technology Designed, Fabricated, and Assembled for High-Temperature SiC Microsystems
NASA Technical Reports Server (NTRS)
Chen, Liang-Yu
2003-01-01
A series of ceramic substrates and thick-film metalization-based prototype microsystem packages designed for silicon carbide (SiC) high-temperature microsystems have been developed for operation in 500 C harsh environments. These prototype packages were designed, fabricated, and assembled at the NASA Glenn Research Center. Both the electrical interconnection system and the die-attach scheme for this packaging system have been tested extensively at high temperatures. Printed circuit boards used to interconnect these chip-level packages and passive components also are being fabricated and tested. NASA space and aeronautical missions need harsh-environment, especially high-temperature, operable microsystems for probing the inner solar planets and for in situ monitoring and control of next-generation aeronautical engines. Various SiC high-temperature-operable microelectromechanical system (MEMS) sensors, actuators, and electronics have been demonstrated at temperatures as high as 600 C, but most of these devices were demonstrated only in the laboratory environment, partly because systematic packaging technology for supporting these devices at temperatures of 500 C and beyond was not available. Thus, the development of a systematic high-temperature packaging technology is essential for both in situ testing and the commercialization of high-temperature SiC MEMS. Researchers at Glenn developed new prototype packages for high-temperature microsystems using ceramic substrates (aluminum nitride and 96- and 90-wt% aluminum oxides) and gold (Au) thick-film metalization. Packaging components, which include a thick-film metalization-based wirebond interconnection system and a low-electrical-resistance SiC die-attachment scheme, have been tested at temperatures up to 500 C. The interconnection system, composed of Au thick-film printed wire and 1-mil Au wire bond, was tested in 500 C oxidizing air with and without 50-mA direct current for over 5000 hr. The Au thick-film metalization-based wirebond electrical interconnection system was also tested in an extremely dynamic thermal environment to assess thermal reliability. The I-V curve of a SiC high-temperature diode was measured in oxidizing air at 500 C for 1000 hr to electrically test the Au thick-film material-based die-attach assembly.
The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again
González-Beltrán, Alejandra; Neumann, Steffen; Maguire, Eamonn; Sansone, Susanna-Assunta; Rocca-Serra, Philippe
2014-01-01
The ISA-Tab format and software suite have been developed to break the silo effect induced by technology-specific formats for a variety of data types and to better support experimental metadata tracking. Experimentalists seldom use a single technique to monitor biological signals. Providing a multi-purpose, pragmatic and accessible format that abstracts away common constructs for describing Investigations, Studies and Assays, ISA is increasingly popular. To attract further interest towards the format and extend support to ensure reproducible research and reusable data, we present the Risa package, which delivers a central component to support the ISA format by enabling effortless integration with R, the popular, open source data crunching environment. The Risa package bridges the gap between metadata collection and curation in an ISA-compliant way and data analysis using the widely used statistical computing environment R. The package offers functionality for: (i) parsing ISA-Tab datasets into R objects; (ii) augmenting annotation with extra metadata not explicitly stated in the ISA syntax; (iii) interfacing with domain-specific R packages; (iv) suggesting potentially useful R packages available in Bioconductor for subsequent processing of the experimental data described in the ISA format; and finally (v) saving back to ISA-Tab files augmented with analysis-specific metadata from R. We demonstrate these features by presenting use cases for mass spectrometry data and DNA microarray data. The Risa package is open source (with LGPL license) and freely available through Bioconductor. By making Risa available, we aim to facilitate the task of processing experimental data, encouraging a uniform representation of experimental information and results while delivering tools for ensuring traceability and provenance tracking. The Risa package has been available since Bioconductor 2.11 (version 1.0.0) and version 1.2.1 appeared in Bioconductor 2.12, both along with documentation and examples. The latest version of the code is in the development branch in Bioconductor and can also be accessed from GitHub at https://github.com/ISA-tools/Risa, where the issue tracker allows users to report bugs or feature requests.
Jiang, Min; Chen, Yukun; Liu, Mei; Rosenbloom, S Trent; Mani, Subramani; Denny, Joshua C; Xu, Hua
2011-01-01
The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities (including medical problems, tests, and treatments, as well as their asserted status) from hospital discharge summaries written in natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge. The authors implemented a machine-learning-based named entity recognition system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms, using a training corpus of 349 annotated notes. Based on the results from the training data, the authors developed a novel hybrid clinical entity extraction system, which integrated heuristic rule-based modules with the ML-based named entity recognition module. The authors applied the hybrid system to the concept extraction and assertion classification tasks in the challenge and evaluated its performance using a test data set with 477 annotated notes. Standard measures including precision, recall, and F-measure were calculated using the evaluation script provided by the Center of Informatics for Integrating Biology and the Bedside/VA challenge organizers. The overall performance for all three types of clinical entities and all six types of assertions across 477 annotated notes was considered the primary metric in the challenge. Systematic evaluation on the training set showed that Conditional Random Fields outperformed Support Vector Machines, and semantic information from existing natural-language-processing systems largely improved performance, although contributions from different types of features varied. The authors' hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different from the first three systems) on the test data set in the challenge.
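The winning configuration, Conditional Random Fields over token features enriched with semantic information, can be sketched with the sklearn-crfsuite package; the two-sentence corpus, features, and hyperparameters below are illustrative, not the authors' system:

    import sklearn_crfsuite

    def token_features(sent, i):
        w = sent[i]
        return {
            "lower": w.lower(),
            "is_title": w.istitle(),
            "suffix3": w[-3:],
            "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
            # Real systems add semantic features, e.g. UMLS semantic types.
        }

    sents = [["Patient", "denies", "chest", "pain", "."],
             ["Started", "on", "aspirin", "daily", "."]]
    labels = [["O", "O", "B-problem", "I-problem", "O"],
              ["O", "O", "B-treatment", "O", "O"]]

    X = [[token_features(s, i) for i in range(len(s))] for s in sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                               max_iterations=50)
    crf.fit(X, labels)
    print(crf.predict(X))  # predicted tag sequence per sentence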
Fang, Hai; Knezevic, Bogdan; Burnham, Katie L; Knight, Julian C
2016-12-13
Biological interpretation of genomic summary data such as those resulting from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is one of the major bottlenecks in medical genomics research, calling for efficient and integrative tools to resolve this problem. We introduce eXploring Genomic Relations (XGR), an open source tool designed for enhanced interpretation of genomic summary data enabling downstream knowledge discovery. Targeting users of varying computational skills, XGR utilises prior biological knowledge and relationships in a highly integrated but easily accessible way to make user-input genomic summary datasets more interpretable. We show how by incorporating ontology, annotation, and systems biology network-driven approaches, XGR generates more informative results than conventional analyses. We apply XGR to GWAS and eQTL summary data to explore the genomic landscape of the activated innate immune response and common immunological diseases. We provide genomic evidence for a disease taxonomy supporting the concept of a disease spectrum from autoimmune to autoinflammatory disorders. We also show how XGR can define SNP-modulated gene networks and pathways that are shared and distinct between diseases, how it achieves functional, phenotypic and epigenomic annotations of genes and variants, and how it enables exploring annotation-based relationships between genetic variants. XGR provides a single integrated solution to enhance interpretation of genomic summary data for downstream biological discovery. XGR is released as both an R package and a web-app, freely available at http://galahad.well.ox.ac.uk/XGR .
Breure, Abraham S.H.; Whisson, Corey S.
2012-01-01
Type material of 41 Australian Bothriembryon taxa present in Australian museums is critically listed, indicating systematic issues that need to be resolved in further studies. Information on additional type material of 22 taxa in non-Australian museums is compiled. The seven fossil taxa known so far are included in this catalogue. Based on the current systematic position, 38 species are treated in this paper. Bothriembryon jacksoni Iredale, 1939, Bothriembryon notatus Iredale, 1939, Bothriembryon praecelsus Iredale, 1939 and Bothriembryon serpentinus Iredale, 1939 are elevated to species level. Bothriembryon gratwicki (Cox, 1899) is listed as status to be determined. PMID:22679384
iScreen: Image-Based High-Content RNAi Screening Analysis Tools.
Zhong, Rui; Dong, Xiaonan; Levine, Beth; Xie, Yang; Xiao, Guanghua
2015-09-01
High-throughput RNA interference (RNAi) screening has opened up a path to investigating functional genomics on a genome-wide scale. However, such studies are often restricted to assays that have a single readout format. Recently, advanced image technologies have been coupled with high-throughput RNAi screening to develop high-content screening, in which one or more cell images, instead of a single readout, are generated from each well. This image-based high-content screening technology has led to genome-wide functional annotation in a wider spectrum of biological research studies, as well as in drug and target discovery, so that complex cellular phenotypes can be measured in a multiparametric format. Despite these advances, data analysis and visualization tools are still largely lacking for these types of experiments. Therefore, we developed iScreen (Image-Based High-Content RNAi Screening Analysis Tool), an R package for the statistical modeling and visualization of image-based high-content RNAi screening. Two case studies were used to demonstrate the capability and efficiency of the iScreen package. iScreen is available for download on CRAN (http://cran.cnr.berkeley.edu/web/packages/iScreen/index.html). The user manual is also available as a supplementary document. © 2014 Society for Laboratory Automation and Screening.
ggCyto: Next Generation Open-Source Visualization Software for Cytometry.
Van, Phu; Jiang, Wenxin; Gottardo, Raphael; Finak, Greg
2018-06-01
Open source software for computational cytometry has gained in popularity over the past few years. Efforts such as FlowCAP, the Lyoplate and Euroflow projects have highlighted the importance of efforts to standardize both experimental and computational aspects of cytometry data analysis. The R/BioConductor platform hosts the largest collection of open source cytometry software covering all aspects of data analysis and providing infrastructure to represent and analyze cytometry data with all relevant experimental, gating, and cell population annotations enabling fully reproducible data analysis. Data visualization frameworks to support this infrastructure have lagged behind. ggCyto is a new open-source BioConductor software package for cytometry data visualization built on ggplot2 that enables ggplot-like functionality with the core BioConductor flow cytometry data structures. Amongst its features are the ability to transform data and axes on-the-fly using cytometry-specific transformations, plot faceting by experimental meta-data variables, and partial matching of channel, marker and cell population names to the contents of the BioConductor cytometry data structures. We demonstrate the salient features of the package using publicly available cytometry data with complete reproducible examples in a supplementary material vignette. https://bioconductor.org/packages/devel/bioc/html/ggcyto.html. gfinak@fredhutch.org. Supplementary data are available at Bioinformatics online and at http://rglab.org/ggcyto/.
MalaCards: an integrated compendium for diseases and their annotation
Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Iny Stein, Tsippi; Bahir, Iris; Belinky, Frida; Morrey, C. Paul; Safran, Marilyn; Lancet, Doron
2013-01-01
Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. Database URL: http://www.malacards.org/ PMID:23584832
Matsuda, Fumio; Shinbo, Yoko; Oikawa, Akira; Hirai, Masami Yokota; Fiehn, Oliver; Kanaya, Shigehiko; Saito, Kazuki
2009-01-01
Background In metabolomics research using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation, used to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, a novel methodology for evaluating false discovery rates (FDR) is presented in this study. Based on the FDR analyses, several aspects of an elemental composition search are discussed, including setting a threshold, estimating the FDR, and the types of elemental composition databases most reliable for searching. Methodology/Principal Findings The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30–50%) were obtained when searching time-of-flight (TOF)/MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that the FDR could be improved by using a smaller compound database with more complete coverage of relevant entries. Conclusions/Significance High-accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data. PMID:19847304
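The abstract's estimator combines one measured hit rate with four Monte Carlo-derived parameters; the details are in the paper. As a rough illustration of the underlying idea only, the Python sketch below estimates an FDR by comparing the hit rate of real query masses against the chance hit rate of randomly drawn decoy masses. The function names and the uniform decoy model are assumptions of this example, not the authors' method.

```python
import random

def hit(query_mass: float, db_masses: list[float], tol_ppm: float = 5.0) -> bool:
    """True if any database mass matches the query within a ppm tolerance."""
    return any(abs(query_mass - m) <= m * tol_ppm * 1e-6 for m in db_masses)

def estimate_fdr(observed_masses, db_masses, n_sim=10_000, tol_ppm=5.0):
    """Crude FDR estimate: chance hit rate divided by observed hit rate.

    Decoy queries are drawn uniformly from the observed mass range,
    which destroys any true query-database correspondence.
    """
    lo, hi = min(observed_masses), max(observed_masses)
    decoy_rate = sum(
        hit(random.uniform(lo, hi), db_masses, tol_ppm) for _ in range(n_sim)
    ) / n_sim
    obs_rate = sum(hit(m, db_masses, tol_ppm)
                   for m in observed_masses) / len(observed_masses)
    return decoy_rate / obs_rate if obs_rate else 0.0
```

Larger, denser databases raise the decoy rate while the observed rate saturates, consistent with the poor FDR reported above for all-in-one databases such as PubChem.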
Kasturi, Rangachar; Goldgof, Dmitry; Soundararajan, Padmanabhan; Manohar, Vasant; Garofolo, John; Bowers, Rachel; Boonstra, Matthew; Korzhova, Valentina; Zhang, Jing
2009-02-01
Common benchmark data sets, standardized performance metrics, and baseline algorithms have demonstrated considerable impact on research and development in a variety of application domains. These resources provide both consumers and developers of technology with a common framework to objectively compare the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video: specifically for face, text, and vehicle objects. This framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each data clip is approximately 2.5 minutes long and has been completely spatially/temporally annotated at the I-frame level. Each task/domain, therefore, has an associated annotated corpus of approximately 450,000 frames. The scope of such annotation is unprecedented and was designed to begin to support the necessary quantities of data for robust machine learning approaches, as well as a statistically significant comparison of the performance of algorithms. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes useful lasting resources of a scale and magnitude that will prove to be extremely useful to the computer vision research community for years to come.
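The framework's own metrics and scoring software are described in the paper; as a generic illustration of how frame-level detections are typically matched against ground-truth boxes, here is a short Python sketch using intersection-over-union (IoU) with greedy one-to-one matching. The 0.5 threshold and the greedy strategy are common conventions, not necessarily those of this framework.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def frame_counts(detections, ground_truth, thresh=0.5):
    """Greedy one-to-one matching; returns (tp, fp, fn) for one frame."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda g: iou(det, g), default=None)
        if best is not None and iou(det, best) >= thresh:
            tp += 1
            unmatched.remove(best)
    return tp, len(detections) - tp, len(unmatched)
```

Summing these counts over the roughly 450,000 annotated frames per task yields corpus-level precision and recall.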
Morrison, Norman; Hancock, David; Hirschman, Lynette; Dawyndt, Peter; Verslyppe, Bert; Kyrpides, Nikos; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver; Grethe, Jeff; Booth, Tim; Sterk, Peter; Nenadic, Goran; Field, Dawn
2011-04-29
In the future, we hope to see an open and thriving data market in which users can find and select data from a wide range of data providers. In such an open access market, data are products that must be packaged accordingly. Increasingly, eCommerce sellers present heterogeneous product lines to buyers using faceted browsing. Using this approach we have developed the Ontogrator platform, which allows for rapid retrieval of data in a way that would be familiar to any online shopper. Using Knowledge Organization Systems (KOS), especially ontologies, Ontogrator uses text mining to mark up data and faceted browsing to help users navigate, query and retrieve data. Ontogrator offers the potential to impact scientific research in two major ways: 1) by significantly improving the retrieval of relevant information; and 2) by significantly reducing the time required to compose standard database queries and assemble information for further research. Here we present a pilot implementation developed in collaboration with the Genomic Standards Consortium (GSC) that includes content from the StrainInfo, GOLD, CAMERA, Silva and Pubmed databases. This implementation demonstrates the power of ontogration and highlights that the usefulness of this approach is fully dependent on both the quality of data and the KOS (ontologies) used. Ideally, the use and further expansion of this collaborative system will help to surface issues associated with the underlying quality of annotation and could lead to a systematic means for accessing integrated data resources.
A parent advice package for family shopping trips: development and evaluation
Clark, Hewitt B.; Greene, Brandon F.; Macrae, John W.; McNees, M. Patrick; Davis, Jerry L.; Risley, Todd R.
1977-01-01
This article reports on the primary steps in the development of parent advice for popular dissemination: (a) developing advice for one specific problem situation, family shopping trips; (b) testing the advice program for benefit to children and convenience to adults; and (c) packaging the advice so it can be used successfully by interested parents. Systematic observation of 12 families using the written advice package on shopping trips revealed its effectiveness in reducing child disruptions and increasing positive interactions between parents and children. These findings, along with interview information from families, showed that the package is usable, effective, and popular with both parents and children, and thus is ready for dissemination to a wide audience of parents—a step that in itself should involve research and evaluation. PMID:16795570
Allegro, Gianni; Giachino, Pier Mauro
2015-08-13
Forty species belonging to the subgenus Agraphoderus of Blennidus have been recorded so far from Peru. An annotated checklist is provided with information about their type locality, distribution and habitat. The nomenclature of each species is also provided, together with some notes on their systematic status. Blennidus bombonensis n. sp. from Cerro de Pasco is described; Blennidus pseudangularis nomen novum for Ogmopleura angularis Straneo, 1993 (nec Straneo, 1985) is proposed and the following synonymies are stated: Blennidus pseudangularis Allegro & Giachino, nomen novum = Blennidus rectangulus (Straneo, 1993) syn. nov.; Ogmopleura minor Straneo, 1993 = Blennidus rectangulus (Straneo, 1993) syn. nov. Type specimens of most species are illustrated, as well as male genitalia. Finally, a revised key to all the Agraphoderus species from Peru is provided.
The software system development for the TAMU real-time fan beam scatterometer data processors
NASA Technical Reports Server (NTRS)
Clark, B. V.; Jean, B. R.
1980-01-01
A software package was designed and written to process in real time any one quadrature channel pair of radar scatterometer signals from the NASA L- or C-Band radar scatterometer systems. The software was successfully tested in the C-Band processor breadboard hardware using recorded radar and NERDAS (NASA Earth Resources Data Annotation System) signals as the input data sources. The processor development program and the overall processor theory of operation and design are described. The real-time processor software system is documented, and the results of the laboratory software tests and recommendations for the efficient application of the data processing capabilities are presented.
BeadArray Expression Analysis Using Bioconductor
Ritchie, Matthew E.; Dunning, Mark J.; Smith, Mike L.; Shi, Wei; Lynch, Andy G.
2011-01-01
Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there exists a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. The key steps of importing data, performing quality assessments, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments will be covered. PMID:22144879
A Software Architecture for Intelligent Synthesis Environments
NASA Technical Reports Server (NTRS)
Filman, Robert E.; Norvig, Peter (Technical Monitor)
2001-01-01
NASA's Intelligent Synthesis Environment (ISE) program is a grand attempt to develop a system to transform the way complex artifacts are engineered. This paper discusses a "middleware" architecture for enabling the development of ISE. Desirable elements of such an Intelligent Synthesis Architecture (ISA) include remote invocation; plug-and-play applications; scripting of applications; management of design artifacts, tools, and artifact and tool attributes; common system services; system management; and systematic enforcement of policies. This paper argues that the ISA should extend conventional distributed object technology (DOT) such as CORBA and Product Data Managers with flexible repositories of product and tool annotations and "plug-and-play" mechanisms for inserting "ility" or orthogonal concerns into the system. I describe the Object Infrastructure Framework, an Aspect Oriented Programming (AOP) environment for developing distributed systems that provides utility insertion and enables consistent annotation maintenance. This technology can be used to enforce policies such as maintaining the annotations of artifacts, particularly the provenance and access control rules of artifacts; performing automatic datatype transformations between representations; supplying alternative servers of the same service; reporting on the status of jobs and the system; conveying privileges throughout an application; supporting long-lived transactions; maintaining version consistency; and providing software redundancy and mobility.
Do fixed-dose combination pills or unit-of-use packaging improve adherence? A systematic review.
Connor, Jennie; Rafter, Natasha; Rodgers, Anthony
2004-01-01
Adequate adherence to medication regimens is central to the successful treatment of communicable and noncommunicable disease. Fixed-dose combination pills and unit-of-use packaging are therapy-related interventions that are designed to simplify medication regimens and so potentially improve adherence. We conducted a systematic review of relevant randomized trials in order to quantify the effects of fixed-dose combination pills and unit-of-use packaging, compared with medications as usually presented, in terms of adherence to treatment and improved outcomes. Only 15 trials met the inclusion criteria; fixed-dose combination pills were investigated in three of these, while unit-of-use packaging was studied in 12 trials. The trials involved treatments for communicable diseases (n = 5), blood pressure lowering medications (n = 3), diabetic patients (n = 1), vitamin supplementation (n = 1) and management of multiple medications by the elderly (n = 5). The results of the trials suggested that there were trends towards improved adherence and/or clinical outcomes in all but three of the trials; this reached statistical significance in four out of seven trials reporting a clinically relevant or intermediate end-point, and in seven out of thirteen trials reporting medication adherence. Measures of outcome were, however, heterogeneous, and interpretation was further limited by methodological issues, particularly small sample size, short duration and loss to follow-up. Overall, the evidence suggests that fixed-dose combination pills and unit-of-use packaging are likely to improve adherence in a range of settings, but the limitations of the available evidence mean that uncertainty remains about the size of these benefits. PMID:15654408
The Paris-Sud yeast structural genomics pilot-project: from structure to function.
Quevillon-Cheruel, Sophie; Liger, Dominique; Leulliot, Nicolas; Graille, Marc; Poupon, Anne; Li de La Sierra-Gallay, Inès; Zhou, Cong-Zhao; Collinet, Bruno; Janin, Joël; Van Tilbeurgh, Herman
2004-01-01
We present here the outlines and results from our yeast structural genomics (YSG) pilot-project. A lab-scale platform for the systematic production and structure determination of proteins is presented. In order to validate this approach, 250 non-membrane proteins of unknown structure were targeted. Strategies and final statistics are evaluated. We finally discuss the opportunity for structural genomics programs to contribute to functional biochemical annotation.
A novel spectral library workflow to enhance protein identifications.
Li, Haomin; Zong, Nobel C; Liang, Xiangbo; Kim, Allen K; Choi, Jeong Ho; Deng, Ning; Zelaya, Ivette; Lam, Maggie; Duan, Huilong; Ping, Peipei
2013-04-09
The innovations in mass spectrometry-based investigations in proteome biology enable systematic characterization of molecular details in pathophysiological phenotypes. However, the process of delineating large-scale raw proteomic datasets into a biological context requires high-throughput data acquisition and processing. A spectral library search engine makes use of previously annotated experimental spectra as references for subsequent spectral analyses. This workflow delivers many advantages, including elevated analytical efficiency and specificity as well as reduced demands in computational capacity. In this study, we created a spectral matching engine to address challenges commonly associated with a library search workflow. In particular, an improved sliding dot product algorithm that is robust to systematic drifts of mass measurement in spectra is introduced. Furthermore, a noise management protocol distinguishes spectral correlation attributable to noise from that attributable to peptide fragments. It enables elevated separation between target spectral matches and false matches, thereby suppressing the possibility of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, preservation of original spectra also accommodates user contributions to further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications. Copyright © 2013 Elsevier B.V. All rights reserved.
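As an illustration of the sliding dot product idea (tolerating a systematic m/z calibration drift between a query and a library spectrum), here is a hedged Python sketch: spectra are binned into fixed-width m/z vectors and the normalized dot product is maximized over small integer bin shifts. The bin width, shift range, and the wrap-around of np.roll are simplifications of this example, not the engine's actual algorithm.

```python
import numpy as np

def binned(spectrum, bin_width=1.0, max_mz=2000.0):
    """Convert (m/z, intensity) pairs into a fixed-length intensity vector."""
    vec = np.zeros(int(max_mz / bin_width))
    for mz, intensity in spectrum:
        idx = int(mz / bin_width)
        if 0 <= idx < len(vec):
            vec[idx] += intensity
    return vec

def sliding_dot(query, library, max_shift_bins=2):
    """Max normalized dot product over small bin shifts of the library spectrum."""
    q, lib = binned(query), binned(library)
    best = 0.0
    for s in range(-max_shift_bins, max_shift_bins + 1):
        shifted = np.roll(lib, s)  # wrap-around is harmless for sparse spectra
        denom = np.linalg.norm(q) * np.linalg.norm(shifted)
        if denom:
            best = max(best, float(q @ shifted) / denom)
    return best

q = [(100.05, 30.0), (250.10, 100.0)]
lib = [(101.25, 28.0), (251.30, 95.0)]  # same pattern, +1.2 m/z drift
print(sliding_dot(q, lib))  # ~1.0 despite the systematic drift
```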
Lilic, Nick; Stretton, Matthew; Prakash, Minesh
2018-06-05
The aim of this study is to critically appraise the evidence for the effectiveness of the plain packaging of tobacco products policy. A systematic approach to a literature review was undertaken using five databases: PubMed, MEDLINE, Google Scholar, Global Health and Legacy Tobacco Documents Library. Quantitative and qualitative studies that evaluate attitudes towards smoking, starting smoking and quitting intentions when plain packaging use is compared with standard cigarette packaging use were included. A total of 1923 studies were identified. After inclusion and exclusion criteria were applied, nine studies were included in the review. The overall quality of the data was variable, and a significant number of the studies had major methodological flaws. However, the data analysed in the literature review suggest that exposure to plain packaging increases intention to quit amongst exposed individuals and increases negative attitudes towards both smoking and starting smoking. Although the evidence for plain packaging of tobacco is not strong, the evidence that is available indicates that it is an effective tobacco cessation policy. © 2018 Royal Australasian College of Surgeons.
Use of symbolic computation in robotics education
NASA Technical Reports Server (NTRS)
Vira, Naren; Tunstel, Edward
1992-01-01
An application of symbolic computation in robotics education is described. A software package is presented which combines generality, user interaction, and user-friendliness with the systematic usage of symbolic computation and artificial intelligence techniques. The software utilizes MACSYMA, a LISP-based symbolic algebra language, to automatically generate closed-form expressions representing forward and inverse kinematics solutions, the Jacobian transformation matrices, robot pose error-compensation model equations, and the Lagrange dynamics formulation for N-degree-of-freedom, open-chain robotic manipulators. The goal of such a package is to aid faculty and students in the robotics course by removing burdensome tasks of mathematical manipulations. The software package has been successfully tested for its accuracy using commercially available robots.
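MACSYMA is largely of historical interest today, but the same style of derivation is easy to reproduce in a modern open-source computer algebra system. The sketch below uses SymPy (an assumption of this example, not the package described above) to derive the closed-form forward kinematics and Jacobian of a planar 2-degree-of-freedom arm.

```python
import sympy as sp

# Symbolic joint angles and link lengths of a planar 2-DOF arm.
q1, q2, l1, l2 = sp.symbols('q1 q2 l1 l2', real=True)

# Closed-form forward kinematics of the end effector.
x = l1 * sp.cos(q1) + l2 * sp.cos(q1 + q2)
y = l1 * sp.sin(q1) + l2 * sp.sin(q1 + q2)

# Jacobian of end-effector position with respect to the joint angles,
# the quantity a student would otherwise derive by hand.
J = sp.Matrix([x, y]).jacobian(sp.Matrix([q1, q2]))
print(sp.simplify(J))
```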
Teaching preschool children to report suspicious packages to adults.
May, Michael E; Shayter, Ashley M; Schmick, Ayla; Barron, Becky; Doherty, Meghan; Johnson, Matthew
2018-05-16
Law enforcement agencies stress that public reporting of terror-related crime is the predominant means for disrupting these actions. However, schools may be unprepared because the majority of the populace may not understand the threat of suspicious materials or what to do when they are found on school grounds. The purpose of this study was to systematically teach preschool children to identify and report suspicious packages across three experiments. In the first experiment, we used multiple exemplar training to teach children to identify the characteristics of safe and unsafe packages. In the second experiment, we taught participants to identify the locations where packages should be considered unsafe. Finally, in the third experiment, we used behavioral skills training to teach participants to avoid touching unsafe packages, leave the area where they were located, and report their discovery to an adult. Results suggest the participants quickly developed these skills. Implications for safety skills in young school children are discussed. © 2018 Society for the Experimental Analysis of Behavior.
Microarray gene expression profiling analysis combined with bioinformatics in multiple sclerosis.
Liu, Mingyuan; Hou, Xiaojun; Zhang, Ping; Hao, Yong; Yang, Yiting; Wu, Xiongfeng; Zhu, Desheng; Guan, Yangtai
2013-05-01
Multiple sclerosis (MS) is the most prevalent demyelinating disease and the principal cause of neurological disability in young adults. Recent microarray gene expression profiling studies have identified several genetic variants contributing to the complex pathogenesis of MS; however, expressional and functional studies are still required to further understand its molecular mechanism. The present study aimed to analyze the molecular mechanism of MS using microarray analysis combined with bioinformatics techniques. We downloaded the gene expression profile of MS from the Gene Expression Omnibus (GEO) and analysed the microarray data using an R package for identifying differentially coexpressed genes (DCGs) and links, together with the Database for Annotation, Visualization and Integrated Discovery (DAVID). The regulatory impact factor (RIF) algorithm was used to measure the regulatory impact of transcription factors. A total of 1,297 DCGs between MS patients and healthy controls were identified. Functional annotation indicated that these DCGs were associated with immune and neurological functions. Furthermore, the RIF result suggested that IKZF1, BACH1, CEBPB, EGR1 and FOS may play central regulatory roles in controlling gene expression in the pathogenesis of MS. Our findings confirm the presence of multiple molecular alterations in MS and indicate the possibility of identifying prognostic factors associated with MS pathogenesis.
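The specific statistics used by the package are described in its documentation; as a generic illustration of differential coexpression, the following Python sketch scores gene pairs by the change in their Pearson correlation between patients and controls. The function name and toy data are invented for this example.

```python
import numpy as np

def differential_coexpression(expr_case, expr_ctrl):
    """Absolute change in gene-gene Pearson correlation between conditions.

    Rows are genes, columns are samples; returns a genes x genes matrix."""
    return np.abs(np.corrcoef(expr_case) - np.corrcoef(expr_ctrl))

# Toy example: 5 genes, 10 samples per condition.
rng = np.random.default_rng(0)
case, ctrl = rng.normal(size=(5, 10)), rng.normal(size=(5, 10))
dc = differential_coexpression(case, ctrl)
# The gene pair whose coexpression changes most is a DCG candidate.
print(np.unravel_index(np.argmax(np.triu(dc, k=1)), dc.shape))
```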
PSAMM: A Portable System for the Analysis of Metabolic Models
Steffensen, Jon Lund; Dufault-Thompson, Keith; Zhang, Ying
2016-01-01
The genome-scale models of metabolic networks have been broadly applied in phenotype prediction, evolutionary reconstruction, community functional analysis, and metabolic engineering. Despite the development of tools that support individual steps along the modeling procedure, it is still difficult to associate mathematical simulation results with the annotation and biological interpretation of metabolic models. In order to solve this problem, here we developed a Portable System for the Analysis of Metabolic Models (PSAMM), a new open-source software package that supports the integration of heterogeneous metadata in model annotations and provides a user-friendly interface for the analysis of metabolic models. PSAMM is independent of paid software environments like MATLAB, and all its dependencies are freely available for academic users. Compared to existing tools, PSAMM significantly reduced the running time of constraint-based analysis and enabled flexible settings of simulation parameters using simple one-line commands. The integration of heterogeneous, model-specific annotation information in PSAMM is achieved with a novel format of YAML-based model representation, which has several advantages, such as providing a modular organization of model components and simulation settings, enabling model version tracking, and permitting the integration of multiple simulation problems. PSAMM also includes a number of quality checking procedures to examine stoichiometric balance and to identify blocked reactions. Applying PSAMM to 57 models collected from current literature, we demonstrated how the software can be used for managing and simulating metabolic models. We identified a number of common inconsistencies in existing models and constructed an updated model repository to document the resolution of these inconsistencies. PMID:26828591
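To make the YAML-based representation concrete, here is a small Python sketch that parses a model fragment with PyYAML. The field names below are illustrative of the modular style described above and should not be taken as a guaranteed reproduction of PSAMM's actual schema.

```python
import yaml  # PyYAML

# Hypothetical model fragment: modular, human-readable sections make
# version tracking with ordinary diff tools straightforward.
MODEL = """
reactions:
  - id: PGI
    equation: g6p[c] <=> f6p[c]
    genes: pgi
compounds:
  - id: g6p
    name: D-glucose 6-phosphate
"""

model = yaml.safe_load(MODEL)
for rxn in model["reactions"]:
    print(rxn["id"], rxn["equation"])
```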
Specialized microbial databases for inductive exploration of microbial genome sequences
Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine
2005-01-01
Background The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison. PMID:15698474
It’s about This and That: A Description of Anaphoric Expressions in Clinical Text
Wang, Yan; Melton, Genevieve B.; Pakhomov, Serguei
2011-01-01
Although anaphoric expressions are very common in biomedical and clinical documents, little work has been done to systematically characterize their use in clinical text. Samples of ‘it’, ‘this’, and ‘that’ expressions occurring in inpatient clinical notes from four metropolitan hospitals were analyzed using a combination of semi-automated and manual annotation techniques. We developed a rule-based approach to filter potential non-referential expressions. A physician then manually annotated 1000 potential referential instances to determine referent status and the antecedent of each referent expression. A distributional analysis of the three referring expressions in the entire corpus of notes demonstrates a high prevalence of anaphora and large variance in distributions of referential expressions with different notes. Our results confirm that anaphoric expressions are common in clinical texts. Effective co-reference resolution with anaphoric expressions remains an important challenge in medical natural language processing research. PMID:22195211
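The study's filtering rules were developed specifically for clinical notes; as a toy illustration of the general approach, the Python sketch below flags a few surface patterns commonly used to detect pleonastic (non-referential) "it". The pattern list is invented and far smaller than a realistic rule set.

```python
import re

PLEONASTIC = [
    r"\bit\s+is\s+(important|necessary|likely|unlikely|possible)\b",
    r"\bit\s+(seems|appears)\s+that\b",
    r"\bit\s+is\s+\w+\s+to\b",  # e.g., "it is difficult to ..."
]

def is_potentially_referential(sentence: str) -> bool:
    """True if 'it' occurs and no pleonastic pattern fires."""
    s = sentence.lower()
    return "it" in s.split() and not any(re.search(p, s) for p in PLEONASTIC)

print(is_potentially_referential("It is important to monitor INR."))       # False
print(is_potentially_referential("The lesion grew and it was biopsied."))  # True
```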
Basic level scene understanding: categories, attributes and structures
Xiao, Jianxiong; Hays, James; Russell, Bryan C.; Patterson, Genevieve; Ehinger, Krista A.; Torralba, Antonio; Oliva, Aude
2013-01-01
A longstanding goal of computer vision is to build a system that can automatically understand a 3D scene from a single image. This requires extracting semantic concepts and 3D information from 2D images which can depict an enormous variety of environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the richly annotated SUN database which is a collection of annotated images spanning 908 different scene categories with object, attribute, and geometric labels for many scenes. This database allows us to systematically study the space of scenes and to establish a benchmark for scene and object recognition. We augment the categorical SUN database with 102 scene attributes for every image and explore attribute recognition. Finally, we present an integrated system to extract the 3D structure of the scene and objects depicted in an image. PMID:24009590
AceTree: a tool for visual analysis of Caenorhabditis elegans embryogenesis
Boyle, Thomas J; Bao, Zhirong; Murray, John I; Araya, Carlos L; Waterston, Robert H
2006-01-01
Background The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method uses time-lapse confocal imaging of a strain expressing histone-GFP fusions; a software package, StarryNite, processes the thousands of images and produces output files that describe the location and lineage relationship of each nucleus at each time point. Results We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data. Conclusion By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development. PMID:16740163
mESAdb: microRNA Expression and Sequence Analysis Database
Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen
2011-01-01
microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657
Fenwick, Matthew; Hoch, Jeffrey C.; Ulrich, Eldon; Gryk, Michael R.
2015-01-01
Reproducibility is a cornerstone of the scientific method, essential for validation of results by independent laboratories and the sine qua non of scientific progress. A key step toward reproducibility of biomolecular NMR studies was the establishment of public data repositories (PDB and BMRB). Nevertheless, bio-NMR studies routinely fall short of the requirement for reproducibility that all the data needed to reproduce the results are published. A key limitation is that considerable metadata goes unpublished, notably manual interventions that are typically applied during the assignment of multidimensional NMR spectra. A general solution to this problem has been elusive, in part because of the wide range of approaches and software packages employed in the analysis of protein NMR spectra. Here we describe an approach for capturing missing metadata during the assignment of protein NMR spectra that can be generalized to arbitrary workflows, different software packages, other biomolecules, or other stages of data analysis in bio-NMR. We also present extensions to the NMR-STAR data dictionary that enable machine archival and retrieval of the “missing” metadata. PMID:26253947
How to benchmark methods for structure-based virtual screening of large compound libraries.
Christofferson, Andrew J; Huang, Niu
2012-01-01
Structure-based virtual screening is a useful computational technique for ligand discovery. To systematically evaluate different docking approaches, it is important to have a consistent benchmarking protocol that is both relevant and unbiased. Here, we describe the design of a benchmarking data set for docking screen assessment, a standard docking screening process, and the analysis and presentation of the enrichment of annotated ligands among a background decoy database.
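A common summary statistic for such benchmarks is the enrichment factor: how much more often annotated ligands appear near the top of the docking-ranked database than expected by chance. The Python sketch below is one conventional formulation, offered as an illustration rather than the chapter's prescribed metric; all names and data are invented.

```python
def enrichment_factor(ranked_ids, actives, top_frac=0.01):
    """EF = (active hit rate in the top fraction) / (active rate overall)."""
    n_top = max(1, int(len(ranked_ids) * top_frac))
    found = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    return (found / n_top) / (len(actives) / len(ranked_ids))

# Toy data: ~10,000 compounds, 50 actives, 20 of them ranked in the top 1%.
ranked = ([f"A{i}" for i in range(20)]
          + [f"D{i}" for i in range(9930)]
          + [f"A{i}" for i in range(20, 50)])
actives = {f"A{i}" for i in range(50)}
print(round(enrichment_factor(ranked, actives), 1))  # ~40x enrichment
```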
Integration of biological networks and gene expression data using Cytoscape
Cline, Melissa S; Smoot, Michael; Cerami, Ethan; Kuchinsky, Allan; Landys, Nerius; Workman, Chris; Christmas, Rowan; Avila-Campilo, Iliana; Creech, Michael; Gross, Benjamin; Hanspers, Kristina; Isserlin, Ruth; Kelley, Ryan; Killcoyne, Sarah; Lotia, Samad; Maere, Steven; Morris, John; Ono, Keiichiro; Pavlovic, Vuk; Pico, Alexander R; Vailaya, Aditya; Wang, Peng-Liang; Adler, Annette; Conklin, Bruce R; Hood, Leroy; Kuiper, Martin; Sander, Chris; Schmulevich, Ilya; Schwikowski, Benno; Warner, Guy J; Ideker, Trey; Bader, Gary D
2013-01-01
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape. PMID:17947979
ontologyX: a suite of R packages for working with ontological data.
Greene, Daniel; Richardson, Sylvia; Turro, Ernest
2017-04-01
Ontologies are widely used constructs for encoding and analyzing biomedical data, but the absence of simple and consistent tools has made exploratory and systematic analysis of such data unnecessarily difficult. Here we present three packages which aim to simplify such procedures. The ontologyIndex package enables arbitrary ontologies to be read into R, supports representation of ontological objects by native R types, and provides a parsimonious set of performant functions for querying ontologies. ontologySimilarity and ontologyPlot extend ontologyIndex with functionality for straightforward visualization and semantic similarity calculations, including statistical routines. ontologyIndex, ontologyPlot and ontologySimilarity are all available on the Comprehensive R Archive Network website under https://cran.r-project.org/web/packages/ . Daniel Greene dg333@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H
2014-11-19
Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.
Childs, Sue; Blenkinsopp, Elizabeth; Hall, Amanda; Walton, Graham
2005-12-01
In 2003/4 the Information Management Research Institute, Northumbria University, conducted a research project to identify the barriers to e-learning for health professionals and students. The project also established possible ways to overcome these barriers. The North of England Workforce Development Confederation funded the project. The project comprised a systematic review of the literature on barriers to and solutions/critical success factors for e-learning in the health field. Fifty-seven references were suitable for analysis. This review was supplemented by a questionnaire survey of learners and an interview study of learning providers to ensure that data identified from the literature were grounded in reality. The main barriers are: requirement for change; costs; poorly designed packages; inadequate technology; lack of skills; need for a component of face-to-face teaching; time intensive nature of e-learning; computer anxiety. A range of solutions can solve these barriers. The main solutions are: standardization; strategies; funding; integration of e-learning into the curriculum; blended teaching; user friendly packages; access to technology; skills training; support; employers paying e-learning costs; dedicated work time for e-learning. The authors argue that librarians can play an important role in e-learning: providing support and support materials; teaching information skills; managing and providing access to online information resources; producing their own e-learning packages; assisting in the development of other packages.
pyAmpli: an amplicon-based variant filter pipeline for targeted resequencing data.
Beyens, Matthias; Boeckx, Nele; Van Camp, Guy; Op de Beeck, Ken; Vandeweyer, Geert
2017-12-14
Haloplex targeted resequencing is a popular method to analyze both germline and somatic variants in gene panels. However, the wet-lab procedures involved may introduce false positives that need to be considered in subsequent data analysis. To our knowledge, no all-in-one package exists that implements a variant-filtering rationale addressing the systematic errors introduced by amplicon enrichment. We present pyAmpli, a platform-independent, parallelized Python package that implements an amplicon-based germline and somatic variant filtering strategy for Haloplex data. pyAmpli can filter variants for systematic errors by user pre-defined criteria. We show that pyAmpli significantly increases specificity, without reducing sensitivity, which is essential for reporting true positive clinically relevant mutations in gene panel data. pyAmpli is an easy-to-use software tool which increases the true positive variant call rate in targeted resequencing data. It specifically reduces errors related to PCR-based enrichment of targeted regions.
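pyAmpli's actual interface is documented with the package; the sketch below only illustrates the kind of amplicon-aware rationale described above, using invented names: an alternate allele observed in reads from a single amplicon is treated as a likely enrichment artifact, while support from multiple distinct amplicons is kept.

```python
from collections import defaultdict

def filter_by_amplicon_support(variant_reads, min_amplicons=2):
    """variant_reads: iterable of (variant_id, amplicon_id), one entry per
    alternate-allele read. Returns variants seen in >= min_amplicons
    distinct amplicons."""
    support = defaultdict(set)
    for variant_id, amplicon_id in variant_reads:
        support[variant_id].add(amplicon_id)
    return {v for v, amps in support.items() if len(amps) >= min_amplicons}

reads = [("chr1:1000A>T", "amp_17"), ("chr1:1000A>T", "amp_18"),
         ("chr2:5000G>C", "amp_42")]  # second variant: one amplicon only
print(filter_by_amplicon_support(reads))  # {'chr1:1000A>T'}
```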
Optimized Shielding and Fabrication Techniques for TiN and Al Microwave Resonators
NASA Astrophysics Data System (ADS)
Kreikebaum, John Mark; Kim, Eunseong; Livingston, William; Dove, Allison; Calusine, Gregory; Hover, David; Rosenberg, Danna; Oliver, William; Siddiqi, Irfan
We present a systematic study of the effects of shielding and packaging on the internal quality factor (Qi) of Al and TiN microwave resonators designed for use in qubit readout. Surprisingly, TiN samples with Qi = 1.3×10^6 investigated at 100 mK exhibited no significant changes in linewidth when operated without magnetic shielding and in an open cryo-package. In contrast, Al resonators showed systematic improvement in Qi with each successive shield. Measurements were performed in an adiabatic demagnetization refrigerator, where typical ambient fields of 0.2 mT are present at the sample stage. We discuss the effect of 100 mK and 500 mK Cu radiation shields and cryoperm magnetic shielding on resonator Q as a function of temperature and input power in samples prepared with a variety of surface treatments, fabrication recipes, and embedding circuits. This research was supported by the ARO and IARPA.
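For context, a resonator's loaded quality factor is commonly extracted by fitting the transmission lineshape and taking Q = f0/FWHM. The Python sketch below fits an idealized symmetric Lorentzian dip with SciPy; real analyses, including the separation of the internal Qi from coupling losses, typically use asymmetric circle fits, so this is an illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian_dip(f, f0, fwhm, depth, baseline):
    """Idealized |S21| magnitude near resonance."""
    return baseline - depth / (1 + ((f - f0) / (fwhm / 2)) ** 2)

def loaded_q(freqs, s21_mag):
    """Fit |S21|(f) and return Q_L = f0 / FWHM (numpy arrays assumed)."""
    p0 = [freqs[np.argmin(s21_mag)],          # resonance frequency guess
          (freqs[-1] - freqs[0]) / 10,        # linewidth guess
          s21_mag.max() - s21_mag.min(),      # dip depth guess
          s21_mag.max()]                      # off-resonance baseline guess
    (f0, fwhm, _, _), _ = curve_fit(lorentzian_dip, freqs, s21_mag, p0=p0)
    return f0 / abs(fwhm)
```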
2015-01-01
Systematic analysis and interpretation of the large number of tandem mass spectra (MS/MS) obtained in metabolomics experiments is a bottleneck in discovery-driven research. MS/MS mass spectral libraries are small compared to all known small molecule structures and are often not freely available. MS2Analyzer was therefore developed to enable user-defined searches of thousands of spectra for mass spectral features such as neutral losses, m/z differences, and product and precursor ions from MS/MS spectra in MSP/MGF files. The software is freely available at http://fiehnlab.ucdavis.edu/projects/MS2Analyzer/. As the reference query set, 147 literature-reported neutral losses and their corresponding substructures were collected. This set was tested for accuracy of linking neutral loss analysis to substructure annotations using 19 329 accurate mass tandem mass spectra of structurally known compounds from the NIST11 MS/MS library. Validation studies showed that 13 typical neutral losses, such as acetylations, cysteine conjugates, or glycosylations, correctly annotate the associated substructures in 92.1 ± 6.4% of cases, while the absence of mass spectral features does not necessarily imply the absence of such substructures. Use of this tool has been successfully demonstrated for complex lipids in microalgae. PMID:25263576
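As an illustration of the neutral-loss search itself, the short Python sketch below scans all peak pairs of a spectrum for m/z differences matching a table of known losses. The two loss masses shown are standard monoisotopic values; the peak list is invented.

```python
def find_neutral_losses(peaks, losses, tol=0.01):
    """Report (loss name, high m/z, low m/z) for every ordered peak pair
    whose m/z difference matches a known neutral loss within tol (Da)."""
    hits = []
    mzs = sorted(mz for mz, _ in peaks)
    for i, low in enumerate(mzs):
        for high in mzs[i + 1:]:
            for name, mass in losses.items():
                if abs((high - low) - mass) <= tol:
                    hits.append((name, high, low))
    return hits

losses = {"H2O": 18.0106, "hexose (glycosylation)": 162.0528}
peaks = [(479.12, 100.0), (317.07, 45.0), (299.06, 30.0)]
print(find_neutral_losses(peaks, losses))
# [('H2O', 317.07, 299.06), ('hexose (glycosylation)', 479.12, 317.07)]
```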
Sorensen, Asta V; Bernard, Shulamit L
2012-02-01
Learning (quality improvement) collaboratives are effective vehicles for driving coordinated organizational improvements. A central element of a learning collaborative is the change package: a catalogue of strategies, change concepts, and action steps that guide participants in their improvement efforts. Despite a vast literature describing learning collaboratives, little to no information is available on how the guiding strategies, change concepts, and action items are identified and developed into a replicable and actionable format that can be used to make measurable improvements within participating organizations. The process for developing the change package for the Health Resources and Services Administration's (HRSA) Patient Safety and Clinical Pharmacy Services Collaborative entailed an environmental scan and identification of leading practices, case studies, interim debriefing meetings, data synthesis, and a technical expert panel meeting. Data synthesis involved end-of-day debriefings, systematic qualitative analyses, and the use of grounded theory and inductive data analysis techniques. This approach allowed systematic identification of innovative patient safety and clinical pharmacy practices that could be adopted in diverse environments. A case study approach enabled the research team to study practices in their natural environments. Use of grounded theory and inductive data analysis techniques enabled identification of strategies, change concepts, and actionable items that might not have been captured using different approaches. Use of systematic processes and qualitative methods in identification and translation of innovative practices can greatly accelerate the diffusion of innovations and practice improvements. This approach is effective whether or not an individual organization is part of a learning collaborative.
Ward, Lucas D; Kellis, Manolis
2016-01-04
More than 90% of common variants associated with complex traits do not affect proteins directly, but instead the circuits that control gene expression. This has increased the urgency of understanding the regulatory genome as a key component for translating genetic results into mechanistic insights and ultimately therapeutics. To address this challenge, we developed HaploReg (http://compbio.mit.edu/HaploReg) to aid the functional dissection of genome-wide association study (GWAS) results, the prediction of putative causal variants in haplotype blocks, the prediction of likely cell types of action, and the prediction of candidate target genes by systematic mining of comparative, epigenomic and regulatory annotations. Since first launching the website in 2011, we have greatly expanded HaploReg, increasing the number of chromatin state maps to 127 reference epigenomes from ENCODE 2012 and Roadmap Epigenomics, incorporating regulator binding data, expanding regulatory motif disruption annotations, and integrating expression quantitative trait locus (eQTL) variants and their tissue-specific target genes from GTEx, Geuvadis, and other recent studies. We present these updates as HaploReg v4, and illustrate a use case of HaploReg for attention deficit hyperactivity disorder (ADHD)-associated SNPs with putative brain regulatory mechanisms. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Family-specific scaling laws in bacterial genomes.
De Lazzari, Eleonora; Grilli, Jacopo; Maslov, Sergei; Cosentino Lagomarsino, Marco
2017-07-27
Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
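The family-specific scaling laws have the form n_f(N) ≈ c_f · N^β_f, where N is the number of protein-coding genes. A minimal Python sketch of how such an exponent is typically estimated, by least squares in log-log space, is shown below with invented toy data.

```python
import numpy as np

def fit_family_scaling(genome_sizes, family_counts):
    """Fit n_f(N) ~ c * N**beta in log-log space; returns (beta, c).

    Genomes where the family is absent (count 0) must be excluded first."""
    log_n = np.log(np.asarray(genome_sizes, dtype=float))
    log_f = np.log(np.asarray(family_counts, dtype=float))
    beta, log_c = np.polyfit(log_n, log_f, 1)
    return beta, np.exp(log_c)

# Toy data: a family growing roughly quadratically with genome size,
# as reported for regulatory functions at the category level.
N = np.array([1000, 2000, 4000, 8000])
noise = np.exp(np.random.default_rng(1).normal(0.0, 0.05, size=4))
counts = 1e-5 * N**2.0 * noise
beta, c = fit_family_scaling(N, counts)
print(round(beta, 2))  # ~2.0
```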
de Hoop, Bartjan; Gietema, Hester; van Ginneken, Bram; Zanen, Pieter; Groenewegen, Gerard; Prokop, Mathias
2009-04-01
We compared interexamination variability of CT lung nodule volumetry with six currently available semi-automated software packages to determine the minimum change needed to detect the growth of solid lung nodules. We had ethics committee approval. To simulate a follow-up examination with zero growth, we performed two low-dose unenhanced CT scans in 20 patients referred for pulmonary metastases. Between examinations, patients got off and on the table. Volumes of all pulmonary nodules were determined on both examinations using six nodule evaluation software packages. Variability (upper limit of the 95% confidence interval of the Bland-Altman plot) was calculated for nodules for which segmentation was visually rated as adequate. We evaluated 214 nodules (mean diameter 10.9 mm, range 3.3 mm-30.0 mm). Software packages provided adequate segmentation in 71% to 86% of nodules (p < 0.001). In case of adequate segmentation, variability in volumetry between scans ranged from 16.4% to 22.3% for the various software packages. Variability with five to six software packages was significantly less for nodules ≥8 mm in diameter (range 12.9%-17.1%) than for nodules <8 mm (range 18.5%-25.6%). Segmented volumes of each package were compared to each of the other packages. Systematic volume differences were detected in 11/15 comparisons. This hampers comparison of nodule volumes between software packages.
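The variability statistic used here can be reproduced from paired measurements in a few lines of R. The sketch below, assuming relative (percent) differences between two zero-growth scans are the quantity of interest as in the study, computes the Bland-Altman bias and 95% limits of agreement; all variable names are hypothetical.

    # Bland-Altman analysis of paired nodule volumes (v1, v2) from two scans
    bland_altman <- function(v1, v2) {
      d <- 100 * (v2 - v1) / ((v1 + v2) / 2)     # percent difference per nodule
      m <- mean(d); s <- sd(d)
      c(bias = m, loa_lower = m - 1.96 * s, loa_upper = m + 1.96 * s)
    }

    # Simulated zero-growth repeat measurements with ~5% log-scale noise
    set.seed(42)
    true_vol <- runif(100, 50, 5000)             # nodule volumes in mm^3
    v1 <- true_vol * exp(rnorm(100, 0, 0.05))
    v2 <- true_vol * exp(rnorm(100, 0, 0.05))
    bland_altman(v1, v2)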
Identification of functional elements and regulatory circuits by Drosophila modENCODE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.
2010-12-22
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology. The functions of ~40% of the protein and non-protein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1).
Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length protein-coding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3′ untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi) mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.
Zhou, Shiyong; Liu, Pengfei; Zhang, Huilai
2017-01-01
Acute myeloid leukemia (AML) is a frequently occurring malignant disease of the blood and may result from a variety of genetic disorders. The present study aimed to identify the underlying mechanisms associated with the therapeutic effects of decitabine and cytarabine on AML, using microarray analysis. The microarray datasets GSE40442 and GSE40870 were downloaded from the Gene Expression Omnibus database. Following data pre-processing, differentially expressed genes (DEGs) and differentially methylated sites were identified in AML cells treated with decitabine compared with those treated with cytarabine via the Linear Models for Microarray Data (limma) package. Gene Ontology (GO) analysis of DEGs was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Genes corresponding to the differentially methylated sites were obtained using the annotation package of the methylation microarray platform. Overlapping genes that exhibited opposite trends in gene expression and DNA methylation were identified. Important transcription factor (TF)-gene pairs were identified, and a regulatory network was subsequently constructed. A total of 190 DEGs and 540 differentially methylated sites were identified in AML cells treated with decitabine compared with those treated with cytarabine. A total of 36 GO terms of DEGs were enriched, including nucleosome, protein-DNA complex and nucleosome assembly. The 540 differentially methylated sites were located on 240 genes, including the acidic repeat-containing protein (ACRC) gene, which was also differentially expressed. In addition, 60 TF-gene pairs involving differentially methylated sites and 140 TF-gene pairs involving DEGs were identified. The regulatory network included 68 nodes and 140 TF-gene pairs. The present study identified various genes, including ACRC and proliferating cell nuclear antigen, in addition to various TFs, including TATA-box binding protein associated factor 1 and CCCTC-binding factor, which may be potential therapeutic targets of AML. PMID:28498449
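The abstract names limma (Linear Models for Microarray Data) as the DEG engine. A generic limma contrast of the kind described, decitabine versus cytarabine, looks like the following R sketch; exprs_mat and treatment are hypothetical placeholders for the normalized expression matrix and the sample grouping factor, and this is not the authors' actual script.

    library(limma)

    # exprs_mat: normalized probes x samples matrix (hypothetical name)
    # treatment: factor with levels "decitabine" and "cytarabine", one per sample
    design <- model.matrix(~ 0 + treatment)
    colnames(design) <- levels(treatment)

    fit   <- lmFit(exprs_mat, design)            # probe-wise linear models
    contr <- makeContrasts(decitabine - cytarabine, levels = design)
    fit2  <- eBayes(contrasts.fit(fit, contr))   # moderated t-statistics

    # DEGs at BH-adjusted p < 0.05 and |log2 fold change| > 1
    degs <- topTable(fit2, number = Inf, adjust.method = "BH",
                     p.value = 0.05, lfc = 1)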
Equalizer reduces SNP bias in Affymetrix microarrays.
Quigley, David
2015-07-30
Gene expression microarrays measure the levels of messenger ribonucleic acid (mRNA) in a sample using probe sequences that hybridize with transcribed regions. These probe sequences are designed using a reference genome for the relevant species. However, most model organisms and all humans have genomes that deviate from their reference. These variations, which include single nucleotide polymorphisms, insertions of additional nucleotides, and nucleotide deletions, can affect the microarray's performance. Genetic experiments comparing individuals bearing different population-associated single nucleotide polymorphisms that intersect microarray probes are therefore subject to systemic bias, as the reduction in binding efficiency due to a technical artifact is confounded with genetic differences between parental strains. This problem has been recognized for some time, and earlier methods of compensation have attempted to identify probes affected by genome variants using statistical models. These methods may require replicate microarray measurement of gene expression in the relevant tissue in inbred parental samples, which are not always available in model organisms and are never available in humans. By using sequence information for the genomes of organisms under investigation, potentially problematic probes can now be identified a priori. However, there is no published software tool that makes it easy to eliminate these probes from an annotation. I present equalizer, a software package that uses genome variant data to modify annotation files for the commonly used Affymetrix IVT and Gene/Exon platforms. These files can be used by any microarray normalization method for subsequent analysis. I demonstrate how use of equalizer on experiments mapping germline influence on gene expression in a genetic cross between two divergent mouse species and in human samples significantly reduces probe hybridization-induced bias, reducing false positive and false negative findings. The equalizer package reduces probe hybridization bias from experiments performed on the Affymetrix microarray platform, allowing accurate assessment of germline influence on gene expression.
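The a priori masking step that equalizer automates can be sketched with standard Bioconductor ranges machinery: drop every probe whose genomic interval intersects a known variant before normalization. The coordinates below are toy values, and the code illustrates the idea rather than the package's implementation.

    library(GenomicRanges)

    # Toy probe and variant coordinates
    probes <- GRanges("chr1", IRanges(start = c(100, 500, 900), width = 25),
                      probe_id = c("p1", "p2", "p3"))
    snps   <- GRanges("chr1", IRanges(start = c(510, 2000), width = 1))

    # Remove probes overlapping any variant; keep the rest for annotation
    hits <- findOverlaps(probes, snps)
    clean_probes <- probes[setdiff(seq_along(probes), queryHits(hits))]
    clean_probes$probe_id                        # "p1" and "p3" survive masking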
Divi, Uday K; Zhou, Xue-Rong; Wang, Penghao; Butlin, Jamie; Zhang, Dong-Mei; Liu, Qing; Vanhercke, Thomas; Petrie, James R; Talbot, Mark; White, Rosemary G; Taylor, Jennifer M; Larkin, Philip; Singh, Surinder P
2016-01-01
Chinese tallow (Triadica sebifera) is a valuable oilseed-producing tree that can grow in a variety of conditions without competing for food production, and is a promising biofuel feedstock candidate. The fruits are unique in that they contain both saturated and unsaturated fat present in the tallow and seed layer, respectively. The tallow layer is poorly studied and is considered only as an external fatty deposition secreted from the seed. In this study we show that tallow is in fact a non-seed cellular tissue capable of triglyceride synthesis. Knowledge of lipid synthesis and storage mechanisms in tissues other than seed is limited but essential to generate oil-rich biomass crops. Here, we describe the annotated transcriptome assembly generated from the fruit coat, tallow and seed tissues of Chinese tallow. The final assembly was functionally annotated, allowing for the identification of candidate genes and reconstruction of lipid pathways. A tallow tissue-specific paralog for the transcription factor gene WRINKLED1 (WRI1) and lipid droplet-associated protein genes, distinct from those expressed in seed tissue, were found to be active in tallow, underpinning the mode of oil synthesis and packaging in this tissue. Our data have established an excellent knowledge base that can provide genetic and biochemical insights for engineering non-seed tissues to accumulate large amounts of oil. In addition to the large data set of annotated transcripts, the study also provides gene-based simple sequence repeat and single nucleotide polymorphism markers. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Sports Stars: Analyzing the Performance of Astronomers at Visualization-based Discovery
NASA Astrophysics Data System (ADS)
Fluke, C. J.; Parrington, L.; Hegarty, S.; MacMahon, C.; Morgan, S.; Hassan, A. H.; Kilborn, V. A.
2017-05-01
In this data-rich era of astronomy, there is a growing reliance on automated techniques to discover new knowledge. The role of the astronomer may change from being a discoverer to being a confirmer. But what do astronomers actually look at when they distinguish between “sources” and “noise?” What are the differences between novice and expert astronomers when it comes to visual-based discovery? Can we identify elite talent or coach astronomers to maximize their potential for discovery? By looking to the field of sports performance analysis, we consider an established, domain-wide approach, where the expertise of the viewer (i.e., a member of the coaching team) plays a crucial role in identifying and determining the subtle features of gameplay that provide a winning advantage. As an initial case study, we investigate whether the SportsCode performance analysis software can be used to understand and document how an experienced H I astronomer makes discoveries in spectral data cubes. We find that the process of timeline-based coding can be applied to spectral cube data by mapping spectral channels to frames within a movie. SportsCode provides a range of easy-to-use methods for annotation, including feature-based codes and labels, text annotations associated with codes, and image-based drawing. The outputs, including instance movies that are uniquely associated with coded events, provide the basis for a training program or team-based analysis that could be used in unison with discipline-specific analysis software. In this coordinated approach to visualization and analysis, SportsCode can act as a visual notebook, recording the insight and decisions in partnership with established analysis methods. Alternatively, in situ annotation and coding of features would be a valuable addition to existing and future visualization and analysis packages.
Bond, R R; Zhu, T; Finlay, D D; Drew, B; Kligfield, P D; Guldenring, D; Breen, C; Gallagher, A G; Daly, M J; Clifford, G D
2014-01-01
It is well known that accurate interpretation of the 12-lead electrocardiogram (ECG) requires a high degree of skill. There is also a moderate degree of variability among those who interpret the ECG. While this is the case, there are no best practice guidelines for the actual ECG interpretation process. Hence, this study adopts computerized eye tracking technology to investigate whether eye-gaze can be used to gain a deeper insight into how expert annotators interpret the ECG. Annotators were recruited in San Jose, California at the 2013 International Society of Computerised Electrocardiology (ISCE). Each annotator was recruited to interpret a number of 12-lead ECGs (N=12) while their eye gaze was recorded using a Tobii X60 eye tracker. The device is based on corneal reflection and is non-intrusive. With a sampling rate of 60Hz, eye gaze coordinates were acquired every 16.7ms. Fixations were determined using a predefined computerized classification algorithm, which was then used to generate heat maps of where the annotators looked. The ECGs used in this study form four groups (3=ST elevation myocardial infarction [STEMI], 3=hypertrophy, 3=arrhythmias and 3=exhibiting unique artefacts). There was also an equal distribution of difficulty levels (3=easy to interpret, 3=average and 3=difficult). ECGs were displayed using the 4x3+1 display format and computerized annotations were concealed. Precisely 252 expert ECG interpretations (21 annotators×12 ECGs) were recorded. Average duration for ECG interpretation was 58s (SD=23). Fleiss' generalized kappa coefficient (Pa=0.56) indicated a moderate inter-rater reliability among the annotators. There was a 79% inter-rater agreement for STEMI cases, 71% agreement for arrhythmia cases, 65% for the lead misplacement and dextrocardia cases and only 37% agreement for the hypertrophy cases. In analyzing the total fixation duration, it was found that on average annotators study lead V1 the most (4.29s), followed by leads V2 (3.83s), the rhythm strip (3.47s), II (2.74s), V3 (2.63s), I (2.53s), aVL (2.45s), V5 (2.27s), aVF (1.74s), aVR (1.63s), V6 (1.39s), III (1.32s) and V4 (1.19s). It was also found that on average the annotator spends an equal amount of time studying leads in the frontal plane (15.89s) when compared to leads in the transverse plane (15.70s). It was found that on average the annotators fixated on lead I first followed by leads V2, aVL, V1, II, aVR, V3, rhythm strip, III, aVF, V5, V4 and V6. We found a strong correlation (r=0.67) between time to first fixation on a lead and the total fixation duration on each lead. This indicates that leads studied first are studied the longest. There was a weak negative correlation between duration and accuracy (r=-0.2) and a strong correlation between age and accuracy (r=0.67). Eye tracking facilitated a deeper insight into how expert annotators interpret the 12-lead ECG. As a result, the authors recommend ECG annotators to adopt an initial first impression/pattern recognition approach followed by a conventional systematic protocol to ECG interpretation. This recommendation is based on observing misdiagnoses given due to first impression only. In summary, this research presents eye gaze results from expert ECG annotators and provides scope for future work that involves exploiting computerized eye tracking technology to further the science of ECG interpretation. Copyright © 2014 Elsevier Inc. All rights reserved.
HALT to qualify electronic packages: a proof of concept
NASA Astrophysics Data System (ADS)
Ramesham, Rajeshuni
2014-03-01
A proof of concept of the Highly Accelerated Life Testing (HALT) technique was explored to assess and optimize electronic packaging designs for long-duration deep space missions over a wide temperature range (-150°C to +125°C). HALT is a custom hybrid suite of testing techniques using environments such as extreme temperatures and dynamic shock step processing from 0g up to 50g of acceleration. The HALT testing used in this study applied repetitive shock to the test vehicle components at various temperatures to precipitate workmanship and/or manufacturing defects and expose the weak links of the designs. The purpose is to reduce the product development cycle time for improvements to packaging design qualification. A test article was built using advanced electronic package designs and surface mount technology processes considered useful for a variety of JPL and NASA projects (surface mount packages such as ball grid arrays (BGA), plastic ball grid arrays (PBGA), very thin chip array ball grid arrays (CVBGA), quad flat-packs (QFP), micro-lead-frame (MLF) packages, several passive components, etc.). These packages were daisy-chained and independently monitored during the HALT test, with the continuity of each individual package tracked by a data logging system. The HALT technique was then implemented to predict reliability and assess survivability of these advanced packaging techniques for long-duration deep space missions in much shorter test durations. We were able to test the boards at shock levels of 40g to 50g at temperatures ranging from +125°C to -150°C; the HALT system can deliver 50g shock levels at room temperature. Several tests were performed by subjecting the test boards to g levels ranging from 5g to 50g, test durations of 10 minutes to 60 minutes, hot temperatures of up to +125°C and cold temperatures down to -150°C. During the HALT test, electrical continuity measurements of the PBGA package showed an open circuit, whereas the BGA, MLF, and QFP packages showed only small variations in electrical continuity measurements. The electrical continuity anomaly of the PBGA occurred within 12 hours of commencing the accelerated test. Similar test boards were assembled, thermal cycled independently from -150°C to +125°C and monitored for electrical continuity through each package design. The PBGA package on the test board showed anomalous electrical continuity behavior after 959 thermal cycles. Each thermal cycle took around 2.33 hours, so the total test time to failure of the PBGA was 2,237 hours (or ~3.1 months) under thermal cycling alone. The accelerated technique (thermal cycling + shock) required only 12 hours to cause a failure in the PBGA electronic package. Compared to the thermal-cycle-only test, this is an acceleration of ~186 times (more than two orders of magnitude). This acceleration can save significant time and resources when predicting the life of a package component in a given environment, assuming the failure mechanisms are similar in both tests. Further studies are in progress to systematically evaluate the HALT technique on the other advanced electronic packaging components on the test board.
With this information, one can estimate the number of mission thermal cycles to failure from a much shorter test program, and hence the time to failure for given thermal and shock levels and given test board physical properties.
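The acceleration factor quoted above follows directly from the reported numbers, as the short R check below shows (the small difference from the quoted 2,237 hours is rounding in the per-cycle duration).

    cycles     <- 959                  # thermal cycles to PBGA failure
    hrs_cycle  <- 2.33                 # hours per thermal cycle
    tc_hours   <- cycles * hrs_cycle   # ~2234 h (~3.1 months), quoted as 2,237 h
    halt_hours <- 12                   # hours to the same failure under HALT
    tc_hours / halt_hours              # ~186-fold acceleration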
Dunet, Vincent; Klein, Ran; Allenbach, Gilles; Renaud, Jennifer; deKemp, Robert A; Prior, John O
2016-06-01
Several analysis software packages for myocardial blood flow (MBF) quantification from cardiac PET studies exist, but they have not been compared using concordance analysis, which can characterize precision and bias separately. Reproducible measurements are needed for quantification to fully develop its clinical potential. Fifty-one patients underwent dynamic Rb-82 PET at rest and during adenosine stress. Data were processed with PMOD and FlowQuant (Lortie model). MBF and myocardial flow reserve (MFR) polar maps were quantified and analyzed using a 17-segment model. Comparisons used Pearson's correlation ρ (measuring precision), Bland-Altman limits of agreement and Lin's concordance correlation ρc = ρ·Cb (Cb measuring systematic bias). Lin's concordance and Pearson's correlation values were very similar, suggesting no systematic bias between software packages with an excellent precision ρ for MBF (ρ = 0.97, ρc = 0.96, Cb = 0.99) and good precision for MFR (ρ = 0.83, ρc = 0.76, Cb = 0.92). On a per-segment basis, no mean bias was observed on Bland-Altman plots, although PMOD provided slightly higher values than FlowQuant at higher MBF and MFR values (P < .0001). Concordance between software packages was excellent for MBF and MFR, despite higher values by PMOD at higher MBF values. Both software packages can be used interchangeably for quantification in daily practice of Rb-82 cardiac PET.
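Lin's coefficient decomposes agreement into precision and bias: ρc = ρ·Cb, where ρ is the Pearson correlation and Cb = 1 indicates no systematic shift or scale difference. A minimal R implementation of the sample-moment version, for paired per-segment values x and y, is sketched below.

    # Lin's concordance correlation from sample moments
    lin_ccc <- function(x, y) {
      mx <- mean(x); my <- mean(y)
      sx2 <- mean((x - mx)^2)                    # population-style variances
      sy2 <- mean((y - my)^2)
      sxy <- mean((x - mx) * (y - my))
      rho   <- sxy / sqrt(sx2 * sy2)             # precision
      rho_c <- 2 * sxy / (sx2 + sy2 + (mx - my)^2)
      c(rho = rho, rho_c = rho_c, Cb = rho_c / rho)   # Cb measures bias
    }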
Mudgal, Richa; Srinivasan, Narayanaswamy; Chandra, Nagasuma
2017-07-01
Functional annotation is seldom straightforward with complexities arising due to functional divergence in protein families or functional convergence between non-homologous protein families, leading to mis-annotations. An enzyme may contain multiple domains and not all domains may be involved in a given function, adding to the complexity in function annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help in resolving fold-function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold-function-binding site relationships has been systematically generated. A network-based approach is employed to capture the complexity in these relationships, from which different types of associations are deciphered, that identify versatile protein folds performing diverse functions, same function associated with multiple folds and one-to-one relationships. Binding site similarity networks integrated with fold, function, and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, network properties of these revealed versatile families with topologically different or dissimilar binding sites and structural families that perform very similar functions. As a case study, subtle changes in the active site of a set of evolutionarily related superfamilies are studied using these networks. Tracing of such similarities in evolutionarily related proteins provide clues into the transition and evolution of protein functions. Insights from this study will be helpful in accurate and reliable functional annotations of uncharacterized proteins, poly-pharmacology, and designing enzymes with new functional capabilities. Proteins 2017; 85:1319-1335. © 2017 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk
2014-10-09
Phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. A total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
The Pathway Coexpression Network: Revealing pathway relationships
Tanzi, Rudolph E.
2018-01-01
A goal of genomics is to understand the relationships between biological processes. Pathways contribute to functional interplay within biological processes through complex but poorly understood interactions. However, limited functional references for global pathway relationships exist. Pathways from databases such as KEGG and Reactome provide discrete annotations of biological processes. Their relationships are currently either inferred from gene set enrichment within specific experiments, or by simple overlap, linking pathway annotations that have genes in common. Here, we provide a unifying interpretation of functional interaction between pathways by systematically quantifying coexpression between 1,330 canonical pathways from the Molecular Signatures Database (MSigDB) to establish the Pathway Coexpression Network (PCxN). We estimated the correlation between canonical pathways valid in a broad context using a curated collection of 3,207 microarrays from 72 normal human tissues. PCxN accounts for shared genes between annotations to estimate significant correlations between pathways with related functions rather than with similar annotations. We demonstrate that PCxN provides novel insight into mechanisms of complex diseases using an Alzheimer’s Disease (AD) case study. PCxN retrieved pathways significantly correlated with an expert curated AD gene list. These pathways have known associations with AD and were significantly enriched for genes independently associated with AD. As a further step, we show how PCxN complements the results of gene set enrichment methods by revealing relationships between enriched pathways, and by identifying additional highly correlated pathways. PCxN revealed that correlated pathways from an AD expression profiling study include functional clusters involved in cell adhesion and oxidative stress. PCxN provides expanded connections to pathways from the extracellular matrix. PCxN provides a powerful new framework for interrogation of global pathway relationships. Comprehensive exploration of PCxN can be performed at http://pcxn.org/. PMID:29554099
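PCxN's correction for annotation overlap is more elaborate than can be shown here, but the core operation, correlating pathway-level expression summaries after removing shared genes so that overlap alone cannot create correlation, can be sketched in R as follows; expr is a hypothetical genes-by-samples matrix and pw_a/pw_b are gene-ID vectors. This is an illustration of the underlying idea, not the PCxN estimator itself.

    # Naive pathway coexpression: correlate mean member-gene expression,
    # excluding genes shared by the two pathway annotations.
    pathway_cor <- function(expr, pw_a, pw_b) {
      shared <- intersect(pw_a, pw_b)
      a <- setdiff(intersect(pw_a, rownames(expr)), shared)
      b <- setdiff(intersect(pw_b, rownames(expr)), shared)
      stopifnot(length(a) > 1, length(b) > 1)    # need genes left after exclusion
      cor(colMeans(expr[a, , drop = FALSE]),
          colMeans(expr[b, , drop = FALSE]))
    }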
Bourras, Salim; Meyer, Michel; Grandaubert, Jonathan; Lapalu, Nicolas; Fudal, Isabelle; Linglin, Juliette; Ollivier, Benedicte; Blaise, Françoise; Balesdent, Marie-Hélène; Rouxel, Thierry
2012-08-01
The ever-increasing generation of sequence data is accompanied by unsatisfactory functional annotation, and complex genomes, such as those of plants and filamentous fungi, show a large number of genes with no predicted or known function. For functional annotation of unknown or hypothetical genes, the production of collections of mutants using Agrobacterium tumefaciens-mediated transformation (ATMT) associated with genotyping and phenotyping has gained wide acceptance. ATMT is also widely used to identify pathogenicity determinants in pathogenic fungi. A systematic analysis of T-DNA borders was performed in an ATMT-mutagenized collection of the phytopathogenic fungus Leptosphaeria maculans to evaluate the features of T-DNA integration in its particular transposable element-rich compartmentalized genome. A total of 318 T-DNA tags were recovered and analyzed for biases in chromosome and genic compartments, existence of CG/AT skews at the insertion site, and occurrence of microhomologies between the T-DNA left border (LB) and the target sequence. Functional annotation of targeted genes was done using the Gene Ontology annotation. The T-DNA integration mainly targeted gene-rich, transcriptionally active regions, and it favored biological processes consistent with the physiological status of a germinating spore. T-DNA integration was strongly biased toward regulatory regions, and mainly promoters. Consistent with the T-DNA intranuclear-targeting model, the density of T-DNA insertion correlated with CG skew near the transcription initiation site. The existence of microhomologies between promoter sequences and the T-DNA LB flanking sequence was also consistent with T-DNA integration to host DNA mediated by homologous recombination based on the microhomology-mediated end-joining pathway.
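The CG skew referred to above is the windowed statistic (C - G)/(C + G). A short R sketch computing it in sliding windows around a position of interest follows; the window and step sizes are illustrative, not the study's parameters.

    # CG skew in sliding windows over a character vector of bases
    cg_skew <- function(bases, window = 100, step = 10) {
      starts <- seq(1, length(bases) - window + 1, by = step)
      sapply(starts, function(s) {
        w  <- bases[s:(s + window - 1)]
        cC <- sum(w == "C"); cG <- sum(w == "G")
        if (cC + cG == 0) NA_real_ else (cC - cG) / (cC + cG)
      })
    }

    # Toy usage on a random sequence
    bases <- sample(c("A", "C", "G", "T"), 1000, replace = TRUE)
    skew  <- cg_skew(bases)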
Dai, Yilin; Guo, Ling; Li, Meng; Chen, Yi-Bu
2012-06-08
Microarray data analysis presents a significant challenge to researchers who are unable to use the powerful Bioconductor and its numerous tools due to their lack of knowledge of the R language. Among the few existing software programs that offer a graphic user interface to Bioconductor packages, none has implemented a comprehensive strategy to address the accuracy and reliability issues of microarray data analysis arising from the well-known probe design problems associated with many widely used microarray chips. There is also a lack of tools that would expedite the functional analysis of microarray results. We present Microarray Я US, an R-based graphical user interface that implements over a dozen popular Bioconductor packages to offer researchers a streamlined workflow for routine differential microarray expression data analysis without the need to learn the R language. In order to enable a more accurate analysis and interpretation of microarray data, we incorporated the latest custom probe re-definition and re-annotation for Affymetrix and Illumina chips. A versatile microarray results output utility tool was also implemented for easy and fast generation of input files for over 20 of the most widely used functional analysis software programs. Coupled with a well-designed user interface, Microarray Я US leverages cutting-edge Bioconductor packages for researchers with no knowledge of the R language. It also enables a more reliable and accurate microarray data analysis and expedites downstream functional analysis of microarray results.
Gautier, Laurent
2010-12-21
Computer languages can be domain-related, and in the case of multidisciplinary projects, knowledge of several languages may be needed in order to quickly implement ideas. Moreover, each computer language has relative strengths, making some languages better suited than others for a given task. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of translators required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. Bioconductor is now not solely reserved to R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/.
ITEP: an integrated toolkit for exploration of microbial pan-genomes.
Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D
2014-01-03
Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species whose combination of core and variable genes constitutes their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
The Origin of Salt-Encased Sediment Packages: Observations from the SE Precaspian Basin (Kazakhstan)
NASA Astrophysics Data System (ADS)
Fernandez, Naiara; Duffy, Oliver B.; Hudec, Michael R.; Jackson, Martin P. A.; Burg, George; Jackson, Christopher A.-L.; Dooley, Tim P.
2017-04-01
Intrasalt sediment packages containing siliciclastic sediments, carbonate sediments, or non-halite evaporites such as gypsum or anhydrite are common within most salt sequences. Intrasalt sediment packages may have been deposited before, during, or after salt deposition and be incorporated into the salt by various processes. Understanding the origin and evolution of intrasalt sediment packages may yield important insights into the tectonic and geodynamic history of a basin, and into salt tectonics itself. Despite the importance of intrasalt sediment packages, there is currently no systematic description of their possible origins and their distinguishing criteria. This work is divided into three parts. First, we outline the possible origins of intrasalt sediment packages, as well as criteria to determine whether they originated as subsalt, suprasalt or intrasalt sequences. Second, we examine how sediment packages that originated on top of salt, such as minibasins, can become encased within salt. We propose four key processes by which salt can be expelled and emplaced above minibasins to encase them: a) salt expulsion from beneath a minibasin experiencing density-driven subsidence; b) salt expulsion from beneath adjacent subsiding minibasins; c) salt expulsion associated with lateral shortening; d) override of minibasins by a salt sheet sourced from elsewhere. Third, we present a case study from the SE Precaspian Basin, Kazakhstan, where, using a borehole-constrained 3D seismic reflection dataset, the proposed criteria are applied to an area with abundant, newly discovered sediment packages within salt.
Root Cause Investigation of Lead-Free Solder Joint Interfacial Failures After Multiple Reflows
NASA Astrophysics Data System (ADS)
Li, Yan; Hatch, Olen; Liu, Pilin; Goyal, Deepak
2017-03-01
Solder joint interconnects in three-dimensional (3D) packages with package stacking configurations typically must undergo multiple reflow cycles during the assembly process. In this work, interfacial open joint failures between the bulk solder and the intermetallic compound (IMC) layer were found in Sn-Ag-Cu (SAC) solder joints connecting a small package to a large package after multiple reflow reliability tests. Systematic progressive 3D x-ray computed tomography experiments were performed on both incoming and assembled parts to reveal the initiation and evolution of the open failures in the same solder joints before and after the reliability tests. Characterization studies, including focused ion beam cross-sections, scanning electron microscopy, and energy-dispersive x-ray spectroscopy, were conducted to determine the correlation between IMC phase transformation and failure initiation in the solder joints. A comprehensive failure mechanism, along with solution paths for the solder joint interfacial failures after multiple reflow cycles, is discussed in detail.
The RNA-binding protein repertoire of embryonic stem cells.
Kwon, S Chul; Yi, Hyerim; Eichelbaum, Katrin; Föhr, Sophia; Fischer, Bernd; You, Kwon Tae; Castello, Alfredo; Krijgsveld, Jeroen; Hentze, Matthias W; Kim, V Narry
2013-09-01
RNA-binding proteins (RBPs) have essential roles in RNA-mediated gene regulation, and yet annotation of RBPs is limited mainly to those with known RNA-binding domains. To systematically identify the RBPs of embryonic stem cells (ESCs), we here employ interactome capture, which combines UV cross-linking of RBP to RNA in living cells, oligo(dT) capture and MS. From mouse ESCs (mESCs), we have defined 555 proteins constituting the mESC mRNA interactome, including 283 proteins not previously annotated as RBPs. Of these, 68 new RBP candidates are highly expressed in ESCs compared to differentiated cells, implicating a role in stem-cell physiology. Two well-known E3 ubiquitin ligases, Trim25 (also called Efp) and Trim71 (also called Lin41), are validated as RBPs, revealing a potential link between RNA biology and protein-modification pathways. Our study confirms and expands the atlas of RBPs, providing a useful resource for the study of the RNA-RBP network in stem cells.
OnTheFly: a database of Drosophila melanogaster transcription factors and their binding sites.
Shazman, Shula; Lee, Hunjoong; Socol, Yakov; Mann, Richard S; Honig, Barry
2014-01-01
We present OnTheFly (http://bhapp.c2b2.columbia.edu/OnTheFly/index.php), a database comprising a systematic collection of transcription factors (TFs) of Drosophila melanogaster and their DNA-binding sites. TFs predicted in the Drosophila melanogaster genome are annotated and classified and their structures, obtained via experiment or homology models, are provided. All known preferred TF DNA-binding sites obtained from the B1H, DNase I and SELEX methodologies are presented. DNA shape parameters predicted for these sites are obtained from a high throughput server or from crystal structures of protein-DNA complexes where available. An important feature of the database is that all DNA-binding domains and their binding sites are fully annotated in a eukaryote using structural criteria and evolutionary homology. OnTheFly thus provides a comprehensive view of TFs and their binding sites that will be a valuable resource for deciphering non-coding regulatory DNA.
Multi-scale chromatin state annotation using a hierarchical hidden Markov model
NASA Astrophysics Data System (ADS)
Marco, Eugenio; Meuleman, Wouter; Huang, Jialiang; Glass, Kimberly; Pinello, Luca; Wang, Jianrong; Kellis, Manolis; Yuan, Guo-Cheng
2017-04-01
Chromatin-state analysis is widely applied in the studies of development and diseases. However, existing methods operate at a single length scale, and therefore cannot distinguish large domains from isolated elements of the same type. To overcome this limitation, we present a hierarchical hidden Markov model, diHMM, to systematically annotate chromatin states at multiple length scales. We apply diHMM to analyse a public ChIP-seq data set. diHMM not only accurately captures nucleosome-level information, but identifies domain-level states that vary in nucleosome-level state composition, spatial distribution and functionality. The domain-level states recapitulate known patterns such as super-enhancers, bivalent promoters and Polycomb repressed regions, and identify additional patterns whose biological functions are not yet characterized. By integrating chromatin-state information with gene expression and Hi-C data, we identify context-dependent functions of nucleosome-level states. Thus, diHMM provides a powerful tool for investigating the role of higher-order chromatin structure in gene regulation.
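diHMM fits a single joint model coupling the two length scales; as rough intuition for what multi-scale annotation produces, the sketch below (not the diHMM algorithm) simply applies a standard log-space Viterbi decoder twice in R: once to binarized mark observations per bin, and once more to the resulting nucleosome-level state sequence to obtain domain-level states. All parameters are toy values.

    # Log-space Viterbi decoding: A = row-stochastic transition matrix,
    # B = emission probabilities (states x symbols), p0 = initial distribution
    viterbi <- function(obs, A, B, p0) {
      n <- length(obs); k <- nrow(A)
      V <- matrix(-Inf, k, n); ptr <- matrix(0L, k, n)
      V[, 1] <- log(p0) + log(B[, obs[1]])
      for (t in 2:n) for (j in 1:k) {
        s <- V[, t - 1] + log(A[, j])
        ptr[j, t] <- which.max(s)
        V[j, t] <- max(s) + log(B[j, obs[t]])
      }
      path <- integer(n); path[n] <- which.max(V[, n])
      for (t in (n - 1):1) path[t] <- ptr[path[t + 1], t + 1]
      path
    }

    A <- matrix(c(0.9, 0.1, 0.1, 0.9), 2, byrow = TRUE)
    B <- matrix(c(0.8, 0.2, 0.2, 0.8), 2, byrow = TRUE)
    obs <- c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2)         # binarized mark signal per bin
    nuc <- viterbi(obs, A, B, c(0.5, 0.5))         # nucleosome-level states
    dom <- viterbi(nuc, A, B, c(0.5, 0.5))         # domain-level states over them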
PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors
Jin, Jinpu; Zhang, He; Kong, Lei; Gao, Ge; Luo, Jingchu
2014-01-01
With the aim to provide a resource for functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationship among identified TFs, we assigned 69 450 TFs into 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or same orthologous groups, respectively. In addition, we set up a TF prediction server in this version for users to identify TFs from their own sequences. PMID:24174544
GPS-CCD: A Novel Computational Program for the Prediction of Calpain Cleavage Sites
Gao, Xinjiao; Ma, Qian; Ren, Jian; Xue, Yu
2011-01-01
As one of the most essential post-translational modifications (PTMs) of proteins, proteolysis, especially calpain-mediated cleavage, plays an important role in many biological processes, including cell death/apoptosis, cytoskeletal remodeling, and the cell cycle. Experimental identification of calpain targets with bona fide cleavage sites is fundamental for dissecting the molecular mechanisms and biological roles of calpain cleavage. In contrast to time-consuming and labor-intensive experimental approaches, computational prediction of calpain cleavage sites might more cheaply and readily provide useful information for further experimental investigation. In this work, we constructed a novel software package, GPS-CCD (Calpain Cleavage Detector), for the prediction of calpain cleavage sites, with an accuracy of 89.98%, sensitivity of 60.87% and specificity of 90.07%. With this software, we annotated potential calpain cleavage sites for hundreds of calpain substrates, for which the exact cleavage sites had not been previously determined. In this regard, GPS-CCD 1.0 is considered to be a useful tool for experimentalists. The online service and local packages of GPS-CCD 1.0 were implemented in JAVA and are freely available at: http://ccd.biocuckoo.org/. PMID:21533053
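For readers interpreting the quoted figures: accuracy, sensitivity and specificity derive from the confusion matrix of predicted versus experimentally verified cleavage sites. The R snippet below states the definitions; the counts are purely illustrative, not those of the GPS-CCD benchmark.

    # Confusion-matrix summary statistics
    perf <- function(tp, fp, tn, fn) {
      c(accuracy    = (tp + tn) / (tp + fp + tn + fn),
        sensitivity = tp / (tp + fn),     # fraction of true sites recovered
        specificity = tn / (tn + fp))     # fraction of non-sites rejected
    }
    perf(tp = 60, fp = 10, tn = 90, fn = 40)   # illustrative counts only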
A high-level 3D visualization API for Java and ImageJ.
Schmid, Benjamin; Schindelin, Johannes; Cardona, Albert; Longair, Mark; Heisenberg, Martin
2010-05-21
Current imaging methods such as Magnetic Resonance Imaging (MRI), Confocal microscopy, Electron Microscopy (EM) or Selective Plane Illumination Microscopy (SPIM) yield three-dimensional (3D) data sets in need of appropriate computational methods for their analysis. The reconstruction, segmentation and registration are best approached from the 3D representation of the data set. Here we present a platform-independent framework based on Java and Java 3D for accelerated rendering of biological images. Our framework is seamlessly integrated into ImageJ, a free image processing package with a vast collection of community-developed biological image analysis tools. Our framework enriches the ImageJ software libraries with methods that greatly reduce the complexity of developing image analysis tools in an interactive 3D visualization environment. In particular, we provide high-level access to volume rendering, volume editing, surface extraction, and image annotation. The ability to rely on a library that removes the low-level details enables concentrating software development efforts on the algorithm implementation parts. Our framework enables biomedical image software development to be built with 3D visualization capabilities with very little effort. We offer the source code and convenient binary packages along with extensive documentation at http://3dviewer.neurofly.de.
Bhardwaj, Neeru; Montesion, Meagan; Roy, Farrah; Coffin, John M.
2015-01-01
Human endogenous retrovirus (HERV-K (HML-2)) proviruses are among the few endogenous retroviral elements in the human genome that retain coding sequence. HML-2 expression has been widely associated with human disease states, including different types of cancers as well as with HIV-1 infection. Understanding of the potential impact of this expression requires that it be annotated at the proviral level. Here, we utilized the high throughput capabilities of next-generation sequencing to profile HML-2 expression at the level of individual proviruses and secreted virions in the teratocarcinoma cell line Tera-1. We identified well-defined expression patterns, with transcripts emanating primarily from two proviruses located on chromosome 22, only one of which was efficiently packaged. Interestingly, there was a preference for transcripts of recently integrated proviruses, over those from other highly expressed but older elements, to be packaged into virions. We also assessed the promoter competence of the 5’ long terminal repeats (LTRs) of expressed proviruses via a luciferase assay following transfection of Tera-1 cells. Consistent with the RNASeq results, we found that the activity of most LTRs corresponded to their transcript levels. PMID:25746218
Blattmann, Peter; Heusel, Moritz; Aebersold, Ruedi
2016-01-01
SWATH-MS is an acquisition and analysis technique of targeted proteomics that enables measuring several thousand proteins with high reproducibility and accuracy across many samples. OpenSWATH is popular open-source software for peptide identification and quantification from SWATH-MS data. For downstream statistical and quantitative analysis there exist different tools such as MSstats, mapDIA and aLFQ. However, the transfer of data from OpenSWATH to the downstream statistical tools is currently technically challenging. Here we introduce the R/Bioconductor package SWATH2stats, which allows convenient processing of the data into a format directly readable by the downstream analysis tools. In addition, SWATH2stats allows annotation, analyzing the variation and the reproducibility of the measurements, FDR estimation, and advanced filtering before submitting the processed data to downstream tools. These functionalities are important to quickly analyze the quality of the SWATH-MS data. Hence, SWATH2stats is a new open-source tool that summarizes several practical functionalities for analyzing, processing, and converting SWATH-MS data and thus facilitates the efficient analysis of large-scale SWATH/DIA datasets.
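A typical SWATH2stats session, as described above, moves from the OpenSWATH table through annotation and error-rate filtering to a converter for the chosen downstream tool. The R sketch below follows that pattern; the file and object names are hypothetical, and exact function signatures may differ between package versions, so treat it as indicative rather than exact.

    library(SWATH2stats)

    data <- read.delim("openswath_results.tsv")   # hypothetical OpenSWATH export
    data <- sample_annotation(data, study_design) # attach condition/replicate/run
    assess_decoy_rate(data)                       # quick decoy/FDR overview
    data <- filter_mscore(data, 0.01)             # m-score (error-rate) filter
    msstats_input <- convert4MSstats(data)        # format for MSstats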
Systemic bioinformatics analysis of skeletal muscle gene expression profiles of sepsis
Yang, Fang; Wang, Yumei
2018-01-01
Sepsis is a type of systemic inflammatory response syndrome with high morbidity and mortality. Skeletal muscle dysfunction is one of the major complications of sepsis that may also influence the outcome of sepsis. The aim of the present study was to explore and identify potential mechanisms and therapeutic targets of sepsis. Systemic bioinformatics analysis of skeletal muscle gene expression profiles from the Gene Expression Omnibus was performed. Differentially expressed genes (DEGs) between samples from patients with sepsis and control samples were identified using the limma package. Differential co-expression and coregulation (DCE and DCR, respectively) analysis was performed based on the Differential Co-expression Analysis package to identify differences in gene co-expression and coregulation patterns between the control and sepsis groups. Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways of DEGs were identified using the Database for Annotation, Visualization and Integrated Discovery, and inflammatory, cancer and skeletal muscle development-associated biological processes and pathways were identified. DCE and DCR analysis revealed several potential therapeutic targets for sepsis, including genes and transcription factors. The results of the present study may provide a basis for the development of novel therapeutic targets and treatment methods for sepsis. PMID:29805480
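The GO/KEGG enrichment step reported here reduces to a one-tailed hypergeometric test per term: with N background genes, K annotated to the term, n DEGs, and k DEGs hitting the term. A minimal R version follows; the counts in the usage line are illustrative only.

    # One-tailed hypergeometric enrichment p-value for a single term
    enrich_p <- function(k, n, K, N) {
      phyper(k - 1, K, N - K, n, lower.tail = FALSE)
    }
    enrich_p(k = 12, n = 190, K = 150, N = 20000)  # e.g., 12 of 190 DEGs in a 150-gene term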
An overview of the BioCreative 2012 Workshop Track III: interactive text mining task
Arighi, Cecilia N.; Carterette, Ben; Cohen, K. Bretonnel; Krallinger, Martin; Wilbur, W. John; Fey, Petra; Dodson, Robert; Cooper, Laurel; Van Slyke, Ceri E.; Dahdul, Wasila; Mabee, Paula; Li, Donghui; Harris, Bethany; Gillespie, Marc; Jimenez, Silvia; Roberts, Phoebe; Matthews, Lisa; Becker, Kevin; Drabkin, Harold; Bello, Susan; Licata, Luana; Chatr-aryamontri, Andrew; Schaeffer, Mary L.; Park, Julie; Haendel, Melissa; Van Auken, Kimberly; Li, Yuling; Chan, Juancarlos; Muller, Hans-Michael; Cui, Hong; Balhoff, James P.; Chi-Yang Wu, Johnny; Lu, Zhiyong; Wei, Chih-Hsuan; Tudor, Catalina O.; Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar; Cejuela, Juan Miguel; Dubey, Pratibha; Wu, Cathy
2013-01-01
In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV. PMID:23327936
NASA Astrophysics Data System (ADS)
de Carvalho, Fábio Romeu; Abe, Jair Minoro
2010-11-01
Two recent non-classical logics have been used for decision making: fuzzy logic and paraconsistent annotated evidential logic Eτ. In this paper we present a simplified version of the fuzzy decision method and compare it with the paraconsistent one. Paraconsistent annotated evidential logic Eτ, introduced by Da Costa, Vago and Subrahmanian (1991), is capable of handling uncertain and contradictory data without becoming trivial. It has been used in many applications, such as information technology, robotics, artificial intelligence, production engineering and decision making. Intuitively, an Eτ formula is of the type p(a, b), in which a and b belong to the real interval [0, 1] and represent, respectively, the degree of favorable evidence (or degree of belief) and the degree of contrary evidence (or degree of disbelief) in p. The set of all pairs (a, b), called annotations, when plotted, forms the Cartesian unitary square (CUS); endowed with an order relation analogous to that of the real numbers, this set constitutes the lattice of annotations. Fuzzy logic was introduced by Zadeh (1965). It seeks to systematize the study of knowledge, in particular to characterize fuzzy knowledge (whose meaning is not precisely defined) and to distinguish it from imprecise knowledge (whose meaning is known but whose exact value is not). Fuzzy logic resembles the paraconsistent annotated logic in attributing a numeric value to each proposition, but it attributes only one value rather than two (it is thus a one-valued logic); this number expresses the intensity (the degree) with which the proposition is true. Let X be a set and A a subset of X characterized by a membership function f; for each element x∈X, y = f(x)∈[0, 1], and the number y is called the degree of membership of x in A. Decision-making theories based on these logics have proven powerful in many respects compared with more traditional methods, such as those based on statistics. In this paper we present a first study of a simplified version of a decision-making theory based on fuzzy logic (SVMFD) and a comparison with the Paraconsistent Decision Method (PDM), based on paraconsistent annotated evidential logic Eτ, which is also summarized in this paper. An example illustrating the two methods is presented, together with a comparison between them.
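The two evidence degrees of an Eτ annotation can be summarized by quantities standard in the paraconsistent decision literature; a minimal base R sketch (the decision threshold is hypothetical):

    # Eτ annotation (a, b): a = favorable evidence, b = contrary evidence, in [0, 1].
    certainty     <- function(a, b) a - b         # > 0 favors the proposition
    contradiction <- function(a, b) a + b - 1     # 0 = consistent; +/-1 = maximal conflict
    a <- 0.8; b <- 0.3
    certainty(a, b)      # 0.5
    contradiction(a, b)  # 0.1
    # Hypothetical decision rule: accept when certainty reaches a requirement level.
    if (certainty(a, b) >= 0.5) "feasible" else "inconclusive"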
Bigdely-Shamlo, Nima; Cockfield, Jeremy; Makeig, Scott; Rognon, Thomas; La Valle, Chris; Miyakoshi, Makoto; Robbins, Kay A.
2016-01-01
Real-world brain imaging by EEG requires accurate annotation of complex subject-environment interactions in event-rich tasks and paradigms. This paper describes the evolution of the Hierarchical Event Descriptor (HED) system for systematically describing both laboratory and real-world events. HED version 2, first described here, provides the semantic capability of describing a variety of subject and environmental states. HED descriptions can include stimulus presentation events on screen or in virtual worlds, experimental or spontaneous events occurring in the real world environment, and events experienced via one or multiple sensory modalities. Furthermore, HED 2 can distinguish between the mere presence of an object and its actual (or putative) perception by a subject. Although the HED framework has implicit ontological and linked data representations, the user interface for HED annotation is more intuitive than traditional ontological annotation. We believe that hiding the formal representations allows for a more user-friendly interface, making consistent, detailed tagging of experimental and real-world events possible for research users. HED is extensible while retaining the advantages of having an enforced common core vocabulary. We have developed a collection of tools to support HED tag assignment and validation; these are available at hedtags.org. A plug-in for EEGLAB (sccn.ucsd.edu/eeglab), CTAGGER, is also available to speed the process of tagging existing studies. PMID:27799907
eRAM: encyclopedia of rare disease annotations for precision medicine.
Jia, Jinmeng; An, Zhongxin; Ming, Yue; Guo, Yongli; Li, Wei; Liang, Yunxiang; Guo, Dongming; Li, Xin; Tai, Jun; Chen, Geng; Jin, Yaqiong; Liu, Zhimei; Ni, Xin; Shi, Tieliu
2018-01-04
Rare diseases affect over a hundred million people worldwide, yet most of these patients are not accurately diagnosed or effectively treated. The limited knowledge of rare diseases forms the biggest obstacle to improving their treatment. Detailed clinical phenotyping is considered a keystone of deciphering genes and realizing precision medicine for rare diseases. Here, we present a standardized annotation system for various types of rare diseases, called the encyclopedia of Rare disease Annotations for Precision Medicine (eRAM). eRAM was built by text-mining nearly 10 million scientific publications and electronic medical records, and by integrating various data in existing recognized databases (such as the Unified Medical Language System (UMLS), Human Phenotype Ontology, Orphanet, OMIM and GWAS). eRAM systematically incorporates currently available data on clinical manifestations and molecular mechanisms of rare diseases and uncovers many novel associations among diseases. eRAM provides enriched annotations for 15 942 rare diseases, yielding 6147 human disease-related phenotype terms, 31 661 mammalian phenotype terms, 10 202 symptoms from UMLS, 18 815 genes and 92 580 genotypes. eRAM not only provides information about rare disease mechanisms but can also help clinicians make accurate diagnostic and therapeutic decisions for rare diseases. eRAM can be freely accessed at http://www.unimd.org/eram/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
LipidPedia: a comprehensive lipid knowledgebase.
Kuo, Tien-Chueh; Tseng, Yufeng Jane
2018-04-10
Lipids are divided into fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, saccharolipids, sterols, prenol lipids and polyketides. Fatty acyls and glycerolipids are commonly used for energy storage, whereas glycerophospholipids, sphingolipids, sterols and saccharolipids are commonly used as components of cell membranes. Lipids in the fatty acyl, glycerophospholipid, sphingolipid and sterol classes play important roles in signaling. Although more than 36 million lipids can be identified or computationally generated, no single lipid database provides comprehensive information on lipids. Furthermore, the complex systematic or common names of lipids make the discovery of related information challenging. Here, we present LipidPedia, a comprehensive lipid knowledgebase. The content of this database is derived from integrating annotation data with full-text mining of 3,923 lipids and more than 400,000 annotations of associated diseases, pathways, functions, and locations that are essential for interpreting lipid functions and mechanisms from over 1,400,000 scientific publications. Each lipid in LipidPedia also has its own entry containing a text summary curated from the most frequently cited diseases, pathways, genes, locations, functions, lipids and experimental models in the biomedical literature. LipidPedia aims to provide an overall synopsis of lipids to summarize lipid annotations and provide a detailed listing of references for understanding complex lipid functions and mechanisms. LipidPedia is available at http://lipidpedia.cmdm.tw. yjtseng@csie.ntu.edu.tw. Supplementary data are available at Bioinformatics online.
Raponi, Matteo; Damiani, Gianfranco; Vincenti, Sara; Wachocka, Malgorzata; Boninti, Federica; Bruno, Stefania; Quaranta, Gianluigi; Moscato, Umberto; Boccia, Stefania; Ficarra, Maria Giovanna; Specchia, Maria Lucia; Posteraro, Brunella; Berloco, Filippo; Celani, Fabrizio; Ricciardi, Walter; Laurenti, Patrizia
2014-01-01
The purpose of this research is to identify and formalize the activities and products of the Hospital Hygiene Service, evaluating them from a cost accounting management perspective. The ultimate aim is to evaluate the financial impact of adverse-event prevention on the management of a Hospital Hygiene Service. A three-step methodology based on affinity grouping of activities was employed. This methodology led us to identify 4 action areas, with 23 related productive processes and 86 available safety packages. Owing to this new methodology, we were able to implement a systematic evaluation of the services furnished.
DCGL v2.0: an R package for unveiling differential regulation from differential co-expression.
Yang, Jing; Yu, Hui; Liu, Bao-Hong; Zhao, Zhongming; Liu, Lei; Ma, Liang-Xiao; Li, Yi-Xue; Li, Yuan-Yuan
2013-01-01
Differential co-expression analysis (DCEA) has emerged in recent years as a novel, systematic investigation into gene expression data. While most DCEA studies or tools focus on the co-expression relationships among genes, some are developing a potentially more promising research domain, differential regulation analysis (DRA). In our previously proposed R package DCGL v1.0, we provided functions to facilitate basic differential co-expression analyses; however, the output from DCGL v1.0 could not be translated into differential regulation mechanisms in a straightforward manner. To advance from DCEA to DRA, we upgraded the DCGL package from v1.0 to v2.0. A new module named "Differential Regulation Analysis" (DRA) was designed, which consists of three major functions: DRsort, DRplot, and DRrank. DRsort selects differentially regulated genes (DRGs) and differentially regulated links (DRLs) according to the transcription factor (TF)-to-target information. DRrank prioritizes the TFs in terms of their potential relevance to the phenotype of interest. DRplot graphically visualizes differentially co-expressed links (DCLs) and/or TF-to-target links in a network context. In addition to these new modules, we streamlined the code from v1.0. The evaluation results proved that our differential regulation analysis is able to capture the regulators relevant to the biological subject. With ample functions to facilitate differential regulation analysis, DCGL v2.0 was upgraded from a DCEA tool to a DRA tool, which may unveil the underlying differential regulation from the observed differential co-expression. DCGL v2.0 can be applied to a wide range of gene expression data in order to systematically identify novel regulators that have not yet been documented as critical. The DCGL v2.0 package is available at http://cran.r-project.org/web/packages/DCGL/index.html or at our project home page http://lifecenter.sgst.cn/main/en/dcgl.jsp.
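As a conceptual base R sketch of the differential co-expression signal that DCEA tools quantify (simulated data; this is not DCGL's actual statistic, which the package implements with refinements and, in v2.0, extends with TF-aware differential regulation):

    # Conceptual sketch of what DCEA quantifies: the change in gene-pair
    # correlation between two conditions. Simulated data only.
    set.seed(7)
    exprA <- matrix(rnorm(100 * 20), nrow = 100)  # condition A: 100 genes x 20 samples
    exprB <- matrix(rnorm(100 * 20), nrow = 100)  # condition B
    dc <- abs(cor(t(exprA)) - cor(t(exprB)))      # correlation change per gene pair
    dc[lower.tri(dc, diag = TRUE)] <- 0           # keep each pair once
    head(arrayInd(order(dc, decreasing = TRUE), dim(dc)), 5)  # top differential pairs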
HIDE & SEEK: End-to-end packages to simulate and process radio survey data
NASA Astrophysics Data System (ADS)
Akeret, J.; Seehars, S.; Chang, C.; Monstein, C.; Amara, A.; Refregier, A.
2017-01-01
As several large single-dish radio surveys begin operation within the coming decade, a wealth of radio data will become available and provide a new window to the Universe. In order to fully exploit the potential of these datasets, it is important to understand the systematic effects associated with the instrument and the analysis pipeline. A common approach to tackle this is to forward-model the entire system, from the hardware to the analysis of the data products. For this purpose, we introduce two newly developed, open-source Python packages: the HI Data Emulator (HIDE) and the Signal Extraction and Emission Kartographer (SEEK) for simulating and processing single-dish radio survey data. HIDE forward-models the process of collecting astronomical radio signals in a single-dish radio telescope instrument and outputs pixel-level time-ordered data. SEEK processes the time-ordered data, removes artifacts from Radio Frequency Interference (RFI), automatically applies flux calibration, and aims to recover the astronomical radio signal. The two packages can be used separately or together depending on the application. Their modular and flexible nature allows easy adaptation to other instruments and datasets. We describe the basic architecture of the two packages and examine in detail the noise and RFI modeling in HIDE, as well as the implementation of gain calibration and RFI mitigation in SEEK. We then apply HIDE & SEEK to forward-model a Galactic survey in the frequency range 990-1260 MHz based on data taken at the Bleien Observatory. For this survey, we expect to cover 70% of the full sky and achieve a median signal-to-noise ratio of approximately 5-6 in the cleanest channels including systematic uncertainties. However, we also point out the potential challenges of high RFI contamination and baseline removal when examining the early data from the Bleien Observatory. The fully documented HIDE & SEEK packages are available at http://hideseek.phys.ethz.ch/ and are published under the GPLv3 license on GitHub.
Varet, Hugo; Brillet-Guéguen, Loraine; Coppée, Jean-Yves; Dillies, Marie-Agnès
2016-01-01
Several R packages exist for the detection of differentially expressed genes from RNA-Seq data. The analysis process includes three main steps, namely normalization, dispersion estimation and test for differential expression. Quality control steps along this process are recommended but not mandatory, and failing to check the characteristics of the dataset may lead to spurious results. In addition, normalization methods and statistical models are not exchangeable across the packages without adequate transformations of which users are often not aware. Thus, dedicated analysis pipelines are needed to include systematic quality control steps and prevent errors from misusing the proposed methods. SARTools is an R pipeline for differential analysis of RNA-Seq count data. It can handle designs involving two or more conditions of a single biological factor with or without a blocking factor (such as a batch effect or a sample pairing). It is based on DESeq2 and edgeR and is composed of an R package and two R script templates (for DESeq2 and edgeR, respectively). By tuning a small number of parameters and executing one of the R scripts, users have access to the full results of the analysis, including lists of differentially expressed genes and an HTML report that (i) displays diagnostic plots for quality control and model hypotheses checking and (ii) keeps track of the whole analysis process, parameter values and versions of the R packages used. SARTools provides systematic quality controls of the dataset as well as diagnostic plots that help to tune the model parameters. It gives access to the main parameters of DESeq2 and edgeR and prevents untrained users from misusing some functionalities of both packages. By keeping track of all the parameters of the analysis process, it fits the requirements of reproducible research.
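The analysis is driven by a short parameter block at the top of the template script; a sketch of that block for the DESeq2 template follows (parameter names follow the package's documented templates but should be treated as assumptions and checked against the installed version):

    # Sketch of the user-edited parameter block in SARTools' DESeq2 template
    # (names per the documented template; values and file names hypothetical).
    projectName <- "sepsis_vs_control"   # title used in the HTML report
    author      <- "analyst"
    targetFile  <- "target.txt"          # sample sheet: labels, count files, conditions
    rawDir      <- "raw"                 # directory containing per-sample count files
    varInt      <- "group"               # factor of interest in the target file
    condRef     <- "control"             # reference level of varInt
    batch       <- NULL                  # optional blocking factor, e.g. "day"
    alpha       <- 0.05                  # threshold on adjusted p-values
    # Executing the rest of the template then runs DESeq2 end-to-end and writes
    # the diagnostic plots and HTML report described above.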
A systematic comparison between visual cues for boundary detection.
Mély, David A; Kim, Junkyung; McGill, Mason; Guo, Yuliang; Serre, Thomas
2016-03-01
The detection of object boundaries is a critical first step for many visual processing tasks. Multiple cues (we consider luminance, color, motion and binocular disparity) available in the early visual system may signal object boundaries but little is known about their relative diagnosticity and how to optimally combine them for boundary detection. This study thus aims at understanding how early visual processes inform boundary detection in natural scenes. We collected color binocular video sequences of natural scenes to construct a video database. Each scene was annotated with two full sets of ground-truth contours (one set limited to object boundaries and another set which included all edges). We implemented an integrated computational model of early vision that spans all considered cues, and then assessed their diagnosticity by training machine learning classifiers on individual channels. Color and luminance were found to be most diagnostic while stereo and motion were least. Combining all cues yielded a significant improvement in accuracy beyond that of any cue in isolation. Furthermore, the accuracy of individual cues was found to be a poor predictor of their unique contribution for the combination. This result suggested a complex interaction between cues, which we further quantified using regularization techniques. Our systematic assessment of the accuracy of early vision models for boundary detection together with the resulting annotated video dataset should provide a useful benchmark towards the development of higher-level models of visual processing. Copyright © 2016 Elsevier Ltd. All rights reserved.
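To illustrate the single-cue versus combined-cue comparison in miniature (simulated features and a plain logistic regression; the study's actual classifiers and cue channels differ):

    # Miniature single-cue vs combined-cue comparison using logistic regression
    # on simulated per-pixel cue responses (illustrative only).
    set.seed(42)
    n <- 2000
    boundary  <- rbinom(n, 1, 0.3)
    luminance <- boundary * 1.0 + rnorm(n)   # strongly informative cue
    motion    <- boundary * 0.3 + rnorm(n)   # weakly informative cue
    acc <- function(m, truth) mean((predict(m, type = "response") > 0.5) == truth)
    m_lum <- glm(boundary ~ luminance, family = binomial)
    m_all <- glm(boundary ~ luminance + motion, family = binomial)
    c(luminance_only = acc(m_lum, boundary), combined = acc(m_all, boundary))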
Sequencing and annotation of mitochondrial genomes from individual parasitic helminths.
Jex, Aaron R; Littlewood, D Timothy; Gasser, Robin B
2015-01-01
Mitochondrial (mt) genomics has significant implications in a range of fundamental areas of parasitology, including evolution, systematics, and population genetics as well as explorations of mt biochemistry, physiology, and function. Mt genomes also provide a rich source of markers to aid molecular epidemiological and ecological studies of key parasites. However, there is still a paucity of information on mt genomes for many metazoan organisms, particularly parasitic helminths, which has often related to challenges linked to sequencing from tiny amounts of material. The advent of next-generation sequencing (NGS) technologies has paved the way for low cost, high-throughput mt genomic research, but there have been obstacles, particularly in relation to post-sequencing assembly and analyses of large datasets. In this chapter, we describe protocols for the efficient amplification and sequencing of mt genomes from small portions of individual helminths, and highlight the utility of NGS platforms to expedite mt genomics. In addition, we recommend approaches for manual or semi-automated bioinformatic annotation and analyses to overcome the bioinformatic "bottleneck" to research in this area. Taken together, these approaches have demonstrated applicability to a range of parasites and provide prospects for using complete mt genomic sequence datasets for large-scale molecular systematic and epidemiological studies. In addition, these methods have broader utility and might be readily adapted to a range of other medium-sized molecular regions (i.e., 10-100 kb), including large genomic operons, and other organellar (e.g., plastid) and viral genomes.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model that addresses these challenges and demonstrates its implementation in a relational database system. The data model, referred to as Pathology Analytic Imaging Standards (PAIS), and the database implementation are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). The goals were: (1) development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information; and (2) development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slide tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei.
Modeling and managing pathology image analysis results in a database provide immediate benefits for the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantically annotated data representation and interfaces also make it possible to share image data and analysis results more efficiently.
Mayhew, Alexandra J; Lock, Karen; Kelishadi, Roya; Swaminathan, Sumathi; Marcilio, Claudia S; Iqbal, Romaina; Dehghan, Mahshid; Yusuf, Salim; Chow, Clara K
2016-04-01
Food packages were objectively assessed to explore differences in nutrition labelling, selected promotional marketing techniques and health and nutrition claims between countries, in comparison to national regulations. Cross-sectional. Chip and sweet biscuit packages were collected from sixteen countries at different levels of economic development in the EPOCH (Environmental Profile of a Community's Health) study between 2008 and 2010. Seven hundred and thirty-seven food packages were systematically evaluated for nutrition labelling, selected promotional marketing techniques relevant to nutrition and health, and health and nutrition claims. We compared pack labelling in countries with labelling regulations, with voluntary regulations and no regulations. Overall 86 % of the packages had nutrition labels, 30 % had health or nutrition claims and 87 % displayed selected marketing techniques. On average, each package displayed two marketing techniques and one health or nutrition claim. In countries with mandatory nutrition labelling a greater proportion of packages displayed nutrition labels, had more of the seven required nutrients present, more total nutrients listed and higher readability compared with those with voluntary or no regulations. Countries with no health or nutrition claim regulations had fewer claims per package compared with countries with regulations. Nutrition label regulations were associated with increased prevalence and quality of nutrition labels. Health and nutrition claim regulations were unexpectedly associated with increased use of claims, suggesting that current regulations may not have the desired effect of protecting consumers. Of concern, lack of regulation was associated with increased promotional marketing techniques directed at children and misleadingly promoting broad concepts of health.
FastProject: a tool for low-dimensional analysis of single-cell RNA-Seq data.
DeTomaso, David; Yosef, Nir
2016-08-23
A key challenge in the emerging field of single-cell RNA-Seq is to characterize phenotypic diversity between cells and visualize this information in an informative manner. A common technique when dealing with high-dimensional data is to project the data to 2 or 3 dimensions for visualization. However, there are a variety of methods to achieve this result and once projected, it can be difficult to ascribe biological significance to the observed features. Additionally, when analyzing single-cell data, the relationship between cells can be obscured by technical confounders such as variable gene capture rates. To aid in the analysis and interpretation of single-cell RNA-Seq data, we have developed FastProject, a software tool which analyzes a gene expression matrix and produces a dynamic output report in which two-dimensional projections of the data can be explored. Annotated gene sets (referred to as gene 'signatures') are incorporated so that features in the projections can be understood in relation to the biological processes they might represent. FastProject provides a novel method of scoring each cell against a gene signature so as to minimize the effect of missed transcripts as well as a method to rank signature-projection pairings so that meaningful associations can be quickly identified. Additionally, FastProject is written with a modular architecture and designed to serve as a platform for incorporating and comparing new projection methods and gene selection algorithms. Here we present FastProject, a software package for two-dimensional visualization of single cell data, which utilizes a plethora of projection methods and provides a way to systematically investigate the biological relevance of these low dimensional representations by incorporating domain knowledge.
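The signature-scoring idea can be illustrated generically: weight each gene's contribution to a cell's score by an estimated detection probability, so that likely dropouts contribute less. A base R sketch of the idea follows (this is not FastProject's actual estimator; matrix, signature and weighting function are hypothetical):

    # Generic dropout-weighted signature score (illustrates the idea only).
    set.seed(3)
    expr <- matrix(rexp(200 * 50), nrow = 200,
                   dimnames = list(paste0("g", 1:200), NULL))  # genes x cells
    sig <- paste0("g", 1:20)                    # hypothetical gene signature
    w   <- 1 / (1 + exp(-(log1p(expr) - 1)))    # crude per-entry detection weight
    score <- colSums(log1p(expr[sig, ]) * w[sig, ]) / colSums(w[sig, ])
    summary(score)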
Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction
Névéol, Aurélie; Islamaj-Doğan, Rezarta; Lu, Zhiyong
2010-01-01
Information processing algorithms require significant amounts of annotated data for training and testing. The availability of such data is often hindered by the complexity and high cost of production. In this paper, we investigate the benefits of a state-of-the-art tool to help with the semantic annotation of a large set of biomedical information queries. Seven annotators were recruited to annotate a set of 10,000 PubMed® queries with 16 biomedical and bibliographic categories. About half of the queries were annotated from scratch, while the other half were automatically pre-annotated and manually corrected. The impact of the automatic pre-annotations was assessed on several aspects of the task: time, number of actions, annotator satisfaction, inter-annotator agreement, quality and number of the resulting annotations. The analysis of annotation results showed that the number of required hand annotations is 28.9% less when using pre-annotated results from automatic tools. As a result, the overall annotation time was substantially lower when pre-annotations were used, while inter-annotator agreement was significantly higher. In addition, there was no statistically significant difference in the semantic distribution or number of annotations produced when pre-annotations were used. The annotated query corpus is freely available to the research community. This study shows that automatic pre-annotations are found helpful by most annotators. Our experience suggests using an automatic tool to assist large-scale manual annotation projects. This helps speed-up the annotation time and improve annotation consistency while maintaining high quality of the final annotations. PMID:21094696
Assisted annotation of medical free text using RapTAT
Gobbel, Glenn T; Garvin, Jennifer; Reeves, Ruth; Cronin, Robert M; Heavirland, Julia; Williams, Jenifer; Weaver, Allison; Jayaramaraja, Shrimalini; Giuse, Dario; Speroff, Theodore; Brown, Steven H; Xu, Hua; Matheny, Michael E
2014-01-01
Objective To determine whether assisted annotation using interactive training can reduce the time required to annotate a clinical document corpus without introducing bias. Materials and methods A tool, RapTAT, was designed to assist annotation by iteratively pre-annotating probable phrases of interest within a document, presenting the annotations to a reviewer for correction, and then using the corrected annotations for further machine learning-based training before pre-annotating subsequent documents. Annotators reviewed 404 clinical notes either manually or using RapTAT assistance for concepts related to quality of care during heart failure treatment. Notes were divided into 20 batches of 19–21 documents for iterative annotation and training. Results The number of correct RapTAT pre-annotations increased significantly and annotation time per batch decreased by ∼50% over the course of annotation. Annotation rate increased from batch to batch for assisted but not manual reviewers. Pre-annotation F-measure increased from 0.5 to 0.6 to >0.80 (relative to both assisted reviewer and reference annotations) over the first three batches and more slowly thereafter. Overall inter-annotator agreement was significantly higher between RapTAT-assisted reviewers (0.89) than between manual reviewers (0.85). Discussion The tool reduced workload by decreasing the number of annotations needing to be added and helping reviewers to annotate at an increased rate. Agreement between the pre-annotations and reference standard, and agreement between the pre-annotations and assisted annotations, were similar throughout the annotation process, which suggests that pre-annotation did not introduce bias. Conclusions Pre-annotations generated by a tool capable of interactive training can reduce the time required to create an annotated document corpus by up to 50%. PMID:24431336
SpcAudace: Spectroscopic processing and analysis package of Audela software
NASA Astrophysics Data System (ADS)
Mauclaire, Benjamin
2017-11-01
SpcAudace processes long-slit spectra with automated pipelines and performs astrophysical analysis of the resulting data. These powerful pipelines perform all the required steps in one pass: standard preprocessing, masking of bad pixels, geometric corrections, registration, optimized spectrum extraction, wavelength calibration, and instrumental response computation and correction. Both high- and low-resolution long-slit spectra are managed, for stellar and non-stellar targets. Many types of publication-quality figures can be easily produced: pdf and png plots or annotated time series plots. Astrophysical quantities can be derived from individual spectra or large numbers of spectra with advanced functions: from line profile characteristics to equivalent widths and periodograms. More than 300 documented functions are available and can be used in Tcl scripts for automation. SpcAudace is based on the Audela open source software.
Systematic Review of Video-Based Instruction Component and Parametric Analyses
ERIC Educational Resources Information Center
Bennett, Kyle D.; Aljehany, Mashal Salman; Altaf, Enas Mohammednour
2017-01-01
Video-based instruction (VBI) has a substantial amount of research supporting its use with individuals with autism spectrum disorder and other developmental disabilities. However, it has typically been implemented as a treatment package containing multiple interventions. Additionally, there are procedural variations of VBI. Thus, it is difficult…
Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert
2012-01-01
In recent years, considerable advances have been made toward our understanding of the genetic architecture of behavior and the physical, mental, and environmental influences that underpin behavioral processes. The provision of a method for recording behavior-related phenomena is necessary to enable integrative and comparative analyses of data and knowledge about behavior. The neurobehavior ontology facilitates the systematic representation of behavior and behavioral phenotypes, thereby improving the unification and integration of behavioral data in neuroscience research. Copyright © 2012 Elsevier Inc. All rights reserved.
2010-01-01
Background Computer languages can be domain-related, and in the case of multidisciplinary projects, knowledge of several languages will be needed in order to implement ideas quickly. Moreover, each computer language has relative strong points, making some languages better suited than others for a given task. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. Results The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of translators required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. Conclusions Bioconductor is no longer reserved solely for R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/ PMID:21210978
TSSi: an R package for transcription start site identification from 5' mRNA tag data.
Kreutz, C; Gehring, J S; Lang, D; Reski, R; Timmer, J; Rensing, S A
2012-06-15
High-throughput sequencing has become an essential experimental approach for the investigation of transcriptional mechanisms. For some applications like ChIP-seq, several approaches for the prediction of peak locations exist. However, these methods are not designed for the identification of transcription start sites (TSSs) because such datasets contain qualitatively different noise. In this application note, the R package TSSi is presented, which provides a heuristic framework for the identification of TSSs based on 5' mRNA tag data. Probabilistic assumptions for the distribution of the data, i.e. for the observed positions of the mapped reads, as well as for systematic errors, i.e. for reads which map closely but not exactly to a real TSS, are made and can be adapted by the user. The framework also comprises a regularization procedure which can be applied as a preprocessing step to decrease the noise and thereby reduce the number of false predictions. The R package TSSi is available from the Bioconductor web site: www.bioconductor.org/packages/release/bioc/html/TSSi.html.
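As a toy base R sketch of the underlying task only (smooth noisy 5' read-start counts, then call positions above background), with the caveat that TSSi replaces such ad hoc smoothing with an explicit probabilistic model of reads and noise:

    # Toy version of the TSS identification task; all numbers are simulated.
    set.seed(11)
    pos    <- 1:500
    counts <- rpois(500, lambda = 1) + ifelse(abs(pos - 250) < 3, 40, 0)  # one true TSS
    smooth <- stats::filter(counts, rep(1/5, 5), sides = 2)   # moving-average denoising
    which(smooth > mean(counts) + 4 * sd(counts))             # calls near position 250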
PyPathway: Python Package for Biological Network Analysis and Visualization.
Xu, Yang; Luo, Xiao-Chun
2018-05-01
Life science studies represent one of the biggest generators of large data sets, mainly because of rapid advances in sequencing technology. Biological networks, including interaction networks and human-curated pathways, are essential to understand these high-throughput data sets. Biological network analysis offers a method to explore systematically not only the molecular complexity of a particular disease but also the molecular relationships among apparently distinct phenotypes. Several packages have been developed for the Python community, such as BioPython and Goatools. However, tools to perform comprehensive network analysis and visualization are still needed. Here, we have developed PyPathway, an extensible, free and open source Python package for functional enrichment analysis, network modeling, and network visualization. The network process module supports various interaction network and pathway databases such as Reactome, WikiPathway, STRING, and BioGRID. The network analysis module implements overrepresentation analysis, gene set enrichment analysis, network-based enrichment, and de novo network modeling. Finally, the visualization and data publishing modules enable users to share their analysis by using an easy web application. For package availability, see the first Reference.
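PyPathway itself is a Python package; to keep the sketches in this document in one language, here is the generic statistic that overrepresentation modules of this kind compute, shown in base R with hypothetical counts:

    # Generic overrepresentation analysis (ORA): hypergeometric test asking whether
    # a pathway's genes are enriched in a hit list. Numbers are hypothetical.
    N <- 20000  # genes in the universe
    K <- 150    # genes annotated to the pathway
    n <- 300    # genes in the hit list
    k <- 12     # hit-list genes that fall in the pathway
    phyper(k - 1, K, N - K, n, lower.tail = FALSE)  # P(X >= k)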
SynergyFinder: a web application for analyzing drug combination dose-response matrix data.
Ianevski, Aleksandr; He, Liye; Aittokallio, Tero; Tang, Jing
2017-08-01
Rational design of drug combinations has become a promising strategy to tackle the drug sensitivity and resistance problem in cancer treatment. To systematically evaluate the pre-clinical significance of pairwise drug combinations, functional screening assays that probe combination effects in a dose-response matrix assay are commonly used. To facilitate the analysis of such drug combination experiments, we implemented a web application that uses key functions of the R package SynergyFinder, and provides not only the flexibility of using multiple synergy scoring models, but also a user-friendly interface for visualizing the drug combination landscapes in an interactive manner. The SynergyFinder web application is freely accessible at https://synergyfinder.fimm.fi ; The R package and its source code are freely available at http://bioconductor.org/packages/release/bioc/html/synergyfinder.html . jing.tang@helsinki.fi. © The Author(s) 2017. Published by Oxford University Press.
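One of the synergy models such scoring supports is Bliss independence; a base R sketch on a hypothetical 3x3 inhibition matrix (the package additionally handles curve fitting, normalization and visualization):

    # Bliss independence on a dose-response matrix: for fractional inhibitions a, b
    # the expected combined effect is a + b - a*b; synergy is the observed excess.
    # All values are hypothetical.
    mono_a   <- c(0.10, 0.30, 0.55)   # drug A alone at 3 doses (rows)
    mono_b   <- c(0.05, 0.25, 0.50)   # drug B alone at 3 doses (columns)
    observed <- matrix(c(0.18, 0.42, 0.66,
                         0.38, 0.60, 0.80,
                         0.62, 0.78, 0.92), nrow = 3, byrow = TRUE)
    expected <- outer(mono_a, mono_b, function(a, b) a + b - a * b)
    round(observed - expected, 3)     # positive values indicate Bliss synergy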
A Lateral Differential Resonant Pressure Microsensor Based on SOI-Glass Wafer-Level Vacuum Packaging
Xie, Bo; Xing, Yonghao; Wang, Yanshuang; Chen, Jian; Chen, Deyong; Wang, Junbo
2015-01-01
This paper presents the fabrication and characterization of a resonant pressure microsensor based on SOI-glass wafer-level vacuum packaging. The SOI-based pressure microsensor consists of a pressure-sensitive diaphragm at the handle layer and two lateral resonators (electrostatic excitation and capacitive detection) on the device layer in a differential setup. The resonators were vacuum packaged with a glass cap using anodic bonding, and the wire interconnection was realized using a mask-free electrochemical etching approach by selectively patterning an Au film on highly topographic surfaces. The fabricated resonant pressure microsensor with dual resonators was characterized in a systematic manner, producing a quality factor higher than 10,000 (stable over ~6 months), a sensitivity of about 166 Hz/kPa and a reduced nonlinear error of 0.033% F.S. Based on the differential output, the sensitivity was doubled and the temperature-caused frequency drift was reduced to 25%. PMID:26402679
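The benefit of the differential readout can be made explicit with a first-order model; the opposite-sign pressure response and common-mode temperature term below are a simplifying assumption for illustration, not a statement from the abstract:

    % First-order frequency model of the two resonators (simplifying assumption):
    f_1 = f_0 + sP + dT , \qquad f_2 = f_0 - sP + dT
    % Differential output: the common-mode dT term cancels and the
    % pressure sensitivity doubles.
    f_1 - f_2 = 2\,sP

That the reported drift falls to 25% rather than zero suggests the two resonators match only approximately in practice.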
Release of (and lessons learned from mining) a pioneering large toxicogenomics database.
Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R
2015-07-01
We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give a broad coverage of liver-active compounds. A selection of the compounds was also analyzed on Affymetrix microarrays. The release includes results of an in-house reannotation pipeline to Entrez gene annotations, used to classify probes into different confidence classes. High-confidence, unambiguously annotated probes were used to create gene-level data, which served as the starting point for cross-platform comparisons. Connectivity-map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R package containing the gene-level data and show how it can be used for expression-based similarity searches. Comparing the same biological samples run on the Affymetrix and Codelink platforms, we observe good correspondence using connectivity mapping approaches. As expected, this correspondence is smaller when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated into users' toxicogenomics pipelines.
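A base R sketch of the kind of expression-based similarity search described (rank correlation between a query profile and each stored compound profile; the released package's connectivity scoring may differ in detail, and all data below are simulated):

    # Expression-based similarity search: Spearman correlation between a query
    # profile and each stored compound profile.
    set.seed(5)
    profiles <- matrix(rnorm(500 * 124), nrow = 500,
                       dimnames = list(NULL, paste0("compound", 1:124)))
    query <- profiles[, "compound7"] + rnorm(500, sd = 0.5)  # noisy replicate
    sim <- apply(profiles, 2, cor, y = query, method = "spearman")
    head(sort(sim, decreasing = TRUE), 3)  # compound7 should rank first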
Wang, Xiupin; Peng, Qingzhi; Li, Peiwu; Zhang, Qi; Ding, Xiaoxia; Zhang, Wen; Zhang, Liangxiao
2016-10-12
The high complexity of identifying non-target triacylglycerols (TAGs) is a major challenge in lipidomics analysis. To identify non-target TAGs, accurate MS(n) spectrometry, which generates so-called ion trees, is a powerful tool. In this paper, we present a technique for efficient structural elucidation of TAGs on MS(n) spectral trees produced by LTQ Orbitrap MS(n), implemented as an open source software package named TIT. The TIT software supports automatic annotation of non-target TAGs on MS(n) ion trees from a self-built fragment ion database. This database includes 19,108 simulated TAG molecules from random combinations of fatty acids and 500,582 corresponding self-built multistage fragment ions (MS(n), n ≤ 3). Our software can identify TAGs using a "stage-by-stage elimination" strategy. By utilizing the MS(1) accurate mass and referenced RKMD, the TIT software can discriminate unique elemental composition candidates. The regiospecific isomers of fatty acyl chains are distinguished using MS(2) and MS(3) fragment spectra. We applied the algorithm to a selection of 45 TAG standards and demonstrated that the molecular ions could be 100% correctly assigned. Therefore, the TIT software can be applied to TAG identification in complex biological samples such as mouse plasma extracts. Copyright © 2016 Elsevier B.V. All rights reserved.
Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude
2011-06-20
One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
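The over-representation assessment can be illustrated with a simple binomial test of a word's observed count against its background expectation (the paper's pattern statistics for short sequences are more refined; all counts below are hypothetical):

    # Toy over-representation test for a 7-residue structural word in the loops
    # of one superfamily. Hypothetical counts; the paper uses dedicated pattern
    # statistics adapted to short sequences.
    observed  <- 18      # occurrences of the word in the superfamily's loops
    positions <- 4000    # candidate positions scanned
    p_bg      <- 0.001   # background probability of the word per position
    binom.test(observed, positions, p = p_bg, alternative = "greater")$p.value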
Hatchard, Jenny L; Fooks, Gary J; Gilmore, Anna B
2016-01-01
Objectives To investigate opposition to standardised tobacco packaging in the UK. To increase understanding of how transnational corporations are adapting to changes in their access to policymakers precipitated by Article 5.3 of the Framework Convention on Tobacco Control (FCTC). Design Case study web-based documentary analysis, using NVivo V.10. Examination of relationships between opponents of standardised packaging and transnational tobacco companies (TTCs) and of the volume, nature, transparency and timing of their activities. Setting UK standardised packaging policy debate 2011–2013. Participants Organisations selected on basis of opposition to, or facilitation thereof, standardised tobacco packaging in the UK; 422 associated documents. Results Excluding tobacco manufacturing and packaging companies (n=12), 109 organisations were involved in opposing standardised packaging, 82 (75%) of which had a financial relationship with 1 or more TTC. These 82 organisations (43 actively opposing the measure, 39 facilitating opposition) were responsible for 60% of the 404 activities identified, including the majority of public communications and research production. TTCs were directly responsible for 28% of total activities, predominantly direct lobbying, but also financially underwrote third party research, communication, mass recruitment and lobbying. Active organisations rarely reported any financial relationship with TTCs when undertaking opposition activities. Conclusions The multifaceted opposition to standardised packaging was primarily undertaken by third parties with financial relationships with major tobacco manufacturers. Low levels of transparency regarding these links created a misleading impression of diverse and widespread opposition. Countries should strengthen implementation of Article 5.3 of the FCTC by systematically requiring conflict of interest declarations from all organisations participating in political or media debates on tobacco control. PMID:27855104
Fernandes, Michelle; Stein, Alan; Newton, Charles R.; Cheikh-Ismail, Leila; Kihara, Michael; Wulff, Katharina; de León Quintana, Enrique; Aranzeta, Luis; Soria-Frisch, Aureli; Acedo, Javier; Ibanez, David; Abubakar, Amina; Giuliani, Francesca; Lewis, Tamsin; Kennedy, Stephen; Villar, Jose
2014-01-01
Background The International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st) Project is a population-based, longitudinal study describing early growth and development in an optimally healthy cohort of 4607 mothers and newborns. At 24 months, children are assessed for neurodevelopmental outcomes with the INTERGROWTH-21st Neurodevelopment Package. This paper describes neurodevelopment tools for preschoolers and the systematic approach leading to the development of the Package. Methods An advisory panel shortlisted project-specific criteria (such as multi-dimensional assessments and suitability for international populations) to be fulfilled by a neurodevelopment instrument. A literature review of well-established tools for preschoolers revealed 47 candidates, none of which fulfilled all the project's criteria. A multi-dimensional assessment was, therefore, compiled using a package-based approach by: (i) categorizing desired outcomes into domains, (ii) devising domain-specific criteria for tool selection, and (iii) selecting the most appropriate measure for each domain. Results The Package measures vision (Cardiff tests); cortical auditory processing (auditory evoked potentials to a novelty oddball paradigm); and cognition, language skills, behavior, motor skills and attention (the INTERGROWTH-21st Neurodevelopment Assessment) in 35–45 minutes. Sleep-wake patterns (actigraphy) are also assessed. Tablet-based applications with integrated quality checks and automated, wireless electroencephalography make the Package easy to administer in the field by non-specialist staff. The Package is in use in Brazil, India, Italy, Kenya and the United Kingdom. Conclusions The INTERGROWTH-21st Neurodevelopment Package is a multi-dimensional instrument measuring early child development (ECD). Its developmental approach may be useful to those involved in large-scale ECD research and surveillance efforts. PMID:25423589
Community annotation experiment for ground truth generation for the i2b2 medication challenge
Solti, Imre; Xia, Fei; Cadag, Eithon
2010-01-01
Objective Within the context of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records, the authors (also referred to as ‘the i2b2 medication challenge team’ or ‘the i2b2 team’ for short) organized a community annotation experiment. Design For this experiment, the authors released annotation guidelines and a small set of annotated discharge summaries. They asked the participants of the Third i2b2 Workshop to annotate 10 discharge summaries per person; each discharge summary was annotated by two annotators from two different teams, and a third annotator from a third team resolved disagreements. Measurements In order to evaluate the reliability of the annotations thus produced, the authors measured community inter-annotator agreement and compared it with the inter-annotator agreement of expert annotators when both the community and the expert annotators generated ground truth based on pooled system outputs. For this purpose, the pool consisted of the three most densely populated automatic annotations of each record. The authors also compared the community inter-annotator agreement with expert inter-annotator agreement when the experts annotated raw records without using the pool. Finally, they measured the quality of the community ground truth by comparing it with the expert ground truth. Results and conclusions The authors found that the community annotators achieved comparable inter-annotator agreement to expert annotators, regardless of whether the experts annotated from the pool. Furthermore, the ground truth generated by the community obtained F-measures above 0.90 against the ground truth of the experts, indicating the value of the community as a source of high-quality ground truth even on intricate and domain-specific annotation tasks. PMID:20819855
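The F-measures reported above compare one annotator's set of annotations against another set treated as ground truth. As a minimal illustration of how such span-level agreement can be computed (assuming annotations are modeled as (start, end, label) tuples and exact-match scoring; this is not the i2b2 team's evaluation code):

```python
# Minimal sketch: exact-match agreement between two annotators' span sets.
# Annotations are modeled as (start, end, label) tuples; this illustrates
# span-level F1, not the i2b2 team's actual evaluation script.

def span_f1(annotations_a, annotations_b):
    """Precision, recall, and F1 of annotator A measured against annotator B."""
    a, b = set(annotations_a), set(annotations_b)
    matched = len(a & b)
    precision = matched / len(a) if a else 0.0
    recall = matched / len(b) if b else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

ann_a = [(10, 18, "medication"), (42, 50, "dosage"), (60, 70, "frequency")]
ann_b = [(10, 18, "medication"), (42, 50, "dosage"), (80, 90, "duration")]
print(span_f1(ann_a, ann_b))  # approximately (0.667, 0.667, 0.667)
```

Relaxed-match variants (counting overlapping rather than identical spans) are a common design alternative when span boundaries are hard to delimit consistently.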
GneimoSim: A Modular Internal Coordinates Molecular Dynamics Simulation Package
Larsen, Adrien B.; Wagner, Jeffrey R.; Kandel, Saugat; Salomon-Ferrer, Romelia; Vaidehi, Nagarajan; Jain, Abhinandan
2014-01-01
The Generalized Newton-Euler Inverse Mass Operator (GNEIMO) method is an advanced method for internal coordinates molecular dynamics (ICMD). GNEIMO includes several theoretical and algorithmic advancements that address longstanding challenges with ICMD simulations. In this paper we describe the GneimoSim ICMD software package that implements the GNEIMO method. We believe that GneimoSim is the first software package to include advanced features such as the equipartition principle derived for internal coordinates, and a method for including the Fixman potential to eliminate systematic statistical biases introduced by the use of hard constraints. Moreover, by design, GneimoSim is extensible and can be easily interfaced with third party force field packages for ICMD simulations. Currently, GneimoSim includes interfaces to the LAMMPS, OpenMM, and Rosetta force field calculation packages. The availability of a comprehensive Python interface to the underlying C++ classes and their methods provides a powerful and versatile mechanism for users to develop simulation scripts to configure the simulation and control the simulation flow. GneimoSim has been used extensively for studying the dynamics of protein structures, refinement of protein homology models, and for simulating large scale protein conformational changes with enhanced sampling methods. GneimoSim is not limited to proteins and can also be used for the simulation of polymeric materials. PMID:25263538
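ICMD methods such as GNEIMO propagate dynamics in internal coordinates (bond lengths, bond angles, torsions) rather than Cartesian positions. As a small grounded illustration of that coordinate system (standard geometry, not GneimoSim code), the torsion angle defined by four bonded atoms can be computed from their Cartesian positions as follows:

```python
import numpy as np

# Standard computation of the torsion (dihedral) angle defined by four
# bonded atoms -- one of the internal coordinates an ICMD method works in.

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees for atoms p0-p1-p2-p3 (numpy float arrays)."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1          # component of b0 normal to b1
    w = b2 - np.dot(b2, b1) * b1          # component of b2 normal to b1
    x, y = np.dot(v, w), np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

atoms = [np.array(p, dtype=float)
         for p in [(-1, 1, 0), (0, 0, 0), (1, 0, 0), (2, -1, 0)]]
print(dihedral(*atoms))  # 180.0: a trans (anti) configuration
```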
Community-based benchmarking of the CMIP DECK experiments
NASA Astrophysics Data System (ADS)
Gleckler, P. J.
2015-12-01
A diversity of community-based efforts are independently developing "diagnostic packages" with little or no coordination between them. A short list of examples includes NCAR's Climate Variability Diagnostics Package (CVDP), ORNL's International Land Model Benchmarking (ILAMB), LBNL's Toolkit for Extreme Climate Analysis (TECA), PCMDI's Metrics Package (PMP), the EU EMBRACE ESMValTool, the WGNE MJO diagnostics package, and CFMIP diagnostics. The full value of these efforts cannot be realized without some coordination. As a first step, a WCRP effort has initiated a catalog to document candidate packages that could potentially be applied in a "repeat-use" fashion to all simulations contributed to the CMIP DECK (Diagnostic, Evaluation and Characterization of Klima) experiments. Some coordination of community-based diagnostics has the additional potential to improve how CMIP modeling groups analyze their simulations during model development. The fact that most modeling groups now maintain a "CMIP compliant" data stream means that, in principle, they could readily adopt a set of well-organized diagnostic capabilities specifically designed to operate on CMIP DECK experiments without much effort. Ultimately, a detailed listing of and access to analysis codes that are demonstrated to work "out of the box" with CMIP data could enable model developers (and others) to select those codes they wish to implement in-house, potentially enabling more systematic evaluation during the model development process.
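The "repeat-use" idea is that a single metric function is applied uniformly to every model's output. A minimal sketch of that pattern, using synthetic stand-ins for regridded annual-mean fields (not code from any of the packages listed above):

```python
import numpy as np

# "Repeat-use" diagnostic pattern: one metric function applied uniformly to
# every model in a collection. Fields here are synthetic (lat, lon) stand-ins
# for regridded annual means; no CMIP data or package code is used.

lats = np.linspace(-89.5, 89.5, 180)
weights = np.cos(np.radians(lats))[:, None]        # area weights ~ cos(latitude)

def global_mean_bias(model_field, reference_field):
    """Area-weighted global-mean bias of a (lat, lon) field vs. a reference."""
    diff = model_field - reference_field
    return float((diff * weights).sum() / (weights.sum() * diff.shape[1]))

rng = np.random.default_rng(0)
reference = 288.0 + rng.normal(0.0, 1.0, (180, 360))
models = {name: reference + offset + rng.normal(0.0, 0.5, (180, 360))
          for name, offset in [("model_a", 0.3), ("model_b", -0.8)]}

for name, field in models.items():
    print(f"{name}: global-mean bias = {global_mean_bias(field, reference):+.2f} K")
```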
Tran, Van Du T; De Coi, Niccolò; Feuermann, Marc; Schmid-Siegert, Emanuel; Băguţ, Elena-Tatiana; Mignon, Bernard; Waridel, Patrice; Peter, Corinne; Pradervand, Sylvain; Pagni, Marco; Monod, Michel
2016-01-01
Dermatophytes are the most common agents of superficial mycoses in humans and animals. The aim of the present investigation was to systematically identify the extracellular, possibly secreted, proteins that are putative virulence factors and antigenic molecules of dermatophytes. A complete gene expression profile of Arthroderma benhamiae was obtained during infection of its natural host (guinea pig) using RNA sequencing (RNA-seq) technology. This profile was completed with those of the fungus cultivated in vitro in two media containing either keratin or soy meal protein as the sole source of nitrogen and in Sabouraud medium. More than 60% of transcripts deduced from RNA-seq data differ from those previously deposited for A. benhamiae. Using these RNA-seq data along with an automatic gene annotation procedure, followed by manual curation, we produced a new annotation of the A. benhamiae genome. This annotation comprised 7,405 coding sequences (CDSs), among which only 2,662 were identical to the currently available annotation, 383 were newly identified, and 15 secreted proteins were manually corrected. The expression profile of genes encoding proteins with a signal peptide in infected guinea pigs was found to be very different from that during in vitro growth when using keratin as the substrate. In particular, the sets of the 12 most highly expressed genes encoding proteases with a signal sequence had only the putative vacuolar aspartic protease gene PEP2 in common, during infection and in keratin medium. The most upregulated gene encoding a secreted protease during infection was that encoding subtilisin SUB6, which is a known major allergen in the related dermatophyte Trichophyton rubrum. IMPORTANCE Dermatophytoses (ringworm, jock itch, athlete’s foot, and nail infections) are the most common fungal infections, but their virulence mechanisms are poorly understood. Combining transcriptomic data obtained from growth under various culture conditions with data obtained during infection led to a significantly improved genome annotation. About 65% of the protein-encoding genes predicted with our protocol did not match the existing annotation for A. benhamiae. Comparing gene expression during infection on guinea pigs with keratin degradation in vitro, which is supposed to mimic the host environment, revealed the critical importance of using real in vivo conditions for investigating virulence mechanisms. The analysis of genes expressed in vivo, encoding cell surface and secreted proteins, particularly proteases, led to the identification of new allergen and virulence factor candidates. PMID:27822542
Genome-wide characterization of monomeric transcriptional regulators in Mycobacterium tuberculosis.
Feng, Lipeng; Chen, Zhenkang; Wang, Zhongwei; Hu, Yangbo; Chen, Shiyun
2016-05-01
Gene transcription catalysed by RNA polymerase is regulated by transcriptional regulators, which play central roles in the control of gene transcription in both eukaryotes and prokaryotes. In regulating gene transcription, many regulators form dimers that bind to DNA with repeated motifs. However, some regulators function as monomers, and their mechanisms of gene expression control are largely uncharacterized. Here we systematically characterized monomeric versus dimeric regulators in the tuberculosis causative agent Mycobacterium tuberculosis. Of the >160 transcriptional regulators annotated in M. tuberculosis, 154 were tested; 22% probably act as monomers, and most of these are annotated as hypothetical regulators. Notably, all members of the WhiB-like protein family are classified as monomers. To further investigate the mechanisms of monomeric regulators, we analysed the actions of these WhiB proteins and found that the majority interact with the principal sigma factor σA, which is also a monomeric protein within the RNA polymerase holoenzyme. Taken together, our study globally classified monomeric regulators in M. tuberculosis for the first time and suggested a mechanism by which monomeric regulators control gene transcription through interaction with monomeric sigma factors.
PlantTFDB: a comprehensive plant transcription factor database
Guo, An-Yuan; Chen, Xin; Gao, Ge; Zhang, He; Zhu, Qi-Hui; Liu, Xiao-Chuan; Zhong, Ying-Fu; Gu, Xiaocheng; He, Kun; Luo, Jingchu
2008-01-01
Transcription factors (TFs) play key roles in controlling gene expression. Systematic identification and annotation of TFs, followed by construction of TF databases, may serve as useful resources for studying the function and evolution of transcription factors. We developed a comprehensive plant transcription factor database PlantTFDB (http://planttfdb.cbi.pku.edu.cn), which contains 26,402 TFs predicted from 22 species, including five model organisms with available whole genome sequences and 17 plants with available EST sequences. To provide comprehensive information for those putative TFs, we made extensive annotation at both the family and gene levels. A brief introduction and key references are presented for each family. Functional domain information and cross-references to various well-known public databases are available for each identified TF. In addition, we predicted putative orthologs of those TFs among the 22 species. PlantTFDB has a simple interface that allows users to search the database by ID or free text, to perform sequence similarity searches against TFs of all or individual species, and to download TF sequences for local analysis. PMID:17933783
LMSD: LIPID MAPS structure database
Sud, Manish; Fahy, Eoin; Cotter, Dawn; Brown, Alex; Dennis, Edward A.; Glass, Christopher K.; Merrill, Alfred H.; Murphy, Robert C.; Raetz, Christian R. H.; Russell, David W.; Subramaniam, Shankar
2007-01-01
The LIPID MAPS Structure Database (LMSD) is a relational database encompassing structures and annotations of biologically relevant lipids. Structures of lipids in the database come from four sources: (i) LIPID MAPS Consortium's core laboratories and partners; (ii) lipids identified by LIPID MAPS experiments; (iii) computationally generated structures for appropriate lipid classes; (iv) biologically relevant lipids manually curated from LIPID BANK, LIPIDAT and other public sources. All the lipid structures in LMSD are drawn in a consistent fashion. In addition to a classification-based retrieval of lipids, users can search LMSD using either text-based or structure-based search options. The text-based search implementation supports data retrieval by any combination of these data fields: LIPID MAPS ID, systematic or common name, mass, formula, category, main class, and subclass data fields. The structure-based search, in conjunction with optional data fields, provides the capability to perform a substructure search or exact match for the structure drawn by the user. Search results, in addition to structure and annotations, also include relevant links to external databases. The LMSD is publicly available at PMID:17098933
A geographically-diverse collection of 418 human gut microbiome pathway genome databases
Hahn, Aria S.; Altman, Tomer; Konwar, Kishori M.; Hanson, Niels W.; Kim, Dongjae; Relman, David A.; Dill, David L.; Hallam, Steven J.
2017-01-01
Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools. PMID:28398290
USDA-ARS?s Scientific Manuscript database
The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...
Modes of Meaning in High School Science. CELA Research Report.
ERIC Educational Resources Information Center
Young, Richard F.; Nguyen, Hanh Thi
Using the framework of systemic functional grammar, this study compares two modes of presenting the same scientific topic: in a physics textbook and in interactive teacher talk. Three aspects of scientific meaning making are analyzed: representations of physical and mental reality, lexical packaging, and the rhetorical structure of reasoning.…
Component Analyses Using Single-Subject Experimental Designs: A Review
ERIC Educational Resources Information Center
Ward-Horner, John; Sturmey, Peter
2010-01-01
A component analysis is a systematic assessment of 2 or more independent variables or components that comprise a treatment package. Component analyses are important for the analysis of behavior; however, previous research provides only cursory descriptions of the topic. Therefore, in this review the definition of "component analysis" is discussed,…
Multicompetence and Native Speaker Variation in Clausal Packaging in Japanese
ERIC Educational Resources Information Center
Brown, Amanda; Gullberg, Marianne
2012-01-01
Native speakers show systematic variation in a range of linguistic domains as a function of a variety of sociolinguistic variables. This article addresses native language variation in the context of multicompetence, i.e. knowledge of two languages in one mind (Cook, 1991). Descriptions of motion were elicited from functionally monolingual and…
Implementation of GenePattern within the Stanford Microarray Database.
Hubble, Jeremy; Demeter, Janos; Jin, Heng; Mao, Maria; Nitzberg, Michael; Reddy, T B K; Wymore, Farrell; Zachariah, Zachariah K; Sherlock, Gavin; Ball, Catherine A
2009-01-01
Hundreds of researchers across the world use the Stanford Microarray Database (SMD; http://smd.stanford.edu/) to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD also provides access to public data, and tools with which to analyze those data, to any public user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD has been limited by available engineering resources, and in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for the purposes of publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package into the SMD code base. This extension is available with the SMD source code that is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with an enriched data analysis capability.
Carrell, David S; Cronkite, David J; Malin, Bradley A; Aberdeen, John S; Hirschman, Lynette
2016-08-05
Clinical text contains valuable information but must be de-identified before it can be used for secondary purposes. Accurate annotation of personally identifiable information (PII) is essential to the development of automated de-identification systems and to manual redaction of PII. Yet the accuracy of annotations may vary considerably across individual annotators, and annotation is costly. As such, the marginal benefit of incorporating additional annotators has not been well characterized. This study models the costs and benefits of incorporating increasing numbers of independent human annotators to identify the instances of PII in a corpus. We used a corpus with gold standard annotations to evaluate the performance of teams of annotators of increasing size. Four annotators independently identified PII in a 100-document corpus consisting of randomly selected clinical notes from Family Practice clinics in a large integrated health care system. These annotations were pooled and validated to generate a gold standard corpus for evaluation. Recall rates for all PII types ranged from 0.90 to 0.98 for individual annotators to 0.998 to 1.0 for teams of three, when measured against the gold standard. Median cost per PII instance discovered during corpus annotation ranged from $0.71 for an individual annotator to $377 for annotations discovered only by a fourth annotator. Incorporating a second annotator into a PII annotation process reduces unredacted PII and improves the quality of annotations to 0.99 recall, yielding clear benefit at reasonable cost; the cost advantages of annotation teams larger than two diminish rapidly.
Active learning reduces annotation time for clinical concept extraction.
Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony
2017-10-01
To investigate: (1) the annotation time savings by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. The train and test sets of the concept extraction task in the i2b2/VA 2010 challenge contain 73 and 120 discharge summary reports, respectively, provided by the Beth Israel institute. The 73 reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time by up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in a 20% further reduction of the annotation time when compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches, as demonstrated by the high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or when reviewing active learning-assisted pre-annotations.
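As a sketch of the kind of query strategy compared in this study, the following implements pool-based active learning with least-confidence sampling on synthetic data using scikit-learn; the authors' actual system performed sequence labeling on clinical text, so this is an illustration of the principle only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Pool-based active learning with least-confidence sampling: train on the
# labeled set, then query the pool instances the model is least sure about.
# Synthetic data; the oracle here simply reveals the true labels.

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=10, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

for round_no in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    confidence = clf.predict_proba(X[pool]).max(axis=1)
    query = [pool[i] for i in np.argsort(confidence)[:10]]   # least confident
    labeled.extend(query)                                    # oracle labels them
    pool = [i for i in pool if i not in query]
    print(f"round {round_no}: {len(labeled)} labeled, "
          f"pool accuracy = {clf.score(X[pool], y[pool]):.3f}")
```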
Corpus annotation for mining biomedical events from literature
Kim, Jin-Dong; Ohta, Tomoko; Tsujii, Jun'ichi
2008-01-01
Background Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. Results We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflect biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. Conclusion The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain. PMID:18182099
Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST)
Dowd, Scot E; Zaragoza, Joaquin; Rodriguez, Javier R; Oliver, Melvin J; Payton, Paxton R
2005-01-01
Background BLAST is one of the most common and useful tools for Genetic Research. This paper describes a software application we have termed Windows .NET Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST), which enhances the BLAST utility by improving usability, fault recovery, and scalability in a Windows desktop environment. Our goal was to develop an easy to use, fault tolerant, high-throughput BLAST solution that incorporates a comprehensive BLAST result viewer with curation and annotation functionality. Results W.ND-BLAST is a comprehensive Windows-based software toolkit that targets researchers, including those with minimal computer skills, and provides the ability to increase the performance of BLAST by distributing BLAST queries to any number of Windows-based machines across local area networks (LAN). W.ND-BLAST provides intuitive Graphic User Interfaces (GUI) for BLAST database creation, BLAST execution, BLAST output evaluation and BLAST result exportation. This software also provides several layers of fault tolerance and fault recovery to prevent loss of data if nodes or master machines fail. This paper lays out the functionality of W.ND-BLAST. W.ND-BLAST displays close to 100% performance efficiency when distributing tasks to 12 remote computers of the same performance class. A high-throughput BLAST job which took 662.68 minutes (11 hours) on one average machine was completed in 44.97 minutes when distributed to 17 nodes, which included lower performance class machines. Finally, there are comprehensive high-throughput BLAST Output Viewer (BOV) and Annotation Engine components, which provide comprehensive exportation of BLAST hits to text files, annotated fasta files, tables, or association files. Conclusion W.ND-BLAST provides an interactive tool that allows scientists to easily utilize their available computing resources for high-throughput and comprehensive sequence analyses. The install package for W.ND-BLAST is freely downloadable from . With registration the software is free; installation, networking, and usage instructions are provided, as well as a support forum. PMID:15819992
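W.ND-BLAST itself is a Windows .NET application; as a generic illustration of the fan-out pattern it implements (splitting a query set into batches, running them in parallel, and merging results), here is a sketch using Python's standard library with a placeholder standing in for the BLAST call:

```python
from concurrent.futures import ProcessPoolExecutor

# Generic fan-out pattern: split queries into batches, run them in parallel
# workers, merge the results. run_search is a placeholder standing in for a
# BLAST invocation; this is not W.ND-BLAST code.

def run_search(batch):
    return [(query, f"hit_for_{query}") for query in batch]  # placeholder hits

def batches(items, n):
    return [items[i::n] for i in range(n)]

if __name__ == "__main__":
    queries = [f"seq_{i}" for i in range(100)]
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = [hit
                   for batch_hits in executor.map(run_search, batches(queries, 4))
                   for hit in batch_hits]
    print(len(results), results[:2])
```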
Gardner, Shea N.; Hall, Barry G.
2013-01-01
Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from Genbank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four “raw read” genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths. PMID:24349125
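The core k-mer idea behind kSNP can be illustrated in a few lines: collect all k-mers per genome, group them by their flanking bases, and flag groups whose central base differs between genomes. The toy sketch below follows that spirit; kSNP's actual implementation and filtering are considerably more involved:

```python
from collections import defaultdict

# Toy sketch of alignment-free, k-mer-based SNP discovery in the spirit of
# kSNP: group each genome's k-mers by their flanking bases and flag groups
# whose central base varies. kSNP's real pipeline is far more involved.

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def snp_candidates(genomes, k=7):              # k must be odd
    mid = k // 2
    groups = defaultdict(dict)                 # (prefix, suffix) -> {genome: bases}
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            groups[(kmer[:mid], kmer[mid + 1:])].setdefault(name, set()).add(kmer[mid])
    return {key: centers for key, centers in groups.items()
            if len(set().union(*centers.values())) > 1}

genomes = {"strain_1": "ACGTACGTTAGCAT", "strain_2": "ACGTACATTAGCAT"}
for (pre, suf), centers in snp_candidates(genomes).items():
    alleles = "/".join(sorted(set().union(*centers.values())))
    print(f"{pre}[{alleles}]{suf}")            # prints: TAC[A/G]TTA
```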
DGCA: A comprehensive R package for Differential Gene Correlation Analysis.
McKenzie, Andrew T; Katsyv, Igor; Song, Won-Min; Wang, Minghui; Zhang, Bin
2016-11-15
Dissecting the regulatory relationships between genes is a critical step towards building accurate predictive models of biological systems. A powerful approach towards this end is to systematically study the differences in correlation between gene pairs in more than one distinct condition. In this study we develop an R package, DGCA (for Differential Gene Correlation Analysis), which offers a suite of tools for computing and analyzing differential correlations between gene pairs across multiple conditions. To minimize parametric assumptions, DGCA computes empirical p-values via permutation testing. To understand differential correlations at a systems level, DGCA performs higher-order analyses such as measuring the average difference in correlation and multiscale clustering analysis of differential correlation networks. Through a simulation study, we show that the straightforward z-score based method that DGCA employs significantly outperforms the existing alternative methods for calculating differential correlation. Application of DGCA to the TCGA RNA-seq data in breast cancer not only identifies key changes in the regulatory relationships between TP53 and PTEN and their target genes in the presence of inactivating mutations, but also reveals an immune-related differential correlation module that is specific to triple negative breast cancer (TNBC). DGCA is an R package for systematically assessing the difference in gene-gene regulatory relationships under different conditions. This user-friendly, effective, and comprehensive software tool will greatly facilitate the application of differential correlation analysis in many biological studies and thus will help identification of novel signaling pathways, biomarkers, and targets in complex biological systems and diseases.
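The z-score method underlying DGCA rests on the classical Fisher transformation for comparing two correlation coefficients. A minimal sketch of that core computation (DGCA's permutation-based empirical p-values and higher-order analyses are omitted):

```python
import numpy as np
from scipy import stats

# Core of z-score-based differential correlation: Fisher-transform the two
# condition-specific correlations and compare. DGCA layers permutation-based
# empirical p-values and network analyses on top of this idea.

def diff_corr_z(x1, y1, x2, y2):
    """z statistic and two-sided p-value for a change in corr(x, y)."""
    r1 = np.corrcoef(x1, y1)[0, 1]
    r2 = np.corrcoef(x2, y2)[0, 1]
    z1, z2 = np.arctanh(r1), np.arctanh(r2)               # Fisher transform
    se = np.sqrt(1.0 / (len(x1) - 3) + 1.0 / (len(x2) - 3))
    z = (z1 - z2) / se
    return z, 2.0 * stats.norm.sf(abs(z))

rng = np.random.default_rng(1)
x1 = rng.normal(size=100); y1 = 0.8 * x1 + rng.normal(scale=0.5, size=100)
x2 = rng.normal(size=100); y2 = rng.normal(size=100)      # correlation lost
print(diff_corr_z(x1, y1, x2, y2))                        # large z, tiny p
```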
Electronic medication packaging devices and medication adherence: a systematic review.
Checchi, Kyle D; Huybrechts, Krista F; Avorn, Jerry; Kesselheim, Aaron S
2014-09-24
Medication nonadherence, which has been estimated to affect 28% to 31% of US patients with hypertension, hyperlipidemia, and diabetes, may be improved by electronic medication packaging (EMP) devices (adherence-monitoring devices incorporated into the packaging of a prescription medication). To investigate whether EMP devices are associated with improved adherence and to identify and describe common features of EMP devices. Systematic review of peer-reviewed studies testing the effectiveness of EMP systems in the MEDLINE, EMBASE, PsycINFO, CINAHL, International Pharmaceutical Abstracts, and Sociological Abstracts databases from searches conducted to June 13, 2014, with extraction of associations between the interventions and adherence, as well as other key findings. Each study was assessed for bias using the Cochrane Handbook for Systematic Reviews of Interventions; features of EMP devices and interventions were qualitatively assessed. Thirty-seven studies (32 randomized and 5 nonrandomized) including 4326 patients met inclusion criteria (10 patient interface-only "simple" interventions and 29 "complex" interventions integrated into the health care system [2 qualified for both categories]). Overall, the effect estimates for differences in mean adherence ranged from a decrease of 2.9% to an increase of 34.0%, and those for differences in the proportion of patients defined as adherent ranged from a decrease of 8.0% to an increase of 49.5%. We identified 5 common EMP characteristics: recorded dosing events and stored records of adherence, audiovisual reminders to cue dosing, digital displays, real-time monitoring, and feedback on adherence performance. Many varieties of EMP devices exist. However, data supporting their use are limited, with variability in the quality of studies testing EMP devices. Devices integrated into the care delivery system and designed to record dosing events are most frequently associated with improved adherence, compared with other devices. Higher-quality evidence is needed to determine the effect, if any, of these low-cost interventions on medication nonadherence and to identify their most useful components.
RiboDB Database: A Comprehensive Resource for Prokaryotic Systematics.
Jauffrit, Frédéric; Penel, Simon; Delmotte, Stéphane; Rey, Carine; de Vienne, Damien M; Gouy, Manolo; Charrier, Jean-Philippe; Flandrois, Jean-Pierre; Brochier-Armanet, Céline
2016-08-01
Ribosomal proteins (r-proteins) are increasingly used as an alternative to ribosomal rRNA for prokaryotic systematics. However, their routine use is difficult because r-proteins are often missing or wrongly annotated in complete genome sequences, and there is currently no dedicated exhaustive database of r-proteins. RiboDB aims to fill this gap. This weekly updated comprehensive database allows the fast and easy retrieval of r-protein sequences from publicly available complete prokaryotic genome sequences. The current version of RiboDB contains 90 r-proteins from 3,750 prokaryotic complete genomes encompassing 38 phyla/major classes and 1,759 different species. RiboDB is accessible at http://ribodb.univ-lyon1.fr and through ACNUC interfaces.
Martin, Natasha; McHugh, Hugh; Murtagh, Jono; Oliver-Rose, Connor; Panesar, Div; Pengelly, Harriet; Rieper, Scott; Schofield, Henry; Singh, Vidit; Speed, Amanda; Strachan, Rhona; Tapsell, Te Kahui; Trafford, Sara; van Ryn, Stefan; Ward, Ethan; Whiting, Rosie; Wilson-van Duin, Merryn; Wu, Zhou; Purdie, Gordon; van der Deen, Frederieke S; Thomson, George; Pearson, Amber L; Wilson, Nick
2014-10-17
To collect data on tobacco brand visibility on packaging on outdoor tables at bars/cafes in a downtown area, prior to a proposed plain packaging law. The study was conducted in the Central Business District of Wellington City in March 2014. Observational data were systematically collected on tobacco packaging visibility and smoking by patrons at 55 bars/cafes with outdoor tables. A total of 19,189 patrons, 1707 tobacco packs and 1357 active smokers were observed. One tobacco pack was visible per 11.0 patrons and the active smoking prevalence was 7.1% (95%CI: 4.9-9.2%), similar to Australian results (8.3%). Eighty percent of packs were positioned face-up (showing the brand), 8% face-down (showing the large pictorial warning), and 12% in other positions. Pack visibility per patron was significantly greater in areas without child patrons (RR=3.1, p<0.0001). Both smoking and pack visibility tended to increase from noon into the evenings on weekends. Inter-observer reliability for key measures in this study was high (Bland-Altman plots). Tobacco branding on packaging was frequently visible because of the way smokers position their packs. These results highlight the residual problem posed by this form of marketing. The results also provide baseline data for the future evaluation of plain packaging if a proposed law is implemented in New Zealand. Other results warrant further research, particularly the reasons for lower pack visibility and smoking when children were present.
Cluster structures influenced by interaction with a surface.
Witt, Christopher; Dieterich, Johannes M; Hartke, Bernd
2018-05-30
Clusters on surfaces are vitally important for nanotechnological applications. Clearly, cluster-surface interactions heavily influence the preferred cluster structures, compared to clusters in vacuum. Nevertheless, systematic explorations and an in-depth understanding of these interactions and how they determine the cluster structures are still lacking. Here we present an extension of our well-established non-deterministic global optimization package OGOLEM from isolated clusters to clusters on surfaces. Applying this approach to intentionally simple Lennard-Jones test systems, we produce a first systematic exploration that relates changes in cluster-surface interactions to resulting changes in adsorbed cluster structures.
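As an illustration of the kind of model system described above, the sketch below defines a Lennard-Jones cluster energy with an added 9-3 wall potential for the cluster-surface interaction (a common idealization; the exact surface model used with OGOLEM may differ) and performs a single local relaxation. OGOLEM's contribution is the global search over many such relaxations:

```python
import numpy as np
from scipy.optimize import minimize

# Energy model: pairwise Lennard-Jones cluster terms plus a 9-3 wall
# potential in z for the cluster-surface interaction (a common idealization;
# the parameters here are arbitrary). Only one local relaxation is shown; a
# global optimizer such as OGOLEM would drive many such relaxations.

EPS, SIGMA, EPS_WALL = 1.0, 1.0, 2.0

def energy(flat):
    pos = flat.reshape(-1, 3)
    e = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = np.linalg.norm(pos[i] - pos[j])
            e += 4 * EPS * ((SIGMA / r) ** 12 - (SIGMA / r) ** 6)
    z = pos[:, 2]                                  # surface lies at z = 0
    e += np.sum(EPS_WALL * ((2 / 15) * (SIGMA / z) ** 9 - (SIGMA / z) ** 3))
    return e

n = 5
rng = np.random.default_rng(4)
start = rng.uniform(0.8, 2.5, size=(n, 3))
bounds = [(None, None), (None, None), (0.5, None)] * n     # keep atoms above wall
result = minimize(energy, start.ravel(), method="L-BFGS-B", bounds=bounds)
print(f"locally relaxed energy: {result.fun:.3f}")
```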
A survey of packages for large linear systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Milne, Brent
2000-02-11
This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages, and that the evaluation process may serve as an example of how to evaluate such packages. The information contained here includes feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to very large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes, because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve, ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well-designed user interfaces, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++ is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving, so their user interfaces may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms, which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.
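The recommended packages all pair Krylov iterative solvers with preconditioners. The same pattern can be illustrated on a single machine with SciPy (not one of the surveyed packages, which target distributed-memory parallelism): GMRES preconditioned with an incomplete LU factorization:

```python
import numpy as np
from scipy.sparse import diags, linalg as sla

# Iterative-solver-plus-preconditioner pattern on one machine: GMRES with an
# incomplete-LU preconditioner via SciPy. The surveyed packages provide the
# same combination for distributed-memory parallel systems.

n = 2000
A = diags([-1.0, 2.2, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = sla.spilu(A, drop_tol=1e-4)                     # incomplete LU of A
M = sla.LinearOperator((n, n), matvec=ilu.solve)      # acts as approximate A^-1

x, info = sla.gmres(A, b, M=M)
print("converged" if info == 0 else f"gmres info = {info}",
      "| residual norm:", np.linalg.norm(A @ x - b))
```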
Modeling loosely annotated images using both given and imagined annotations
NASA Astrophysics Data System (ADS)
Tang, Hong; Boujemaa, Nozha; Chen, Yunhao; Deng, Lei
2011-12-01
In this paper, we present an approach to learning latent semantic analysis models from loosely annotated images for automatic image annotation and indexing. The given annotation in training images is loose due to: (1) ambiguous correspondences between visual features and annotated keywords; and (2) incomplete lists of annotated keywords. The second reason motivates us to enrich the incomplete annotation in a simple way before learning a topic model. In particular, some "imagined" keywords are poured into the incomplete annotation through measuring similarity between keywords in terms of their co-occurrence. Then, both given and imagined annotations are employed to learn probabilistic topic models for automatically annotating new images. We conduct experiments on two image databases (i.e., Corel and ESP) coupled with their loose annotations, and compare the proposed method with state-of-the-art discrete annotation methods. The proposed method improves word-driven probabilistic latent semantic analysis (PLSA-words) to a performance comparable with the best discrete annotation method, while keeping a merit of PLSA-words, i.e., a wider semantic range.
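The enrichment step described above can be sketched directly: score keyword pairs by how often they co-occur across training annotations, then add the absent keywords that co-occur most strongly with the given ones. The normalized-count score below is an illustrative assumption standing in for whatever similarity measure the authors used:

```python
from collections import Counter
from itertools import combinations

# Enrichment sketch: score keyword pairs by co-occurrence across training
# annotations, then propose absent ("imagined") keywords for an incomplete
# annotation. The normalized-count score is an illustrative assumption.

annotations = [
    {"sky", "clouds", "plane"},
    {"sky", "clouds", "sun"},
    {"sky", "sun", "beach"},
    {"beach", "sea", "sun"},
]

pair_counts, keyword_counts = Counter(), Counter()
for ann in annotations:
    keyword_counts.update(ann)
    pair_counts.update(frozenset(p) for p in combinations(sorted(ann), 2))

def imagined(given, top_n=2):
    """Rank absent keywords by normalized co-occurrence with the given ones."""
    scores = Counter()
    for word in keyword_counts:
        if word in given:
            continue
        for g in given:
            scores[word] += pair_counts[frozenset((word, g))] / keyword_counts[g]
    return [word for word, _ in scores.most_common(top_n)]

print(imagined({"sky"}))  # e.g. ['clouds', 'sun']
```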
Protein Structure and Function Prediction Using I-TASSER
Yang, Jianyi; Zhang, Yang
2016-01-01
I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386
Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R
2003-01-01
Background The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. Description The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at Conclusions Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545
Biswal, Devendra Kumar; Ghatani, Sudeep; Shylla, Jollin A.; Sahu, Ranjana; Mullapudi, Nandita
2013-01-01
Helminths include both parasitic nematodes (roundworms) and platyhelminths (trematode and cestode flatworms) that are abundant and of clinical importance. The genetic characterization of parasitic flatworms using advanced molecular tools is central to the diagnosis and control of infections. Although the nuclear genome houses suitable genetic markers (e.g., in ribosomal (r) DNA) for species identification and molecular characterization, the mitochondrial (mt) genome consistently provides a rich source of novel markers for informative systematics and epidemiological studies. In the last decade, there have been some important advances in mtDNA genomics of helminths, especially lung flukes, liver flukes and intestinal flukes. Fasciolopsis buski, often called the giant intestinal fluke, is one of the largest digenean trematodes infecting humans and is found primarily in Asia, in particular the Indian subcontinent. Next-generation sequencing (NGS) technologies now provide opportunities for high-throughput sequencing, assembly and annotation within a short span of time. Herein, we describe a high-throughput sequencing and bioinformatics pipeline for mt genomics for F. buski that emphasizes the utility of short-read NGS platforms such as Ion Torrent and Illumina in successfully sequencing and assembling the mt genome using innovative approaches for PCR primer design as well as assembly. We took advantage of our NGS whole genome sequence data (unpublished so far) for F. buski and its comparison with available data for the Fasciola hepatica mtDNA as the reference genome for design of precise and specific primers for amplification of mt genome sequences from F. buski. A long-range PCR was carried out to create an NGS library enriched in mtDNA sequences. Two different NGS platforms were employed for complete sequencing, assembly and annotation of the F. buski mt genome. The complete mt genome sequence of the intestinal fluke comprises 14,118 bp and is thus the shortest trematode mitochondrial genome sequenced to date. The noncoding control regions are separated into two parts by the tRNA-Gly gene and contain neither tandem repeats nor secondary structures, which are typical for trematode control regions. The gene content and arrangement are identical to those of F. hepatica. The F. buski mtDNA genome closely resembles that of F. hepatica and has a similar gene order, tallying with that of other trematodes. The mtDNA for the intestinal fluke is reported herein for the first time by our group and should help investigate Fasciolidae taxonomy and systematics with the aid of mtDNA NGS data. Moreover, it will serve as a resource for comparative mitochondrial genomics and systematic studies of trematode parasites. PMID:24255820
Annotated chemical patent corpus: a gold standard for text mining.
Akhondi, Saber A; Klenner, Alexander G; Tyrchan, Christian; Manchala, Anil K; Boppana, Kiran; Lowe, Daniel; Zimmermann, Marc; Jagarlapudi, Sarma A R P; Sayle, Roger; Kors, Jan A; Muresan, Sorel
2014-01-01
Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Extracting chemical and biological entities from patents through manual extraction by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, United States Patent and Trademark Office, and European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
ERIC Educational Resources Information Center
Rubin, Stanford E.; Farley, Roy C.
This guide is the case study manual for the first in a series of instructor-assisted training modules for rehabilitation counselors, supervisors, and graduate students. This typescript manual for the first module focuses on basic intake interviewing skills consisting of: (1) systematic interview programming including attracting, planning and…
Systematic Errors in Peptide and Protein Identification and Quantification by Modified Peptides
Bogdanow, Boris; Zauber, Henrik; Selbach, Matthias
2016-01-01
The principle of shotgun proteomics is to use peptide mass spectra in order to identify corresponding sequences in a protein database. The quality of peptide and protein identification and quantification critically depends on the sensitivity and specificity of this assignment process. Many peptides in proteomic samples carry biochemical modifications, and a large fraction of unassigned spectra arise from modified peptides. Spectra derived from modified peptides can erroneously be assigned to wrong amino acid sequences. However, the impact of this problem on proteomic data has not yet been investigated systematically. Here we use combinations of different database searches to show that modified peptides can be responsible for 20–50% of false positive identifications in deep proteomic data sets. These false positive hits are particularly problematic as they have significantly higher scores and higher intensities than other false positive matches. Furthermore, these wrong peptide assignments lead to hundreds of false protein identifications and systematic biases in protein quantification. We devise a “cleaned search” strategy to address this problem and show that this considerably improves the sensitivity and specificity of proteomic data. In summary, we show that modified peptides cause systematic errors in peptide and protein identification and quantification and should therefore be considered to further improve the quality of proteomic data annotation. PMID:27215553
Ardin, Maude; Cahais, Vincent; Castells, Xavier; Bouaoun, Liacine; Byrnes, Graham; Herceg, Zdenko; Zavadil, Jiri; Olivier, Magali
2016-04-18
The nature of somatic mutations observed in human tumors at single gene or genome-wide levels can reveal information on past carcinogenic exposures and mutational processes contributing to tumor development. While large amounts of sequencing data are being generated, the associated analysis and interpretation of mutation patterns that may reveal clues about the natural history of cancer present complex and challenging tasks that require advanced bioinformatics skills. To make such analyses accessible to a wider community of researchers with no programming expertise, we have developed, within the web-based user-friendly platform Galaxy, a first-of-its-kind package called MutSpec. MutSpec includes a set of tools that perform variant annotation and use advanced statistics for the identification of mutation signatures present in cancer genomes and for comparing the obtained signatures with those published in the COSMIC database and other sources. MutSpec offers an accessible framework for building reproducible analysis pipelines, integrating existing methods and scripts developed in-house with publicly available R packages. MutSpec may be used to analyse data from whole-exome, whole-genome or targeted sequencing experiments performed on human or mouse genomes. Results are provided in various formats including rich graphical outputs. An example is presented to illustrate the package functionalities, the straightforward workflow analysis and the richness of the statistics and publication-grade graphics produced by the tool. MutSpec offers an easy-to-use graphical interface embedded in the popular Galaxy platform that can be used by researchers with limited programming or bioinformatics expertise to analyse mutation signatures present in cancer genomes. MutSpec can thus effectively assist in the discovery of complex mutational processes resulting from exogenous and endogenous carcinogenic insults.
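Mutation-signature analysis of the kind MutSpec supports typically starts from the 96-class mutation spectrum: six pyrimidine-centred substitution types crossed with 16 flanking-base contexts. A minimal counting sketch (not MutSpec's code, which also handles annotation and signature comparison):

```python
from collections import Counter

# First step of mutation-signature analysis: tally SNVs into the 96 classes
# (six pyrimidine-centred substitution types x 16 flanking contexts).
# Purine-centred calls are folded onto the reverse-complement strand.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def mutation_class(context, alt):
    """context: reference trinucleotide around the SNV; alt: alternate base."""
    if context[1] in "AG":                       # fold purines onto pyrimidines
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

snvs = [("ACA", "T"), ("TGT", "A"), ("GCG", "A"), ("TTT", "G")]
spectrum = Counter(mutation_class(ctx, alt) for ctx, alt in snvs)
print(spectrum)   # ("ACA","T") and its reverse complement ("TGT","A") collapse
```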
The cigarette pack as image: new evidence from tobacco industry documents
Wakefield, M; Morley, C; Horan, J; Cummings, K
2002-01-01
Methods: A search of tobacco company document sites using a list of specified search terms was undertaken during November 2000 to July 2001. Results: Documents show that, especially in the context of tighter restrictions on conventional avenues for tobacco marketing, tobacco companies view cigarette packaging as an integral component of marketing strategy and a vehicle for (a) creating significant in-store presence at the point of purchase, and (b) communicating brand image. Market testing results indicate that such imagery is so strong as to influence smokers' taste ratings of the same cigarettes when packaged differently. Documents also reveal the careful balancing act that companies have employed in using pack design and colour to communicate the impression of lower tar or milder cigarettes, while preserving perceived taste and "satisfaction". Systematic and extensive research is carried out by tobacco companies to ensure that cigarette packaging appeals to selected target groups, including young adults and women. Conclusions: Cigarette pack design is an important communication device for cigarette brands and acts as an advertising medium. Many smokers are misled by pack design into thinking that cigarettes may be "safer". There is a need to consider regulation of cigarette packaging. PMID:11893817
Evaluating Computational Gene Ontology Annotations.
Škunca, Nives; Roberts, Richard J; Steffen, Martin
2017-01-01
Two avenues to understanding gene function are complementary and often overlapping: experimental work and computational prediction. While experimental annotation generally produces high-quality annotations, it is low throughput. Conversely, computational annotations have broad coverage, but the quality of annotations may be variable, and therefore evaluating the quality of computational annotations is a critical concern. In this chapter, we provide an overview of strategies to evaluate the quality of computational annotations. First, we discuss why evaluating quality in this setting is not trivial. We highlight the various issues that threaten to bias the evaluation of computational annotations, most of which stem from the incompleteness of biological databases. Second, we discuss solutions that address these issues, for example, targeted selection of new experimental annotations and leveraging the existing experimental annotations.
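One of the evaluation strategies alluded to here, comparing computational predictions against experimental annotations, can be illustrated with a short sketch; the gene-to-term dictionaries are assumed inputs, and propagation of annotations up the GO hierarchy is deliberately omitted:

```python
def term_precision_recall(predicted, experimental):
    """predicted / experimental: dicts mapping gene -> set of GO terms.
    Only genes with experimental annotation are evaluated, which sidesteps
    (but does not remove) the incomplete-knowledge bias discussed above."""
    tp = fp = fn = 0
    for gene, gold in experimental.items():
        pred = predicted.get(gene, set())
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```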
Representing annotation compositionality and provenance for the Semantic Web
2013-01-01
Background Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. Results We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease-associations in genome sequences. Conclusions With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking. PMID:24268021
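A minimal sketch of composing an annotation with element-level provenance, using rdflib with the W3C Web Annotation and PROV vocabularies rather than the paper's Information Artifact Ontology extension (all example IRIs are hypothetical):

```python
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/")        # hypothetical namespace
OA = Namespace("http://www.w3.org/ns/oa#")   # W3C Web Annotation vocabulary
PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
ann = EX["annotation1"]
g.add((ann, RDF.type, OA.Annotation))
g.add((ann, OA.hasTarget, EX["genomeRegion42"]))
g.add((ann, OA.hasBody, EX["diseaseAssociationAssertion7"]))
# Element-level provenance: the body is itself derived from an earlier annotation
g.add((EX["diseaseAssociationAssertion7"], PROV.wasDerivedFrom, EX["annotation0"]))
g.add((ann, PROV.wasAttributedTo, EX["curatorAlice"]))
g.add((ann, PROV.generatedAtTime, Literal("2013-01-01")))
print(g.serialize(format="turtle"))
```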
Huang, Jingshan; Eilbeck, Karen; Smith, Barry; Blake, Judith A; Dou, Dejing; Huang, Weili; Natale, Darren A; Ruttenberg, Alan; Huan, Jun; Zimmermann, Michael T; Jiang, Guoqian; Lin, Yu; Wu, Bin; Strachan, Harrison J; He, Yongqun; Zhang, Shaojie; Wang, Xiaowei; Liu, Zixing; Borchert, Glen M; Tan, Ming
2016-01-01
In recent years, sequencing technologies have enabled the identification of a wide range of non-coding RNAs (ncRNAs). Unfortunately, annotation and integration of ncRNA data has lagged behind their identification. Given the large quantity of information being obtained in this area, there emerges an urgent need to integrate what is being discovered by a broad range of relevant communities. To this end, the Non-Coding RNA Ontology (NCRO) is being developed to provide a systematically structured and precisely defined controlled vocabulary for the domain of ncRNAs, thereby facilitating the discovery, curation, analysis, exchange, and reasoning of data about structures of ncRNAs, their molecular and cellular functions, and their impacts upon phenotypes. The goal of NCRO is to serve as a common resource for annotations of diverse research in a way that will significantly enhance integrative and comparative analysis of the myriad resources currently housed in disparate sources. It is our belief that the NCRO ontology can perform an important role in the comprehensive unification of ncRNA biology and, indeed, fill a critical gap in both the Open Biological and Biomedical Ontologies (OBO) Library and the National Center for Biomedical Ontology (NCBO) BioPortal. Our initial focus is on the ontological representation of small regulatory ncRNAs, which we see as the first step in providing a resource for the annotation of data about all forms of ncRNAs. The NCRO ontology is free and open to all users, accessible at: http://purl.obolibrary.org/obo/ncro.owl.
Tong, J H S; Hawi, Z; Dark, C; Cummins, T D R; Johnson, B P; Newman, D P; Lau, R; Vance, A; Heussler, H S; Matthews, N; Bellgrove, M A; Pang, K C
2016-11-01
Attention deficit hyperactivity disorder (ADHD) is a highly heritable psychiatric condition with negative lifetime outcomes. Uncovering its genetic architecture should yield important insights into the neurobiology of ADHD and assist development of novel treatment strategies. Twenty years of candidate gene investigations and more recently genome-wide association studies have identified an array of potential association signals. In this context, separating the likely true from false associations ('the wheat' from 'the chaff') will be crucial for uncovering the functional biology of ADHD. Here, we defined a set of 2070 DNA variants that showed evidence of association with ADHD (or were in linkage disequilibrium). More than 97% of these variants were noncoding, and were prioritised for further exploration using two tools, genome-wide annotation of variants (GWAVA) and Combined Annotation-Dependent Depletion (CADD), that were recently developed to rank variants based upon their likely pathogenicity. Capitalising on recent efforts such as the Encyclopaedia of DNA Elements and US National Institutes of Health Roadmap Epigenomics Projects to improve understanding of the noncoding genome, we subsequently identified 65 variants to which we assigned functional annotations, based upon their likely impact on alternative splicing, transcription factor binding and translational regulation. We propose that these 65 variants, which possess not only a high likelihood of pathogenicity but also readily testable functional hypotheses, represent a tractable shortlist for future experimental validation in ADHD. Taken together, this study brings into sharp focus the likely relevance of noncoding variants for the genetic risk associated with ADHD, and more broadly suggests a bioinformatics approach that should be relevant to other psychiatric disorders.
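A toy version of this prioritisation step, filtering variants on their GWAVA and CADD scores with pandas (the column names, example values and thresholds are illustrative, not those of the study):

```python
import pandas as pd

# Hypothetical table of ADHD-associated variants with precomputed scores.
variants = pd.DataFrame({
    "rsid": ["rs1", "rs2", "rs3"],
    "cadd_phred": [22.1, 3.5, 15.8],   # CADD PHRED-scaled scores
    "gwava_score": [0.61, 0.32, 0.55], # GWAVA scores lie in [0, 1]
})

# Keep noncoding variants that both tools rank as likely functional.
shortlist = variants[(variants.cadd_phred >= 15) & (variants.gwava_score >= 0.5)]
print(shortlist)
```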
Enabling a Community to Dissect an Organism: Overview of the Neurospora Functional Genomics Project
Dunlap, Jay C.; Borkovich, Katherine A.; Henn, Matthew R.; Turner, Gloria E.; Sachs, Matthew S.; Glass, N. Louise; McCluskey, Kevin; Plamann, Michael; Galagan, James E.; Birren, Bruce W.; Weiss, Richard L.; Townsend, Jeffrey P.; Loros, Jennifer J.; Nelson, Mary Anne; Lambreghts, Randy; Colot, Hildur V.; Park, Gyungsoon; Collopy, Patrick; Ringelberg, Carol; Crew, Christopher; Litvinkova, Liubov; DeCaprio, Dave; Hood, Heather M.; Curilla, Susan; Shi, Mi; Crawford, Matthew; Koerhsen, Michael; Montgomery, Phil; Larson, Lisa; Pearson, Matthew; Kasuga, Takao; Tian, Chaoguang; Baştürkmen, Meray; Altamirano, Lorena; Xu, Junhuan
2013-01-01
A consortium of investigators is engaged in a functional genomics project centered on the filamentous fungus Neurospora, with an eye to opening up the functional genomic analysis of all the filamentous fungi. The overall goal of the four interdependent projects in this effort is to accomplish functional genomics, annotation, and expression analyses of Neurospora crassa, a filamentous fungus that is an established model for the assemblage of over 250,000 species of nonyeast fungi. Building from the completely sequenced 43-Mb Neurospora genome, Project 1 is pursuing the systematic disruption of genes through targeted gene replacements, phenotypic analysis of mutant strains, and their distribution to the scientific community at large. Project 2, through a primary focus in Annotation and Bioinformatics, has developed a platform for electronically capturing community feedback and data about the existing annotation, while building and maintaining a database to capture and display information about phenotypes. Oligonucleotide-based microarrays created in Project 3 are being used to collect baseline expression data for the nearly 11,000 distinguishable transcripts in Neurospora under various conditions of growth and development, and eventually to begin to analyze the global effects of loss of novel genes in strains created by Project 1. cDNA libraries generated in Project 4 document the overall complexity of expressed sequences in Neurospora, including alternative splicing, alternative promoters and antisense transcripts. In addition, these studies have driven the assembly of an SNP map presently populated by nearly 300 markers that will greatly accelerate the positional cloning of genes. PMID:17352902
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC
Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich
2015-01-01
Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. PMID:25948699
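Inter-annotator agreement of the kind reported here can be computed as a pairwise F-score over annotation sets. A minimal sketch assuming exact span-and-concept matches:

```python
def f_score(a, b):
    """F1 between two annotation sets, each a set of
    (start, end, concept_id) tuples; exact-match criterion assumed."""
    if not a and not b:
        return 1.0
    tp = len(a & b)
    prec = tp / len(a) if a else 0.0
    rec = tp / len(b) if b else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

ann1 = {(0, 7, "C0004238"), (12, 20, "C0011849")}
ann2 = {(0, 7, "C0004238"), (25, 31, "C0027051")}
print(round(f_score(ann1, ann2), 2))  # 0.5
```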
Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome
2010-01-01
Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence. PMID:21092105
Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.
He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan
2017-05-01
To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts, with corresponding annotation guidelines and methods, and to develop tools trained on the annotated corpus that supply baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality, and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents with 39,511 annotated entities, their assertions, and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality and that the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage, and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus and its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.
ERIC Educational Resources Information Center
Chen, I-Jung; Yen, Jung-Chuan
2013-01-01
This study extends current knowledge by exploring the effect of different annotation formats, namely in-text annotation, glossary annotation, and pop-up annotation, on hypertext reading comprehension in a foreign language and vocabulary acquisition across student proficiencies. User attitudes toward the annotation presentation were also…
Krojer, Tobias; Talon, Romain; Pearce, Nicholas; Collins, Patrick; Douangamath, Alice; Brandao-Neto, Jose; Dias, Alexandre; Marsden, Brian; von Delft, Frank
2017-03-01
XChemExplorer (XCE) is a data-management and workflow tool to support large-scale simultaneous analysis of protein-ligand complexes during structure-based ligand discovery (SBLD). The user interfaces of established crystallographic software packages such as CCP4 [Winn et al. (2011), Acta Cryst. D67, 235-242] or PHENIX [Adams et al. (2010), Acta Cryst. D66, 213-221] have entrenched the paradigm that a `project' is concerned with solving one structure. This does not hold for SBLD, where many almost identical structures need to be solved and analysed quickly in one batch of work. Functionality to track progress and annotate structures is essential. XCE provides an intuitive graphical user interface which guides the user from data processing, initial map calculation, ligand identification and refinement up until data dissemination. It provides multiple entry points depending on the need of each project, enables batch processing of multiple data sets and records metadata, progress and annotations in an SQLite database. XCE is freely available and works on any Linux and Mac OS X system, and the only dependency is to have the latest version of CCP4 installed. The design and usage of this tool are described here, and its usefulness is demonstrated in the context of fragment-screening campaigns at the Diamond Light Source. It is routinely used to analyse projects comprising 1000 data sets or more, and therefore scales well to even very large ligand-design projects.
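The SQLite-backed bookkeeping that XCE relies on can be pictured with a small sketch; the schema below is invented for illustration and is not XCE's actual database layout:

```python
import sqlite3

# Minimal sketch of per-dataset progress tracking in SQLite, in the spirit
# of XCE's metadata database; table and column names are hypothetical.
con = sqlite3.connect("progress.sqlite")
con.execute("""CREATE TABLE IF NOT EXISTS datasets (
    crystal_id TEXT PRIMARY KEY,
    processing_stage TEXT,   -- e.g. 'integrated', 'refined', 'deposited'
    resolution_high REAL,
    annotation TEXT)""")
con.execute("INSERT OR REPLACE INTO datasets VALUES (?, ?, ?, ?)",
            ("xtal_0001", "refined", 1.8, "ligand bound in site A"))
con.commit()
for row in con.execute("SELECT * FROM datasets WHERE processing_stage='refined'"):
    print(row)
```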
Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.
Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris
2004-07-14
With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.
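The core comparison Base-By-Base performs, walking two aligned genomes and reporting nucleotide differences together with their annotation context, can be sketched as follows (a real implementation would also translate codons to report amino acid substitutions):

```python
def aligned_differences(seq_a, seq_b, annotations):
    """Report per-position differences between two aligned genomes and
    whether each falls inside an annotated gene. `annotations` maps
    (start, end) alignment coordinates to gene names."""
    assert len(seq_a) == len(seq_b)
    for pos, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            gene = next((name for (s, e), name in annotations.items()
                         if s <= pos < e), "intergenic")
            yield pos, a, b, gene

genes = {(3, 9): "geneX"}
for diff in aligned_differences("ATGCGTACGT", "ATGCGAACGT", genes):
    print(diff)  # (5, 'T', 'A', 'geneX')
```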
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao
2018-01-01
Abstract Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNA-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients' clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the differentially methylated lncRNAs identification and visualization with local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. PMID:29069510
MAGIC database and interfaces: an integrated package for gene discovery and expression.
Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H
2004-01-01
The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.
Bell, Michael J; Gillespie, Colin S; Swan, Daniel; Lord, Phillip
2012-09-15
Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we address in this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach, since it allows us to compare annotation change both over time and between automated and manually curated annotations. By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation. phillip.lord@newcastle.ac.uk.
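The word-reuse analysis described here can be approximated with a short power-law fit over annotation text; this is a simplified stand-in for the authors' method, not a reproduction of it:

```python
import math
from collections import Counter

def zipf_slope(text):
    """Estimate the power-law exponent of word reuse by a least-squares
    fit of log(frequency) against log(rank); a slope near -1 matches the
    classic Zipf profile reported for free English text."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```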
Chen, Wen; Zhang, Xuan; Li, Jing; Huang, Shulan; Xiang, Shuanglin; Hu, Xiang; Liu, Changning
2018-05-09
Zebrafish is a fully developed model system for studying developmental processes and human disease. Recent deep-sequencing studies have discovered a large number of long non-coding RNAs (lncRNAs) in zebrafish; however, only a few of them have been functionally characterized. How to take advantage of the mature zebrafish system to investigate lncRNA function and conservation in depth is therefore an intriguing question. We systematically collected and analyzed a series of zebrafish RNA-seq data, then combined them with resources from known databases and the literature. As a result, we obtained by far the most complete dataset of zebrafish lncRNAs, containing 13,604 lncRNA genes (21,128 transcripts) in total. Based on that, a co-expression network of zebrafish coding and lncRNA genes was constructed and analyzed, and used to predict the Gene Ontology (GO) and KEGG annotation of lncRNAs. Meanwhile, we performed a conservation analysis of zebrafish lncRNAs, identifying 1828 conserved zebrafish lncRNA genes (1890 transcripts) that have putative mammalian orthologs. We also found that zebrafish lncRNAs play important roles in the regulation of the development and function of the nervous system; these conserved lncRNAs show significant sequence and functional conservation with their mammalian counterparts. By integrative data analysis and construction of a coding-lncRNA gene co-expression network, we gained the most comprehensive dataset of zebrafish lncRNAs to date, together with systematic annotations and comprehensive analyses of their function and conservation. Our study provides a reliable zebrafish-based platform to deeply explore lncRNA function and mechanism, as well as the lncRNA commonality between zebrafish and human.
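The guilt-by-association step, predicting GO terms for an lncRNA from strongly co-expressed, already-annotated coding genes, can be sketched as below; the threshold and data layout are assumptions, and the study's pipeline is considerably richer:

```python
import numpy as np

def transfer_go(expr, known_go, threshold=0.8):
    """Guilt-by-association sketch. expr: NumPy array, genes x samples;
    known_go: dict mapping row index -> set of GO terms for coding genes.
    Unannotated rows inherit the union of terms from strongly correlated
    annotated partners."""
    corr = np.corrcoef(expr)  # gene-by-gene Pearson correlation
    predictions = {}
    for i in range(expr.shape[0]):
        if i in known_go:
            continue  # already annotated (coding gene)
        partners = [j for j in known_go if corr[i, j] >= threshold]
        predictions[i] = (set().union(*(known_go[j] for j in partners))
                          if partners else set())
    return predictions
```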
TE-Tracker: systematic identification of transposition events through whole-genome resequencing.
Gilly, Arthur; Etcheverry, Mathilde; Madoui, Mohammed-Amin; Guy, Julie; Quadrana, Leandro; Alberti, Adriana; Martin, Antoine; Heitkam, Tony; Engelen, Stefan; Labadie, Karine; Le Pen, Jeremie; Wincker, Patrick; Colot, Vincent; Aury, Jean-Marc
2014-11-19
Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements. We present TE-Tracker, a general and accurate computational method for the de novo detection of germline TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker . We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
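The accuracy measures used in this comparison are straightforward to compute once predicted and true insertion sites are in hand; a sketch with a positional matching tolerance (the tolerance value is illustrative):

```python
def evaluate_calls(predicted, truth, tolerance=100):
    """PPV and sensitivity for predicted TE insertion sites against a
    truth set. Sites are (chromosome, coordinate) pairs; a call matches
    if it lies within `tolerance` bp of a site on the same chromosome."""
    def hit(call, sites):
        chrom, pos = call
        return any(c == chrom and abs(p - pos) <= tolerance for c, p in sites)
    tp = sum(1 for call in predicted if hit(call, truth))
    found = sum(1 for site in truth if hit(site, predicted))
    ppv = tp / len(predicted) if predicted else 0.0
    sensitivity = found / len(truth) if truth else 0.0
    return ppv, sensitivity
```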
Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni
2005-01-01
Background Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape and apoptosis. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinases only after computational studies. Results A systematic in silico study performed on the human genome led to the identification of 5 genes, on chromosomes 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases by NCBI but are absent in other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicates that a large number of kinase genes may code for alternatively spliced forms or be incorrectly annotated. An automated InterProScan analysis was performed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices and the propensity of cysteines to participate in a disulfide bridge. Conclusion The predicted human kinome was extended by identifying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase protein domains as defined in multiple sources, together with transmembrane alpha helix and signal peptide prediction, provides hints for function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various levels of magnification, up to gene organization on the full chromosome set. PMID:16351747
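The domain-combination analysis mentioned here can be approximated by tallying InterPro accessions per protein from InterProScan's tab-separated output; the column index assumes the standard 15-column TSV format, which should be verified against the installed InterProScan version:

```python
from collections import Counter
import csv

def domain_combinations(interproscan_tsv):
    """Count distinct InterPro domain combinations per protein from
    InterProScan TSV output (column 1 = protein accession, column 12 =
    InterPro accession in the standard 15-column layout)."""
    domains = {}
    with open(interproscan_tsv) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) > 11 and row[11].startswith("IPR"):
                domains.setdefault(row[0], set()).add(row[11])
    return Counter(tuple(sorted(d)) for d in domains.values())
```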
Qcorp: an annotated classification corpus of Chinese health questions.
Guo, Haihong; Na, Xu; Li, Jiao
2018-03-22
Health question-answering (QA) systems have become a typical application scenario of Artificial Intelligence (AI). An annotated question corpus is a prerequisite for training machines to understand the health information needs of users. Thus, we aimed to develop an annotated classification corpus of Chinese health questions (Qcorp) and make it openly accessible. We developed a two-layered classification schema and corresponding annotation rules on the basis of our previous work. Using the schema, we annotated 5000 questions that were randomly selected from 5 Chinese health websites within 6 broad sections. Eight annotators participated in the annotation task, and the inter-annotator agreement was evaluated to ensure the corpus quality. Furthermore, the distribution and relationships of the annotated tags were measured by descriptive statistics and a social network map. The questions were annotated using 7101 tags covering 29 topic categories in the two-layered schema. In our released corpus, the distribution of questions across the top-layer categories was treatment (64.22%), diagnosis (37.14%), epidemiology (14.96%), healthy lifestyle (10.38%) and health provider choice (4.54%). Both the annotated health questions and the annotation schema are openly accessible on the Qcorp website. Users can download the annotated Chinese questions in CSV, XML, and HTML formats. We developed a Chinese health question corpus including 5000 manually annotated questions. It is openly accessible and will contribute to the development of intelligent health QA systems.
Problems experienced by older people when opening medicine packaging.
Philbert, Daphne; Notenboom, Kim; Bouvy, Marcel L; van Geffen, Erica C G
2014-06-01
Medicine packages can cause problems in daily practice, especially among older people. This study aimed to investigate the prevalence of problems experienced by older people when opening medicine packaging and to investigate how patients manage these problems. A convenience sample of 30 community pharmacies participated in this study. They selected a systematic sample of 30 patients over 65 years old with a recent omeprazole prescription, and a questionnaire was administered by telephone for at least 10 patients per pharmacy. A total of 317 patients completed the questionnaire. They received their omeprazole in a bottle (n = 179, 56.5%), push-through blister pack (n = 102, 32.2%) or peel-off blister pack (n = 36, 11.4%). Some 28.4% of all patients experienced one or more problems with opening their omeprazole packaging; most problems occurred with peel-off blisters (n = 24, 66.7% of all respondents using peel-off blisters), followed by push-through blisters (n = 34, 33.3%) and finally bottles (n = 32, 17.9%). The risk of experiencing problems with peel-off blisters and push-through blisters was higher [relative risk 3.7 (95% confidence interval 2.5-5.5) and 1.9 (1.2-2.8), respectively] than the risk of experiencing problems with opening bottles. Two-thirds of respondents reported management strategies for their problems. Most were found for problems opening bottles (n = 24, 75%), followed by push-through blisters (n = 24, 70.6%) and peel-off blisters (n = 14, 58.3%). One in four patients over 65 experienced difficulties opening their omeprazole packaging and not all of them reported a management strategy for their problems. Manufacturers are advised to pay more attention to the user-friendliness of product packaging. In addition, it is important that pharmacy staff clearly instruct patients on how to open their medicine packaging, or assist them in choosing the most appropriate packaging. © 2013 Royal Pharmaceutical Society.
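The relative risks quoted above can be reproduced from the reported counts with the standard log-transform confidence interval; a worked check:

```python
import math

def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A versus reference group B with a 95% CI
    computed on the log scale (standard large-sample method)."""
    rr = (events_a / n_a) / (events_b / n_b)
    se = math.sqrt(1/events_a - 1/n_a + 1/events_b - 1/n_b)
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))

# Peel-off blisters (24/36) versus bottles (32/179), as reported above
print(relative_risk(24, 36, 32, 179))  # RR ~3.7, CI ~(2.5, 5.5)
```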
NoGOA: predicting noisy GO annotations using evidences and sparse representation.
Yu, Guoxian; Lu, Chang; Wang, Jun
2017-07-21
Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators; the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, however, how to identify noisy annotations is an important yet seldom-studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. It then makes a preliminary prediction of the noisy annotations of a gene based on aggregated votes from the semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived over different periods, then weights entries of the association matrix via the estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .
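The neighborhood-voting ingredient of this approach can be sketched in isolation; the snippet below uses plain cosine similarity instead of NoGOA's sparse representation coefficients and omits the evidence-code weighting entirely:

```python
import numpy as np

def flag_noisy(annotation_matrix, k=3, support=0.5):
    """annotation_matrix: genes x GO terms, 0/1. For each annotated pair,
    check how many of the gene's k most similar genes share the term;
    poorly supported annotations are flagged as candidate noise."""
    A = np.asarray(annotation_matrix, dtype=float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    sim = (A / norms) @ (A / norms).T   # cosine similarity between genes
    np.fill_diagonal(sim, -np.inf)      # exclude self-votes
    flagged = []
    for g in range(A.shape[0]):
        neighbours = np.argsort(sim[g])[-k:]  # top-k most similar genes
        for t in np.flatnonzero(A[g]):
            if A[neighbours, t].mean() < support:
                flagged.append((g, t))
    return flagged
```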
Zhao, Jian; Glueck, Michael; Breslav, Simon; Chevalier, Fanny; Khan, Azam
2017-01-01
User-authored annotations of data can support analysts in the activity of hypothesis generation and sensemaking, where it is not only critical to document key observations, but also to communicate insights between analysts. We present annotation graphs, a dynamic graph visualization that enables meta-analysis of data based on user-authored annotations. The annotation graph topology encodes annotation semantics, which describe the content of and relations between data selections, comments, and tags. We present a mixed-initiative approach to graph layout that integrates an analyst's manual manipulations with an automatic method based on similarity inferred from the annotation semantics. Various visual graph layout styles reveal different perspectives on the annotation semantics. Annotation graphs are implemented within C8, a system that supports authoring annotations during exploratory analysis of a dataset. We apply principles of Exploratory Sequential Data Analysis (ESDA) in designing C8, and further link these to an existing task typology in the visualization literature. We develop and evaluate the system through an iterative user-centered design process with three experts, situated in the domain of analyzing HCI experiment data. The results suggest that annotation graphs are effective as a method of visually extending user-authored annotations to data meta-analysis for discovery and organization of ideas.
Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E
2015-09-29
In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.
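A typical metric such a benchmark reports is the area under the precision-recall curve of inferred edges against a gold-standard network; a minimal sketch using scikit-learn:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def aupr(scores, gold):
    """Area under the precision-recall curve for an inferred network.
    `scores`: square matrix of edge confidences; `gold`: same-shaped
    0/1 adjacency matrix. Self-edges are excluded from the evaluation."""
    mask = ~np.eye(gold.shape[0], dtype=bool)
    return average_precision_score(gold[mask], scores[mask])

gold = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
scores = np.array([[0.0, 0.9, 0.2], [0.1, 0.0, 0.8], [0.3, 0.2, 0.0]])
print(aupr(scores, gold))
```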
Climate Data Assimilation on a Massively Parallel Supercomputer
NASA Technical Reports Server (NTRS)
Ding, Hong Q.; Ferraro, Robert D.
1996-01-01
We have designed and implemented a set of highly efficient and highly scalable algorithms for an unstructured computational package, the PSAS data assimilation package, as demonstrated by detailed performance analysis of systematic runs on up to 512 nodes of an Intel Paragon. The preconditioned Conjugate Gradient solver achieves a sustained 18 Gflops performance. Consequently, we achieve an unprecedented 100-fold reduction in time to solution on the Intel Paragon over a single head of a Cray C90. This not only exceeds the daily performance requirement of the Data Assimilation Office at NASA's Goddard Space Flight Center, but also makes it possible to explore much larger and more challenging data assimilation problems which are unthinkable on a traditional computer platform such as the Cray C90.
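For readers unfamiliar with the solver at the heart of this result, here is a compact preconditioned conjugate gradient in NumPy, shown with a simple Jacobi preconditioner on a toy system (PSAS-scale problems use sparse, parallel operators rather than dense matrices):

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient for symmetric positive-definite A.
    M_inv applies the preconditioner's inverse (dense here for clarity)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Jacobi (diagonal) preconditioner on a small SPD test system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
M_inv = np.diag(1.0 / np.diag(A))
print(pcg(A, b, M_inv))  # ~[0.0909, 0.6364]
```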
ERIC Educational Resources Information Center
Gül, Seray Olçay
2016-01-01
There are many studies in the literature in which individuals with intellectual disabilities exhibit social skills deficits and which show the need for teaching these skills systematically. This study aims to investigate the effects of an intervention package of consisting computer-presented video modeling and Social Stories on individuals with…
Eye Tracking Outcomes in Tobacco Control Regulation and Communication: A Systematic Review.
Meernik, Clare; Jarman, Kristen; Wright, Sarah Towner; Klein, Elizabeth G; Goldstein, Adam O; Ranney, Leah
2016-10-01
In this paper we synthesize the evidence from eye tracking research in tobacco control to inform tobacco regulatory strategies and tobacco communication campaigns. We systematically searched 11 databases for studies that reported eye tracking outcomes with regard to tobacco regulation and communication. Two coders independently reviewed studies for inclusion and abstracted study characteristics and findings. Eighteen studies met full criteria for inclusion. Eye tracking studies on health warnings consistently showed these warnings often were ignored, though eye tracking demonstrated that novel warnings, graphic warnings, and plain packaging can increase attention toward warnings. Eye tracking also revealed that greater visual attention to warnings on advertisements and packages consistently was associated with cognitive processing as measured by warning recall. Eye tracking is a valid indicator of attention, cognitive processing, and memory. The use of this technology in tobacco control research complements existing methods in tobacco regulatory and communication science; it also can be used to examine the effects of health warnings and other tobacco product communications on consumer behavior in experimental settings prior to the implementation of novel health communication policies. However, the utility of eye tracking will be enhanced by the standardization of methodology and reporting metrics.
Annotation of UAV surveillance video
NASA Astrophysics Data System (ADS)
Howlett, Todd; Robertson, Mark A.; Manthey, Dan; Krol, John
2004-08-01
Significant progress toward the development of a video annotation capability is presented in this paper. Research and development of an object tracking algorithm applicable for UAV video is described. Object tracking is necessary for attaching the annotations to the objects of interest. A methodology and format is defined for encoding video annotations using the SMPTE Key-Length-Value encoding standard. This provides the following benefits: a non-destructive annotation, compliance with existing standards, video playback in systems that are not annotation enabled and support for a real-time implementation. A model real-time video annotation system is also presented, at a high level, using the MPEG-2 Transport Stream as the transmission medium. This work was accomplished to meet the Department of Defense's (DoD's) need for a video annotation capability. Current practices for creating annotated products are to capture a still image frame, annotate it using an Electric Light Table application, and then pass the annotated image on as a product. That is not adequate for reporting or downstream cueing. It is too slow and there is a severe loss of information. This paper describes a capability for annotating directly on the video.
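SMPTE KLV encoding itself is compact: a key, a BER-encoded length, then the value. A minimal sketch (the 16-byte key below is a placeholder, not an assigned metadata label):

```python
def ber_length(n):
    """BER length field: short form for lengths < 128, long form otherwise."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def klv(key, value):
    """Pack one Key-Length-Value triplet. Real SMPTE metadata uses a
    16-byte Universal Label as the key; any bytes are accepted here."""
    return key + ber_length(len(value)) + value

# 16-byte placeholder key starting with the SMPTE UL prefix 06 0E 2B 34
packet = klv(b"\x06\x0e\x2b\x34" + b"\x00" * 12, b"example annotation")
print(packet.hex())
```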
The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity
Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas; Venkatesan, Kavitha; Margolin, Adam A.; Kim, Sungjoon; Wilson, Christopher J.; Lehár, Joseph; Kryukov, Gregory V.; Sonkin, Dmitriy; Reddy, Anupama; Liu, Manway; Murray, Lauren; Berger, Michael F.; Monahan, John E.; Morais, Paula; Meltzer, Jodi; Korejwa, Adam; Jané-Valbuena, Judit; Mapa, Felipa A.; Thibault, Joseph; Bric-Furlong, Eva; Raman, Pichai; Shipway, Aaron; Engels, Ingo H.; Cheng, Jill; Yu, Guoying K.; Yu, Jianjun; Aspesi, Peter; de Silva, Melanie; Jagtap, Kalpana; Jones, Michael D.; Wang, Li; Hatton, Charles; Palescandolo, Emanuele; Gupta, Supriya; Mahan, Scott; Sougnez, Carrie; Onofrio, Robert C.; Liefeld, Ted; MacConaill, Laura; Winckler, Wendy; Reich, Michael; Li, Nanxin; Mesirov, Jill P.; Gabriel, Stacey B.; Getz, Gad; Ardlie, Kristin; Chan, Vivien; Myer, Vic E.; Weber, Barbara L.; Porter, Jeff; Warmuth, Markus; Finan, Peter; Harris, Jennifer L.; Meyerson, Matthew; Golub, Todd R.; Morrissey, Michael P.; Sellers, William R.; Schlegel, Robert; Garraway, Levi A.
2012-01-01
The systematic translation of cancer genomic data into knowledge of tumor biology and therapeutic avenues remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacologic annotation is available1. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacologic profiles for 24 anticancer drugs across 479 of the lines, this collection allowed identification of genetic, lineage, and gene expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Altogether, our results suggest that large, annotated cell line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of “personalized” therapeutic regimens2. PMID:22460905
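The univariate flavor of such predictor discovery, correlating each gene's expression with drug response across cell lines, can be sketched briefly; the CCLE analysis itself relied on multivariate (elastic net) models over many feature types:

```python
import numpy as np
from scipy import stats

def expression_drug_correlates(expr, ic50, genes):
    """Rank genes by Pearson correlation between expression and drug
    response across cell lines. expr: genes x lines matrix; ic50: one
    response value per line (lower = more sensitive). Illustrative only."""
    results = []
    for g, name in enumerate(genes):
        r, p = stats.pearsonr(expr[g], ic50)
        results.append((name, r, p))
    return sorted(results, key=lambda t: t[2])  # most significant first

expr = np.array([[1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0]])
ic50 = np.array([0.9, 2.1, 2.9, 4.2])
print(expression_drug_correlates(expr, ic50, ["SLFN11", "AHR"]))
```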
Lim, Regine M; Silver, Ari J; Silver, Maxwell J; Borroto, Carlos; Spurrier, Brett; Petrossian, Tanya C; Larson, Jessica L; Silver, Lee M
2016-02-01
Carrier screening for mutations contributing to cystic fibrosis (CF) is typically accomplished with panels composed of variants that are clinically validated primarily in patients of European descent. This approach has created a static genetic and phenotypic profile for CF. An opportunity now exists to reevaluate the disease profile of CFTR at a global population level. CFTR allele and genotype frequencies were obtained from a nonpatient cohort with more than 60,000 unrelated personal genomes collected by the Exome Aggregation Consortium. Likely disease-contributing mutations were identified with the use of public database annotations and computational tools. We identified 131 previously described and likely pathogenic variants and another 210 untested variants with a high probability of causing protein damage. None of the current genetic screening panels or existing CFTR mutation databases covered a majority of deleterious variants in any geographical population outside of Europe. Both clinical annotation and mutation coverage by commercially available targeted screening panels for CF are strongly biased toward detection of reproductive risk in persons of European descent. South and East Asian populations are severely underrepresented, in part because of a definition of disease that favors the phenotype associated with European-typical CFTR alleles.
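Panel coverage of the kind measured here reduces to a weighted set intersection; a toy computation (the panel member names are real CFTR alleles, but the "varX" labels and all allele counts are invented for illustration):

```python
def panel_coverage(panel_variants, population_alleles):
    """Fraction of a population's likely pathogenic allele count that a
    screening panel captures. population_alleles: dict variant -> count."""
    total = sum(population_alleles.values())
    covered = sum(n for v, n in population_alleles.items() if v in panel_variants)
    return covered / total if total else 0.0

panel = {"F508del", "G542X", "W1282X"}           # commonly screened alleles
observed = {"F508del": 120, "varX1": 15, "varX2": 8, "varX3": 22}  # invented
print(f"{panel_coverage(panel, observed):.0%}")  # 73%
```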
larvalign: Aligning Gene Expression Patterns from the Larval Brain of Drosophila melanogaster.
Muenzing, Sascha E A; Strauch, Martin; Truman, James W; Bühler, Katja; Thum, Andreas S; Merhof, Dorit
2018-01-01
The larval brain of the fruit fly Drosophila melanogaster is a small, tractable model system for neuroscience. Genes for fluorescent marker proteins can be expressed in defined, spatially restricted neuron populations. Here, we introduce the methods for 1) generating a standard template of the larval central nervous system (CNS), 2) spatial mapping of expression patterns from different larvae into a reference space defined by the standard template. We provide a manually annotated gold standard that serves for evaluation of the registration framework involved in template generation and mapping. A method for registration quality assessment enables the automatic detection of registration errors, and a semi-automatic registration method allows one to correct registrations, which is a prerequisite for a high-quality, curated database of expression patterns. All computational methods are available within the larvalign software package: https://github.com/larvalign/larvalign/releases/tag/v1.0.
Supporting metabolomics with adaptable software: design architectures for the end-user.
Sarpe, Vladimir; Schriemer, David C
2017-02-01
Large and disparate sets of LC-MS data are generated by modern metabolomics profiling initiatives, and while useful software tools are available to annotate and quantify compounds, the field requires continued software development in order to sustain methodological innovation. Advances in software development practices allow for a new paradigm in tool development for metabolomics, where increasingly the end-user can develop or redeploy utilities ranging from simple algorithms to complex workflows. Resources that provide an organized framework for development are described and illustrated with LC-MS processing packages that have leveraged their design tools. Full access to these resources depends in part on coding experience, but the emergence of workflow builders and pluggable frameworks strongly reduces the skill level required. Developers in the metabolomics community are encouraged to use these resources and design content for uptake and reuse.
Mapping RNA-seq Reads with STAR
Dobin, Alexander; Gingeras, Thomas R.
2015-01-01
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920
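As a concrete illustration of the kind of mapping run the protocols describe, the sketch below drives STAR from Python. The index and FASTQ paths are placeholders, and the flag values are common choices rather than settings prescribed by this unit.

```python
# Minimal sketch of a paired-end STAR run; paths are placeholders and the
# options shown are common defaults, not the protocols from the unit itself.
import subprocess

star_cmd = [
    "STAR",
    "--runThreadN", "8",                           # worker threads
    "--genomeDir", "star_index/",                  # pre-built genome index (placeholder)
    "--readFilesIn", "sample_R1.fastq", "sample_R2.fastq",
    "--outSAMtype", "BAM", "SortedByCoordinate",   # sorted BAM for downstream tools
    "--quantMode", "GeneCounts",                   # per-gene counts for expression analysis
]
subprocess.run(star_cmd, check=True)
```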
SWIFT-Review: a text-mining workbench for systematic review.
Howard, Brian E; Phillips, Jason; Miller, Kyle; Tandon, Arpit; Mav, Deepak; Shah, Mihir R; Holmgren, Stephanie; Pelch, Katherine E; Walker, Vickie; Rooney, Andrew A; Macleod, Malcolm; Shah, Ruchir R; Thayer, Kristina
2016-05-23
There is growing interest in using machine learning approaches to priority rank studies and reduce human burden in screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus. Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95 % of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as endocrine-disrupting chemicals (EDCs) were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics. Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 of the 15 public datasets. Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50 % of the screening effort ordinarily required when using un-ordered document lists. In addition, the tagging and annotation capabilities of SWIFT-Review can be useful during the activities of scoping and problem formulation. Text-mining and machine learning software such as SWIFT-Review can be valuable tools to reduce the human screening burden and assist in problem formulation.
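For reference, the abstract does not restate the WSS metric; as conventionally defined in the systematic-review text-mining literature, evaluated at 95% recall, it is

$$\mathrm{WSS@95} = \frac{TN + FN}{N} - (1 - 0.95),$$

where $N$ is the total number of documents and $TN$ and $FN$ are the true-negative and false-negative counts at the ranking cutoff that captures 95% of the known relevant studies.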
Dahdul, Wasila M; Balhoff, James P; Engeman, Jeffrey; Grande, Terry; Hilton, Eric J; Kothari, Cartik; Lapp, Hilmar; Lundberg, John G; Midford, Peter E; Vision, Todd J; Westerfield, Monte; Mabee, Paula M
2010-05-20
The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies. We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators. The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
Zhang, Jianwei; Kudrna, Dave; Mu, Ting; Li, Weiming; Copetti, Dario; Yu, Yeisoo; Goicoechea, Jose Luis; Lei, Yang; Wing, Rod A
2016-10-15
Next generation sequencing technologies have revolutionized our ability to rapidly and affordably generate vast quantities of sequence data. Once generated, raw sequences are assembled into contigs or scaffolds. However, these assemblies are mostly fragmented and inaccurate at the whole genome scale, largely due to the inability to integrate additional informative datasets (e.g. physical, optical and genetic maps). To address this problem, we developed a semi-automated software tool, Genome Puzzle Master (GPM), that enables the integration of additional genomic signposts to edit and build 'new-gen-assemblies' that result in high-quality 'annotation-ready' pseudomolecules. With GPM, loaded datasets can be connected to each other via their logical relationships which accomplishes tasks to 'group,' 'merge,' 'order and orient' sequences in a draft assembly. Manual editing can also be performed with a user-friendly graphical interface. Final pseudomolecules reflect a user's total data package and are available for long-term project management. GPM is a web-based pipeline and an important part of a Laboratory Information Management System (LIMS) which can be easily deployed on local servers for any genome research laboratory. The GPM (with LIMS) package is available at https://github.com/Jianwei-Zhang/LIMS. Contacts: jzhang@mail.hzau.edu.cn or rwing@mail.arizona.edu. Supplementary data are available at Bioinformatics online.
Greenwald, William W; Li, He; Smith, Erin N; Benaglio, Paola; Nariai, Naoki; Frazer, Kelly A
2017-04-07
Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from "genomic interaction" studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed. This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite "pgltools": a cross platform, pypy compatible python package available both as an easy-to-use UNIX package, and as a python module, for integration into pipelines of paired-genomic-loci analyses. Pgltools is a freely available, open source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are available publicly at www.github.com/billgreenwald/pgltools , and a python module of the operations can be installed from PyPI via the PyGLtools module.
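To make the paired-loci idea concrete, here is a minimal sketch that assumes the six-column core of a PGL record (chromosome, start, end for each anchor). The authoritative column layout is defined in the pgltools manual, so treat the field order here as an assumption.

```python
# Illustrative parser and overlap test for paired-genomic-loci records,
# assuming a minimal six-column layout (chrA, startA, endA, chrB, startB, endB);
# the real PGL specification may carry additional annotation columns.
from typing import NamedTuple

class PairedLoci(NamedTuple):
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int

def parse_pgl_line(line: str) -> PairedLoci:
    f = line.rstrip("\n").split("\t")
    return PairedLoci(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), int(f[5]))

def overlaps(a: PairedLoci, b: PairedLoci) -> bool:
    """True if both anchors of the two records overlap on matching chromosomes."""
    return (a.chrom_a == b.chrom_a and a.start_a < b.end_a and b.start_a < a.end_a
            and a.chrom_b == b.chrom_b and a.start_b < b.end_b and b.start_b < a.end_b)
```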
BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC.
Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun
2009-08-28
We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the alpha-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/
Lassi, Zohra S; Salam, Rehana A; Das, Jai K; Bhutta, Zulfiqar A
2014-01-01
This paper describes the conceptual framework and the methodology used to guide the systematic reviews of community-based interventions (CBIs) for the prevention and control of infectious diseases of poverty (IDoP). We adapted the conceptual framework from the 3ie work on the 'Community-Based Intervention Packages for Preventing Maternal Morbidity and Mortality and Improving Neonatal Outcomes' to aid in analyzing the existing CBIs for IDoP. The conceptual framework revolves around objectives, inputs, processes, outputs, outcomes, and impacts, showing the theoretical linkages between the delivery of the interventions targeting these diseases through various community delivery platforms and the consequent health impacts. We also describe the methodology undertaken to conduct the systematic reviews and the meta-analyses.
Packaging Technology Developed for High-Temperature Silicon Carbide Microsystems
NASA Technical Reports Server (NTRS)
Chen, Liang-Yu; Hunter, Gary W.; Neudeck, Philip G.
2001-01-01
High-temperature electronics and sensors are necessary for harsh-environment space and aeronautical applications, such as sensors and electronics for space missions to the inner solar system, sensors for in situ combustion and emission monitoring, and electronics for combustion control for aeronautical and automotive engines. However, these devices cannot be used until they can be packaged in appropriate forms for specific applications. Suitable packaging technology for operation temperatures up to 500 C and beyond is not commercially available. Thus, the development of a systematic high-temperature packaging technology for SiC-based microsystems is essential for both in situ testing and commercializing high-temperature SiC sensors and electronics. In response to these needs, researchers at Glenn innovatively designed, fabricated, and assembled a new prototype electronic package for high-temperature electronic microsystems using ceramic substrates (aluminum nitride and aluminum oxide) and gold (Au) thick-film metallization. Packaging components include a ceramic packaging frame, a thick-film metallization-based interconnection system, and a low-electrical-resistance SiC die-attachment scheme. Both the materials and fabrication process of the basic packaging components have been tested with an in-house-fabricated SiC semiconductor test chip in an oxidizing environment at temperatures from room temperature to 500 C for more than 1000 hr. These test results set lifetime records for both high-temperature electronic packaging and high-temperature electronic device testing. As required, the thick-film-based interconnection system demonstrated low (2.5 times the room-temperature resistance of the Au conductor) and stable (decreased 3 percent in 1500 hr of continuous testing) electrical resistance at 500 C in an oxidizing environment. Also as required, the electrical isolation impedance between printed wires that were not electrically joined by a wire bond remained high (greater than 0.4 GΩ) at 500 C in air. The attached SiC diode demonstrated low (less than 3.8 Ω/mm2) and relatively consistent dynamic resistance from room temperature to 500 C. These results indicate that the prototype package and the compatible die-attach scheme meet the initial design standards for high-temperature, low-power, and long-term operation. This technology will be further developed and evaluated, especially with more mechanical tests of each packaging element for operation at higher temperatures and longer lifetimes.
Gene Ontology annotations at SGD: new data sources and annotation methods
Hong, Eurie L.; Balakrishnan, Rama; Dong, Qing; Christie, Karen R.; Park, Julie; Binkley, Gail; Costanzo, Maria C.; Dwight, Selina S.; Engel, Stacia R.; Fisk, Dianna G.; Hirschman, Jodi E.; Hitz, Benjamin C.; Krieger, Cynthia J.; Livstone, Michael S.; Miyasato, Stuart R.; Nash, Robert S.; Oughtred, Rose; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Zhu, Kathy K.; Dolinski, Kara; Botstein, David; Cherry, J. Michael
2008-01-01
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current. PMID:17982175
Brettin, Thomas; Davis, James J.; Disz, Terry; ...
2015-02-10
The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
HoPaCI-DB: host-Pseudomonas and Coxiella interaction database
Bleves, Sophie; Dunger, Irmtraud; Walter, Mathias C.; Frangoulidis, Dimitrios; Kastenmüller, Gabi; Voulhoux, Romé; Ruepp, Andreas
2014-01-01
Bacterial infectious diseases are the result of multifactorial processes affected by the interplay between virulence factors and host targets. The host-Pseudomonas and Coxiella interaction database (HoPaCI-DB) is a publicly available manually curated integrative database (http://mips.helmholtz-muenchen.de/HoPaCI/) of host–pathogen interaction data from Pseudomonas aeruginosa and Coxiella burnetii. The resource provides structured information on 3585 experimentally validated interactions between molecules, bioprocesses and cellular structures extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make HoPaCI-DB a versatile knowledge base for biologists and network biology approaches. PMID:24137008
Protein function prediction--the power of multiplicity.
Rentzsch, Robert; Orengo, Christine A
2009-04-01
Advances in experimental and computational methods have quietly ushered in a new era in protein function annotation. This 'age of multiplicity' is marked by the notion that only the use of multiple tools, multiple evidence and considering the multiple aspects of function can give us the broad picture that 21st century biology will need to link and alter micro- and macroscopic phenotypes. It might also help us to undo past mistakes by removing errors from our databases and prevent us from producing more. On the downside, multiplicity is often confusing. We therefore systematically review methods and resources for automated protein function prediction, looking at individual (biochemical) and contextual (network) functions, respectively.
Ribosome profiling reveals the what, when, where and how of protein synthesis.
Brar, Gloria A; Weissman, Jonathan S
2015-11-01
Ribosome profiling, which involves the deep sequencing of ribosome-protected mRNA fragments, is a powerful tool for globally monitoring translation in vivo. The method has facilitated discovery of the regulation of gene expression underlying diverse and complex biological processes, of important aspects of the mechanism of protein synthesis, and even of new proteins, by providing a systematic approach for experimental annotation of coding regions. Here, we introduce the methodology of ribosome profiling and discuss examples in which this approach has been a key factor in guiding biological discovery, including its prominent role in identifying thousands of novel translated short open reading frames and alternative translation products.
NASA Technical Reports Server (NTRS)
Filman, Robert E.
2003-01-01
This viewgraph presentation provides information on Object Infrastructure Framework (OIF), an Aspect-Oriented Programming (AOP) system. The presentation begins with an introduction to the difficulties and requirements of distributed computing, including functional and non-functional requirements (ilities). The architecture of Distributed Object Technology includes stubs, proxies for implementation objects, and skeletons, proxies for client applications. The key OIF ideas (injecting behavior, annotated communications, thread contexts, and pragma) are discussed. OIF is an AOP mechanism; AOP is centered on: 1) Separate expression of crosscutting concerns; 2) Mechanisms to weave the separate expressions into a unified system. AOP is software engineering technology for separately expressing systematic properties while nevertheless producing running systems that embody these properties.
Lee, K-E; Lee, E-J; Park, H-S
2016-08-30
Recent advances in computational epigenetics have provided new opportunities to evaluate n-gram probabilistic language models. In this paper, we describe a systematic genome-wide approach for predicting functional roles in inactive chromatin regions by using a sequence-based Markovian chromatin map of the human genome. We demonstrate that Markov chains of sequences can be used as a precursor to predict functional roles in heterochromatin regions and provide an example comparing two publicly available chromatin annotations of large-scale epigenomics projects: ENCODE project consortium and Roadmap Epigenomics consortium.
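A minimal sketch of the underlying idea, building a k-th order Markov model of nucleotide transitions from a sequence; the genome-scale bookkeeping, smoothing, and chromatin-state comparison of the actual method are omitted.

```python
# Sketch of a k-th order Markov (n-gram) model over a DNA sequence, in the
# spirit of the sequence-based chromatin map described above.
from collections import defaultdict

def markov_transitions(seq: str, k: int = 2) -> dict:
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - k):
        context, nxt = seq[i:i + k], seq[i + k]
        counts[context][nxt] += 1
    # normalize counts into conditional probabilities P(next base | context)
    return {c: {b: n / sum(nxts.values()) for b, n in nxts.items()}
            for c, nxts in counts.items()}

probs = markov_transitions("ACGTACGTTACG", k=2)
print(probs.get("AC", {}))  # e.g. {'G': 1.0} for this toy sequence
```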
An Annotated and Federated Digital Library of Marine Animal Sounds
2005-01-01
of the annotations and the relevant segment delimitation points and linkages to other relevant metadata fields; e) search engines that support the...annotators to add information to the same recording, and search engines that permit either all-annotator or specific-annotator searches. To our knowledge
Cerveau, Nicolas; Jackson, Daniel J
2016-12-09
Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs. In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler, rather it combines the outputs of a variety of established assembly packages, and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose built BUSCO databases) also confirmed that our approach yields more information. Our approach yields coding transcriptome assemblies that are more likely to be closer to biological reality than any of the three individual assembly packages we investigated. This approach (freely available as a simple perl script) will be of use to researchers working with species for which there is little or no reference data against which the assembly of a transcriptome can be performed.
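The published pipeline is distributed as a perl script; purely to illustrate the pool-then-cluster idea, the sketch below concatenates assembler outputs and collapses redundancy with cd-hit-est, one possible clustering backend and not necessarily the one the authors use.

```python
# Illustrative pooling-then-clustering step: concatenate the outputs of
# several assemblers, then collapse redundant contigs at 95% identity.
# cd-hit-est stands in for the clustering stages of the published pipeline.
import subprocess

def pool_and_cluster(assemblies: list, pooled: str = "pooled.fasta",
                     out: str = "consensus.fasta", identity: float = 0.95) -> None:
    with open(pooled, "w") as dst:
        for path in assemblies:              # concatenate all assembler outputs
            with open(path) as src:
                dst.write(src.read())
    subprocess.run(["cd-hit-est", "-i", pooled, "-o", out,
                    "-c", str(identity), "-n", "10"], check=True)

pool_and_cluster(["clc.fasta", "trinity.fasta", "idba_tran.fasta"])
```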
Ashrafi, Hamid; Hill, Theresa; Stoffel, Kevin; Kozik, Alexander; Yao, Jiqiang; Chin-Wo, Sebastian Reyes; Van Deynze, Allen
2012-10-30
Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80-120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to lie at least 50 bp from any intron-exon junction and from flanking SNPs. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Before the availability of a pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database that we have dedicated to the pepper project.
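A sketch of the junction-distance filter described above, using simplified stand-in data structures for the variant and annotation inputs; the published work used in-house Perl scripts over BWA/SAMtools output.

```python
# Keep only SNPs at least 50 bp away from any annotated intron-exon junction
# on the same contig. Inputs are hypothetical simplifications of VCF/GFF data.
def filter_snps(snps, junctions, min_dist=50):
    """snps: list of (contig, pos); junctions: dict contig -> list of positions."""
    kept = []
    for contig, pos in snps:
        near = any(abs(pos - j) < min_dist for j in junctions.get(contig, []))
        if not near:
            kept.append((contig, pos))
    return kept

print(filter_snps([("ctg1", 120), ("ctg1", 400)], {"ctg1": [100, 1000]}))
# -> [('ctg1', 400)]; the SNP 20 bp from a junction is discarded
```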
The cigarette pack as image: new evidence from tobacco industry documents.
Wakefield, M; Morley, C; Horan, J K; Cummings, K M
2002-03-01
To gain an understanding of the role of pack design in tobacco marketing. A search of tobacco company document sites using a list of specified search terms was undertaken during November 2000 to July 2001. Documents show that, especially in the context of tighter restrictions on conventional avenues for tobacco marketing, tobacco companies view cigarette packaging as an integral component of marketing strategy and a vehicle for (a) creating significant in-store presence at the point of purchase, and (b) communicating brand image. Market testing results indicate that such imagery is so strong as to influence smokers' taste ratings of the same cigarettes when packaged differently. Documents also reveal the careful balancing act that companies have employed in using pack design and colour to communicate the impression of lower tar or milder cigarettes, while preserving perceived taste and "satisfaction". Systematic and extensive research is carried out by tobacco companies to ensure that cigarette packaging appeals to selected target groups, including young adults and women. Cigarette pack design is an important communication device for cigarette brands and acts as an advertising medium. Many smokers are misled by pack design into thinking that cigarettes may be "safer". There is a need to consider regulation of cigarette packaging.
Stewart, H.; Bingham, R.J.; White, S. J.; Dykeman, E. C.; Zothner, C.; Tuplin, A. K.; Stockley, P. G.; Twarock, R.; Harris, M.
2016-01-01
The specific packaging of the hepatitis C virus (HCV) genome is hypothesised to be driven by Core-RNA interactions. To identify the regions of the viral genome involved in this process, we used SELEX (systematic evolution of ligands by exponential enrichment) to identify RNA aptamers which bind specifically to Core in vitro. Comparison of these aptamers to multiple HCV genomes revealed the presence of a conserved terminal loop motif within short RNA stem-loop structures. We postulated that interactions of these motifs, as well as sub-motifs which were present in HCV genomes at statistically significant levels, with the Core protein may drive virion assembly. We mutated 8 of these predicted motifs within the HCV infectious molecular clone JFH-1, thereby producing a range of mutant viruses predicted to possess altered RNA secondary structures. RNA replication and viral titre were unaltered in viruses possessing only one mutated structure. However, infectivity titres were decreased in viruses possessing a higher number of mutated regions. This work thus identified multiple novel RNA motifs which appear to contribute to genome packaging. We suggest that these structures act as cooperative packaging signals to drive specific RNA encapsidation during HCV assembly. PMID:26972799
Probst, Yasmine C; Dengate, Alexis; Jacobs, Jenny; Louie, Jimmy Cy; Dunford, Elizabeth K
2017-12-01
Limiting the intake of added sugars in the diet remains a key focus of global dietary recommendations. To date there has been no systematic monitoring of the major types of added sugars used in the Australian food supply. The present study aimed to identify the most common added sugars and non-nutritive sweeteners in the Australian packaged food supply. Secondary analysis of data from the Australian FoodSwitch database was undertaken. Forty-six added sugars and eight non-nutritive sweetener types were extracted from the ingredient lists of 5744 foods across seventeen food categories. Added sugar ingredients were found in 61 % of the sample of foods examined and non-nutritive sweetener ingredients were found in 69 %. Only 31 % of foods contained no added sugar or non-nutritive sweetener. Sugar (as an ingredient), glucose syrup, maple syrup, maltodextrin and glucose/dextrose were the most common sugar ingredient types identified. Most Australian packaged food products had at least one added sugar ingredient, the most common being 'sugar'. The study provides insight into the most common types of added sugars and non-nutritive sweeteners used in the Australian food supply and is a useful baseline to monitor changes in how added sugars are used in Australian packaged foods over time.
Computer systems for annotation of single molecule fragments
Schwartz, David Charles; Severin, Jessica
2016-07-19
There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.
A guide to best practices for Gene Ontology (GO) manual annotation
Balakrishnan, Rama; Harris, Midori A.; Huntley, Rachael; Van Auken, Kimberly; Cherry, J. Michael
2013-01-01
The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374 000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotation. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all. Database URL: http://www.geneontology.org PMID:23842463
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
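The abstract invokes kappa statistics without giving the formula; for reference, Cohen's kappa for the agreement between two annotation gene sets is

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o$ is the observed fraction of genes on which the two annotations agree and $p_e$ is the agreement expected by chance from the marginal annotation frequencies.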
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buttler, D J
The Java Metadata Facility is introduced by Java Specification Request (JSR) 175 [1], and incorporated into the Java language specification [2] in version 1.5 of the language. The specification allows annotations on Java program elements: classes, interfaces, methods, and fields. Annotations give programmers a uniform way to add metadata to program elements that can be used by code checkers, code generators, or other compile-time or runtime components. Annotations are defined by annotation types. These are defined the same way as interfaces, but with the symbol @ preceding the interface keyword. There are additional restrictions on defining annotation types: (1) They cannot be generic; (2) They cannot extend other annotation types or interfaces; (3) Methods cannot have any parameters; (4) Methods cannot have type parameters; (5) Methods cannot throw exceptions; and (6) The return type of methods of an annotation type must be a primitive, a String, a Class, an annotation type, or an array, where the type of the array is restricted to one of the four allowed types. See [2] for additional restrictions and syntax. The methods of an annotation type define the elements that may be used to parameterize the annotation in code. Annotation types may have default values for any of their elements. For example, an annotation that specifies a defect report could initialize an element defining the defect outcome to submitted. Annotations may also have zero elements. This could be used to indicate serializability for a class (as opposed to the current Serializability interface).
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-02-09
BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform-independent, easily configurable and capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization are provided to academic users.
Facilitating the Transition from Bright to Dim Environments
2016-03-04
For the parametric data, a multivariate ANOVA was used to determine the systematic presence of any statistically significant performance differences ... All significance levels were p < 0.05, and statistical analyses were performed with the Statistical Package for the Social Sciences (SPSS).
Moberg, Andreas; Hansson, Eva; Boyd, Helen
2014-01-01
With the public availability of biochemical assays and screening data constantly increasing, new applications for data mining and method analysis are evolving in parallel. One example is BioAssay Ontology (BAO) for systematic classification of assays based on screening setup and metadata annotations. In this article we report a high-throughput screening (HTS) against phospho-N-acetylmuramoyl-pentapeptide translocase (MraY), an attractive antibacterial drug target involved in peptidoglycan synthesis. The screen resulted in novel chemistry identification using a fluorescence resonance energy transfer assay. To address a subset of the false positive hits, a frequent hitter analysis was performed using an approach in which MraY hits were compared with hits from similar assays, previously used for HTS. The MraY assay was annotated according to BAO and three internal reference assays, using a similar assay design and detection technology, were identified. Analyzing the assays retrospectively, it was clear that both MraY and the three reference assays all showed a high false positive rate in the primary HTS assays. In the case of MraY, false positives were efficiently identified by applying a method to correct for compound interference at the hit-confirmation stage. Frequent hitter analysis based on the three reference assays with similar assay method identified additional false actives in the primary MraY assay as frequent hitters. This article demonstrates how assays annotated using BAO terms can be used to identify closely related reference assays, and that analysis based on these assays clearly can provide useful data to influence assay design, technology, and screening strategy. PMID:25415593
Leaf phenomics: a systematic reverse genetic screen for Arabidopsis leaf mutants.
Wilson-Sánchez, David; Rubio-Díaz, Silvia; Muñoz-Viana, Rafael; Pérez-Pérez, José Manuel; Jover-Gil, Sara; Ponce, María Rosa; Micol, José Luis
2014-09-01
The study and eventual manipulation of leaf development in plants requires a thorough understanding of the genetic basis of leaf organogenesis. Forward genetic screens have identified hundreds of Arabidopsis mutants with altered leaf development, but the genome has not yet been saturated. To identify genes required for leaf development we are screening the Arabidopsis Salk Unimutant collection. We have identified 608 lines that exhibit a leaf phenotype with full penetrance and almost constant expressivity and 98 additional lines with segregating mutant phenotypes. To allow indexing and integration with other mutants, the mutant phenotypes were described using a custom leaf phenotype ontology. We found that the indexed mutation is present in the annotated locus for 78% of the 553 mutants genotyped, and that in half of these the annotated T-DNA is responsible for the phenotype. To quickly map non-annotated T-DNA insertions, we developed a reliable, cost-effective and easy method based on whole-genome sequencing. To enable comprehensive access to our data, we implemented a public web application named PhenoLeaf (http://genetics.umh.es/phenoleaf) that allows researchers to query the results of our screen, including text and visual phenotype information. We demonstrated how this new resource can facilitate gene function discovery by identifying and characterizing At1g77600, which we found to be required for proximal-distal cell cycle-driven leaf growth, and At3g62870, which encodes a ribosomal protein needed for cell proliferation and chloroplast function. This collection provides a valuable tool for the study of leaf development, characterization of biomass feedstocks and examination of other traits in this fundamental photosynthetic organ.
Personalized Guideline-Based Treatment Recommendations Using Natural Language Processing Techniques.
Becker, Matthias; Böckmann, Britta
2017-01-01
Clinical guidelines and clinical pathways are accepted and proven instruments for quality assurance and process optimization. Today, electronic representation of clinical guidelines exists as unstructured text, but is not well integrated with patient-specific information from electronic health records. Consequently, the generic content of the clinical guidelines is accessible, but it is not possible to visualize the position of the patient on the clinical pathway, and decision support for the next treatment step cannot be provided by personalized guidelines. The Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) provides a common reference terminology as well as the semantic link for combining the pathways and the patient-specific information. This paper proposes a model-based approach to support the development of guideline-compliant pathways combined with patient-specific structured and unstructured information using SNOMED CT. To identify SNOMED CT concepts, software was developed that extracts SNOMED CT codes from structured and unstructured German data and maps them to clinical pathways annotated in accordance with the systematized nomenclature.
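A deliberately naive sketch of the extraction step: dictionary lookup of German terms against SNOMED CT concept identifiers. The real system involves proper NLP (tokenization, normalization, terminology services), and the lexicon entries below are illustrative placeholders, not the paper's mapping tables.

```python
# Naive dictionary-based concept matcher illustrating the extraction idea;
# lexicon entries and codes are placeholders, not the authors' resources.
SNOMED_LEXICON = {
    "mammakarzinom": "254837009",   # placeholder concept ID
    "chemotherapie": "367336001",   # placeholder concept ID
}

def extract_concepts(text: str) -> list:
    """Return the concept IDs of lexicon terms found in the text."""
    tokens = text.lower().split()
    return [SNOMED_LEXICON[t] for t in tokens if t in SNOMED_LEXICON]

print(extract_concepts("Patientin mit Mammakarzinom vor Chemotherapie"))
# -> ['254837009', '367336001']
```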
Yang, Jianji J.; Cohen, Aaron M.; McDonagh, Marian S.
2008-01-01
Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation datasets for SR text mining research. PMID:18999194
Feuermann, Marc; Gaudet, Pascale; Mi, Huaiyu; Lewis, Suzanna E; Thomas, Paul D
2016-01-01
We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied. Here, we report our results from applying this paradigm to two well-characterized cellular processes, apoptosis and autophagy. This revealed several important observations with respect to GO annotations and how they can be used for function inference. Notably, we applied only a small fraction of the experimentally supported GO annotations to infer function in other family members. The majority of other annotations describe indirect effects, phenotypes or results from high throughput experiments. In addition, we show here how feedback from phylogenetic annotation leads to significant improvements in the PANTHER trees, the GO annotations and GO itself. Thus GO phylogenetic annotation both increases the quantity and improves the accuracy of the GO annotations provided to the research community. We expect these phylogenetically based annotations to be of broad use in gene enrichment analysis as well as other applications of GO annotations. Database URL: http://amigo.geneontology.org/amigo
AnnotateGenomicRegions: a web application.
Zammataro, Luca; DeMolfetta, Rita; Bucci, Gabriele; Ceol, Arnaud; Muller, Heiko
2014-01-01
Modern genomic technologies produce large amounts of data that can be mapped to specific regions in the genome. Among the first steps in interpreting the results is annotation of genomic regions with known features such as genes, promoters, CpG islands etc. Several tools have been published to perform this task. However, using these tools often requires a significant amount of bioinformatics skills and/or downloading and installing dedicated software. Here we present AnnotateGenomicRegions, a web application that accepts genomic regions as input and outputs a selection of overlapping and/or neighboring genome annotations. Supported organisms include human (hg18, hg19), mouse (mm8, mm9, mm10), zebrafish (danRer7), and Saccharomyces cerevisiae (sacCer2, sacCer3). AnnotateGenomicRegions is accessible online on a public server or can be installed locally. Some frequently used annotations and genomes are embedded in the application while custom annotations may be added by the user. The increasing spread of genomic technologies generates the need for a simple-to-use annotation tool for genomic regions that can be used by biologists and bioinformaticians alike. AnnotateGenomicRegions meets this demand. AnnotateGenomicRegions is an open-source web application that can be installed on any personal computer or institute server. AnnotateGenomicRegions is available at: http://cru.genomics.iit.it/AnnotateGenomicRegions.
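The core operation such a tool performs is interval overlap between query regions and known annotation features. A minimal sketch with an invented feature table and half-open coordinates; real tools use indexed interval structures to handle genome-scale annotation sets.

```python
# Minimal sketch of annotating genomic regions with overlapping features,
# the core operation behind tools like AnnotateGenomicRegions. Toy data.

FEATURES = [  # (chrom, start, end, name)
    ("chr1", 1000, 5000, "GENE1"),
    ("chr1", 4500, 4800, "CpG_island_1"),
    ("chr2", 200, 900, "GENE2"),
]


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def annotate(chrom, start, end):
    """Return names of all features overlapping the query region."""
    return [name for c, s, e, name in FEATURES
            if c == chrom and overlaps(start, end, s, e)]


print(annotate("chr1", 4600, 4700))  # ['GENE1', 'CpG_island_1']
```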
Evaluation of web-based annotation of ophthalmic images for multicentric clinical trials.
Chalam, K V; Jain, P; Shah, V A; Shah, Gaurav Y
2006-06-01
An Internet browser-based annotation system can be used to identify and describe features in digitized retinal images, in multicentric clinical trials, in real time. In this web-based annotation system, the user employs a mouse to draw and create annotations on a transparent layer that encapsulates the observations and interpretations of a specific image. Multiple annotation layers may be overlaid on a single image. These layers may correspond to annotations by different users on the same image or annotations of a temporal sequence of images of a disease process. In addition, geometrical properties of annotated figures may be computed and measured. The annotations are stored in a central repository database on a server, from which they can be retrieved by multiple users in real time. This system facilitates objective evaluation of digital images and comparison of double-blind readings of digital photographs, with an identifiable audit trail. Annotation of ophthalmic images allowed clinically feasible and useful interpretation to track properties of an area of fundus pathology. This provided an objective method to monitor properties of pathologies over time, an essential component of multicentric clinical trials. The annotation system also allowed users to view stereoscopic image pairs. This web-based annotation system is useful and valuable in monitoring patient care, in multicentric clinical trials, telemedicine, teaching and routine clinical settings.
Evaluating Hierarchical Structure in Music Annotations
McFee, Brian; Nieto, Oriol; Farbood, Morwaread M.; Bello, Juan Pablo
2017-01-01
Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for “flat” descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement. PMID:28824514
ERIC Educational Resources Information Center
Wolfe, Joanna
2008-01-01
Recent research on annotation interfaces provides provocative evidence that anchored, annotation-based discussion environments may lead to better conversations about a text. However, annotation interfaces raise complicated tradeoffs regarding screen real estate and positioning. It is argued that solving this screen real estate problem requires…
HAMAP in 2013, new developments in the protein family classification and annotation system
Pedruzzi, Ivo; Rivoire, Catherine; Auchincloss, Andrea H.; Coudert, Elisabeth; Keller, Guillaume; de Castro, Edouard; Baratin, Delphine; Cuche, Béatrice A.; Bougueleret, Lydie; Poux, Sylvain; Redaschi, Nicole; Xenarios, Ioannis; Bridge, Alan
2013-01-01
HAMAP (High-quality Automated and Manual Annotation of Proteins—available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles. PMID:23193261
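The classify-then-annotate logic HAMAP describes can be sketched as follows: a sequence that matches a family's profile inherits that family's annotation rules. Here the profile match is reduced to a toy wildcard motif and the family entry is invented; real HAMAP profiles are curated alignment-based models with much richer rule conditions.

```python
# Minimal sketch of the HAMAP idea: a sequence matching a family profile
# inherits that family's annotation rules. Toy family entry and matcher.
FAMILIES = {
    "MF_00001": {"motif": "GXGXXG",      # placeholder matching condition
                 "annotations": {"function": "ATP binding",
                                 "keyword": "Nucleotide-binding"}},
}

def matches(sequence, motif):
    """Toy matcher: 'X' is a wildcard, other letters must match exactly."""
    k = len(motif)
    return any(all(m == "X" or m == sequence[i + j]
                   for j, m in enumerate(motif))
               for i in range(len(sequence) - k + 1))

def annotate(sequence):
    """Apply the annotation rules of every family whose profile matches."""
    return {fam: fam_def["annotations"]
            for fam, fam_def in FAMILIES.items()
            if matches(sequence, fam_def["motif"])}

print(annotate("MKVLGAGTMGAGIA"))  # toy profile matches -> annotations applied
```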
Utilizing Social Media Data for Pharmacovigilance: A Review
Sarker, Abeed; Ginn, Rachel; Nikfarjam, Azadeh; O’Connor, Karen; Smith, Karen; Jayaraman, Swetha; Upadhaya, Tejaswi; Gonzalez, Graciela
2015-01-01
Objective Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research in this area has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media. Methods We identified studies describing approaches for ADR detection from social media from the Medline, Embase, Scopus and Web of Science databases, and the Google Scholar search engine. Studies that met our inclusion criteria were those that attempted to utilize ADR information posted by users on any publicly available social media platform. We categorized the studies into various dimensions such as primary ADR detection approach, size of data, source(s), availability, evaluation criteria, and so on. Results Twenty-two studies met our inclusion criteria, with fifteen (68.2%) published within the last two years. The survey revealed a clear trend towards the usage of annotated data, with eleven of the fifteen (73.3%) studies published in the last two years relying on expert annotations. However, publicly available annotated data is still scarce, and we found only six (27.3%) studies that made the annotations used publicly available, making system performance comparisons difficult. In terms of algorithms, supervised classification techniques to detect posts containing ADR mentions, and lexicon-based approaches for extraction of ADR mentions from texts, have been the most popular. Conclusion Our review suggests that interest in the utilization of the vast amounts of available social media data for ADR monitoring is increasing with time. In terms of sources, both health-related and general social media data have been used for ADR detection; while health-related sources tend to contain higher proportions of relevant data, the volume of data from general social media websites is significantly higher. There is still very little annotated data publicly available, and, as indicated by the promising results obtained by recent supervised learning approaches, there is a strong need to make such data available to the research community. PMID:25720841
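As a concrete illustration of the lexicon-based extraction approach the review identifies as popular, here is a minimal sketch with an invented ADR lexicon and post; production systems additionally normalize misspellings and map mentions to controlled vocabularies.

```python
# Minimal sketch of lexicon-based ADR mention extraction from a social
# media post. The lexicon and post are toy examples.
import re

ADR_LEXICON = {"headache", "nausea", "dry mouth", "insomnia"}

def extract_adrs(post):
    """Return lexicon terms mentioned in a post, longest terms first."""
    text = post.lower()
    # Check multi-word terms first so 'dry mouth' is not missed.
    return [term for term in sorted(ADR_LEXICON, key=len, reverse=True)
            if re.search(r"\b" + re.escape(term) + r"\b", text)]

post = "Started the new med yesterday, awful dry mouth and a mild headache."
print(extract_adrs(post))  # ['dry mouth', 'headache']
```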
Apollo: a sequence annotation editor
Lewis, SE; Searle, SMJ; Harris, N; Gibson, M; Iyer, V; Richter, J; Wiel, C; Bayraktaroglu, L; Birney, E; Crosby, MA; Kaminker, JS; Matthews, BB; Prochnik, SE; Smith, CD; Tupy, JL; Rubin, GM; Misra, S; Mungall, CJ; Clamp, ME
2002-01-01
The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects. PMID:12537571
Ozturk, Orgul D; McInnes, Melayne M; Blake, Christine E; Frongillo, Edward A; Jones, Sonya J
2016-01-01
The objective of this study is to develop a structured observational method for the systematic assessment of the food-choice architecture that can be used to identify key points for behavioral economic intervention intended to improve the health quality of children's diets. We use an ethnographic approach with observations at twelve elementary schools to construct our survey instrument. Elements of the structured observational method include decision environment, salience, accessibility/convenience, defaults/verbal prompts, number of choices, serving ware/method/packaging, and social/physical eating environment. Our survey reveals important "nudgeable" components of the elementary school food-choice architecture, including precommitment and default options on the lunch line.
Gupta, Rahul; Audhkhasi, Kartik; Jacokes, Zach; Rozga, Agata; Narayanan, Shrikanth
2018-01-01
Studies of time-continuous human behavioral phenomena often rely on ratings from multiple annotators. Since the ground truth of the target construct is often latent, the standard practice is to use ad-hoc metrics (such as averaging annotator ratings). Despite being easy to compute, such metrics may not provide accurate representations of the underlying construct. In this paper, we present a novel method for modeling multiple time series annotations over a continuous variable that computes the ground truth by modeling annotator-specific distortions. We condition the ground truth on a set of features extracted from the data and further assume that the annotators provide their ratings as modifications of the ground truth, with each annotator having specific distortion tendencies. We train the model using an Expectation-Maximization based algorithm and evaluate it on a study involving natural interaction between a child and a psychologist, to predict confidence ratings of the children's smiles. We compare and analyze the model against two baselines where: (i) the ground truth is considered to be the framewise mean of ratings from various annotators and (ii) each annotator is assumed to bear a distinct time delay in annotation and their annotations are aligned before computing the framewise mean.
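The flavor of such distortion-aware fusion can be sketched with a simple alternating least-squares (EM-style) loop in which each annotator is modeled as applying a linear scale and offset to the latent ground truth. This is a toy version under a Gaussian-noise assumption; the paper's model additionally conditions the ground truth on features and handles time delays.

```python
# Minimal sketch: jointly estimate a latent ground-truth series and
# per-annotator linear distortions (scale a, offset b). Toy data.
import numpy as np

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 3, 100))                 # latent signal (unknown)
R = np.stack([1.5 * truth + 0.2 + 0.05 * rng.standard_normal(100),
              0.7 * truth - 0.1 + 0.05 * rng.standard_normal(100)])

a = np.ones(2)          # per-annotator scale
b = np.zeros(2)         # per-annotator offset
for _ in range(50):
    # E-step: least-squares estimate of the truth given distortion params
    est = ((R - b[:, None]) * a[:, None]).sum(0) / (a ** 2).sum()
    # M-step: refit each annotator's scale/offset against the estimate
    for i in range(2):
        a[i], b[i] = np.polyfit(est, R[i], 1)

print(np.corrcoef(est, truth)[0, 1])  # ~1.0: recovered up to affine ambiguity
```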
The language of gene ontology: a Zipf's law analysis.
Kalankesh, Leila Ranandeh; Stevens, Robert; Brass, Andy
2012-06-07
Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotations captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes, we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language, with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
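Measuring a Zipf exponent from an annotation corpus amounts to a log-log fit of term frequency against rank. A minimal sketch on an invented handful of GO term occurrences; the paper fits the full GOA corpora, split by sub-ontology and evidence code.

```python
# Minimal sketch of estimating a Zipf exponent from GO term occurrences,
# treating each term occurrence as a 'word'. Toy counts.
import numpy as np
from collections import Counter

corpus = ["GO:0005634", "GO:0005634", "GO:0003677", "GO:0005634",
          "GO:0003677", "GO:0006355", "GO:0046872", "GO:0005634"]

freqs = np.array(sorted(Counter(corpus).values(), reverse=True), float)
ranks = np.arange(1, len(freqs) + 1)

# Zipf's law: f(r) ~ r^(-alpha); fit alpha by least squares in log-log space
slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(f"estimated exponent: {-slope:.2f}")
```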
Law, MeiYee; Childs, Kevin L.; Campbell, Michael S.; Stein, Joshua C.; Olson, Andrew J.; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M.; Lawrence, Carolyn J.; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark
2015-01-01
The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. PMID:25384563
Cross-organism learning method to discover new gene functionalities.
Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro
2016-04-01
Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In the last years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Tyner, Shannon; Brewer, Adam; Helman, Meghan; Leon, Yanerys; Pritchard, Joshua; Schlund, Michael
2016-03-01
Dog phobias are common in individuals with autism; however, evidence supporting behavioral interventions is limited. The current study evaluated the efficacy of contact desensitization plus reinforcement on dog-phobic behavior exhibited by three children diagnosed with autism. The treatment package improved contact with dogs in analog and naturalistic settings, and the improvements were maintained at follow-up and in generalization tests. Parents/caregivers also provided high consumer satisfaction reports. Approximately 30% of individuals diagnosed with autism also receive a comorbid diagnosis of a clinical phobia. Research has shown that an effective behavioral treatment for dog phobias in individuals with intellectual disabilities is contact desensitization plus reinforcement using two hierarchies, size of the dog and distance to the dog; no escape extinction was necessary. The current systematic replication shows that this treatment package was effective for children with autism using only a single hierarchy composed of distance to the dog. Future practitioners may wish to examine whether this treatment package also produces changes in supplemental physiological measures such as pupil dilation, heart rate, galvanic skin responses, and respiration.
MAVTgsa: An R Package for Gene Set (Enrichment) Analysis
Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...
2014-01-01
Gene set analysis methods aim to determine whether an a priori defined set of genes shows statistically significant differences in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in a gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis of variance) detects changes in both directions for studying two or more experimental conditions. (3) A random forests-based procedure identifies gene sets that can accurately predict samples from different experimental conditions or that are associated with continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q-values for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.
NASA Astrophysics Data System (ADS)
Weijers, Jan-Willem; Derudder, Veerle; Janssens, Sven; Petré, Frederik; Bourdoux, André
2006-12-01
To assess the performance of forthcoming 4th-generation wireless local area networks, the algorithmic functionality is usually modelled using a high-level mathematical software package, for instance, Matlab. In order to validate the modelling assumptions against the real physical world, the high-level functional model needs to be translated into a prototype. A systematic system design methodology proves very valuable, since it avoids, or at least reduces, numerous design iterations. In this paper, we propose a novel Matlab-to-hardware design flow, which allows the algorithmic functionality to be mapped onto the target prototyping platform in a systematic and reproducible way. The proposed design flow is partly manual and partly tool assisted. It is shown that the proposed design flow allows the same testbench to be used throughout the whole design flow and avoids time-consuming and error-prone intermediate translation steps.
Klee, Kathrin; Ernst, Rebecca; Spannagl, Manuel; Mayer, Klaus F X
2007-08-30
Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This indirect database upload is laborious and causes data synchronization problems. To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to make the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leung, Elo; Huang, Amy; Cadag, Eithon; ...
2016-01-20
In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
Real-time image annotation by manifold-based biased Fisher discriminant analysis
NASA Astrophysics Data System (ADS)
Ji, Rongrong; Yao, Hongxun; Wang, Jicheng; Sun, Xiaoshuai; Liu, Xianming
2008-01-01
Automatic linguistic annotation is a promising solution to bridge the semantic gap in content-based image retrieval. However, two crucial issues are not well addressed in state-of-the-art annotation algorithms: (1) the Small Sample Size (3S) problem in keyword classifier/model learning; and (2) most annotation algorithms cannot be extended to real-time online usage due to their low computational efficiency. This paper presents a novel Manifold-based Biased Fisher Discriminant Analysis (MBFDA) algorithm to address these two issues by transductive semantic learning and keyword filtering. To address the 3S problem, co-training-based manifold learning is adopted for keyword model construction. To achieve real-time annotation, a Biased Fisher Discriminant Analysis (BFDA) based semantic feature reduction algorithm is presented for keyword confidence discrimination and semantic feature reduction. Different from all existing annotation methods, MBFDA views image annotation from a novel eigen semantic feature (which corresponds to keywords) selection aspect. As demonstrated in experiments, our manifold-based biased Fisher discriminant analysis annotation algorithm outperforms classical and state-of-the-art annotation methods (1. K-NN expansion; 2. one-to-all SVM; 3. PWC-SVM) in both computational time and annotation accuracy with a large margin.
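As a rough illustration of scoring keyword confidence with a Fisher-style discriminant, here is a sketch using scikit-learn's standard (unbiased) linear discriminant analysis as a stand-in; the paper's biased, manifold-regularized variant is not reproduced here, and the feature vectors are random toy data.

```python
# Minimal sketch of keyword-confidence scoring via Fisher discriminant
# analysis, standing in for the paper's biased MBFDA variant. Toy features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X_pos = rng.normal(loc=1.0, size=(40, 8))    # images labeled with the keyword
X_neg = rng.normal(loc=-1.0, size=(40, 8))   # images without it
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 40 + [0] * 40)

lda = LinearDiscriminantAnalysis().fit(X, y)
query = rng.normal(loc=1.0, size=(1, 8))     # unseen image feature vector
print(lda.predict_proba(query)[0, 1])        # keyword confidence in [0, 1]
```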
A Novel Approach to Semantic and Coreference Annotation at LLNL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Firpo, M
A case is made for the importance of high quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and the connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (5) A pilot study is recommended in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.
AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments
Zheng, Jie; Stoyanovich, Julia; Manduchi, Elisabetta; Liu, Junmin; Stoeckert, Christian J.
2011-01-01
The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute. Database URL: http://www.cbil.upenn.edu/annotCompute/ PMID:22190598
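The query-by-example idea can be sketched with set-based annotation dissimilarities. Jaccard distance is used below as a simple stand-in for the paper's conceptual dissimilarity measures, and the experiments and terms are invented.

```python
# Minimal sketch of annotation-based query-by-example: experiments are
# ranked by the dissimilarity of their controlled-vocabulary term sets.
EXPERIMENTS = {
    "E1": {"organism:human", "assay:ChIP-chip", "tissue:liver"},
    "E2": {"organism:human", "assay:ChIP-chip", "tissue:brain"},
    "E3": {"organism:mouse", "assay:RNA-seq", "tissue:liver"},
}

def dissimilarity(a, b):
    """Jaccard distance between two annotation sets."""
    return 1 - len(a & b) / len(a | b)

def query_by_example(example):
    """Rank all other experiments by dissimilarity to the example."""
    terms = EXPERIMENTS[example]
    others = [(e, dissimilarity(terms, t))
              for e, t in EXPERIMENTS.items() if e != example]
    return sorted(others, key=lambda pair: pair[1])

print(query_by_example("E1"))  # E2 ranks closer to E1 than E3 does
```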
BioCreative V CDR task corpus: a resource for chemical disease relation extraction.
Li, Jiao; Sun, Yueping; Johnson, Robin J; Sciaky, Daniela; Wei, Chih-Hsuan; Leaman, Robert; Davis, Allan Peter; Mattingly, Carolyn J; Wiegers, Thomas C; Lu, Zhiyong
2016-01-01
Community-run, formal evaluations and manually annotated text corpora are critically important for advancing biomedical text-mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to contain both disease/chemical annotations and relation annotations in the same set of articles. Despite previous efforts in biomedical corpus construction, none was found to be sufficient for the task. Thus, we developed our own corpus called BC5CDR during the challenge by inviting a team of Medical Subject Headings (MeSH) indexers for disease/chemical entity annotation and Comparative Toxicogenomics Database (CTD) curators for CID relation annotation. To ensure high annotation quality and productivity, detailed annotation guidelines and automatic annotation tools were provided. The resulting BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases and 3116 chemical-disease interactions. Each entity annotation includes both the mention text spans and normalized concept identifiers, using MeSH as the controlled vocabulary. To ensure accuracy, the entities were first captured independently by two annotators followed by a consensus annotation: The average inter-annotator agreement (IAA) scores were 87.49% and 96.05% for the diseases and chemicals, respectively, in the test set according to the Jaccard similarity coefficient. Our corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.
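The Jaccard-style agreement score mentioned above is straightforward to compute over sets of entity annotations; the (document, span, concept) tuples below are invented for illustration.

```python
# Minimal sketch of Jaccard inter-annotator agreement over entity
# annotations, as used to score the BC5CDR corpus. Toy annotation sets.
def jaccard_agreement(annots_a, annots_b):
    """Fraction of annotations shared by both annotators."""
    a, b = set(annots_a), set(annots_b)
    return len(a & b) / len(a | b)

annotator1 = {("d1", 0, 9, "D003924"), ("d1", 25, 34, "D006973")}
annotator2 = {("d1", 0, 9, "D003924"), ("d1", 40, 48, "D009369")}
print(f"{jaccard_agreement(annotator1, annotator2):.2f}")  # 0.33
```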
Iwasaki, Wataru; Fukunaga, Tsukasa; Isagozawa, Ryota; Yamada, Koichiro; Maeda, Yasunobu; Satoh, Takashi P.; Sado, Tetsuya; Mabuchi, Kohji; Takeshima, Hirohiko; Miya, Masaki; Nishida, Mutsumi
2013-01-01
MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface. PMID:23955518
htsint: a Python library for sequencing pipelines that combines data through gene set generation.
Richards, Adam J; Herrel, Anthony; Bonneaud, Camille
2015-09-24
Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
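The distance-to-modules step can be sketched with an off-the-shelf spectral clustering over a toy gene-gene functional distance matrix; htsint derives such distances from shared GO annotations across taxa, which is not reproduced here.

```python
# Minimal sketch of turning pairwise functional distances among genes
# into modules via spectral clustering. The distance matrix is toy data.
import numpy as np
from sklearn.cluster import SpectralClustering

genes = ["g1", "g2", "g3", "g4"]
D = np.array([[0.0, 0.1, 0.9, 0.8],      # g1, g2 functionally close
              [0.1, 0.0, 0.85, 0.9],
              [0.9, 0.85, 0.0, 0.2],     # g3, g4 functionally close
              [0.8, 0.9, 0.2, 0.0]])

affinity = np.exp(-D ** 2 / 0.5)          # distances -> similarities
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(dict(zip(genes, labels)))           # two functional modules
```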
web cellHTS2: a web-application for the analysis of high-throughput screening data.
Pelz, Oliver; Gilsdorf, Moritz; Boutros, Michael
2010-04-12
The analysis of high-throughput screening data sets is an expanding field in bioinformatics. High-throughput screens by RNAi generate large primary data sets which need to be analyzed and annotated to identify relevant phenotypic hits. Large-scale RNAi screens are frequently used to identify novel factors that influence a broad range of cellular processes, including signaling pathway activity, cell proliferation, and host cell infection. Here, we present a web-based application utility for the end-to-end analysis of large cell-based screening experiments by cellHTS2. The software guides the user through the configuration steps that are required for the analysis of single or multi-channel experiments. The web-application provides options for various standardization and normalization methods, annotation of data sets and a comprehensive HTML report of the screening data analysis, including a ranked hit list. Sessions can be saved and restored for later re-analysis. The web frontend for the cellHTS2 R/Bioconductor package interacts with it through an R-server implementation that enables highly parallel analysis of screening data sets. web cellHTS2 further provides a file import and configuration module for common file formats. The implemented web-application facilitates the analysis of high-throughput data sets and provides a user-friendly interface. web cellHTS2 is accessible online at http://web-cellHTS2.dkfz.de. A standalone version as a virtual appliance and source code for platforms supporting Java 1.5.0 can be downloaded from the web cellHTS2 page. web cellHTS2 is freely distributed under GPL.
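One standardization option such an analysis typically offers is per-plate z-scoring prior to hit ranking. A minimal sketch on invented plate values; this is illustrative rather than the cellHTS2 code path, which also handles controls and multi-channel designs.

```python
# Minimal sketch of per-plate z-score normalization followed by hit
# ranking for an RNAi screen. Toy plate values.
import numpy as np

plates = {"plate1": np.array([200., 210., 190., 950., 205.]),
          "plate2": np.array([120., 115., 480., 118., 125.])}

def zscore(values):
    """Center and scale each plate independently to remove plate effects."""
    return (values - values.mean()) / values.std(ddof=1)

scores = {p: zscore(v) for p, v in plates.items()}
# Rank all wells across plates by |z|; large scores are candidate hits
hits = sorted(((p, i, z) for p, zs in scores.items() for i, z in enumerate(zs)),
              key=lambda t: -abs(t[2]))
print(hits[:2])  # the spiked wells (plate1 well 3, plate2 well 2) rank first
```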
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia
2018-01-04
Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNA-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients' clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to perform differentially methylated lncRNA identification and visualization on local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
New directions in biomedical text annotation: definitions, guidelines and corpus construction
Wilbur, W John; Rzhetsky, Andrey; Shatkay, Hagit
2006-01-01
Background While biomedical text mining is emerging as an important research area, practical results have proven difficult to achieve. We believe that an important first step towards more accurate text-mining lies in the ability to identify and characterize text that satisfies various types of information needs. We report here the results of our inquiry into properties of scientific text that have sufficient generality to transcend the confines of a narrow subject area, while supporting practical mining of text for factual information. Our ultimate goal is to annotate a significant corpus of biomedical text and train machine learning methods to automatically categorize such text along certain dimensions that we have defined. Results We have identified five qualitative dimensions that we believe characterize a broad range of scientific sentences, and are therefore useful for supporting a general approach to text-mining: focus, polarity, certainty, evidence, and directionality. We define these dimensions and describe the guidelines we have developed for annotating text with regard to them. To examine the effectiveness of the guidelines, twelve annotators independently annotated the same set of 101 sentences that were randomly selected from current biomedical periodicals. Analysis of these annotations shows 70–80% inter-annotator agreement, suggesting that our guidelines indeed present a well-defined, executable and reproducible task. Conclusion We present our guidelines defining a text annotation task, along with annotation results from multiple independently produced annotations, demonstrating the feasibility of the task. The annotation of a very large corpus of documents along these guidelines is currently ongoing. These annotations form the basis for the categorization of text along multiple dimensions, to support viable text mining for experimental results, methodology statements, and other forms of information. We are currently developing machine learning methods, to be trained and tested on the annotated corpus, that would allow for the automatic categorization of biomedical text along the general dimensions that we have presented. The guidelines in full detail, along with annotated examples, are publicly available. PMID:16867190
Semantic annotation of consumer health questions.
Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina
2018-02-06
Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from the natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded highest agreement, while the agreement for more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most useful in estimating annotation confidence. To our knowledge, our corpus is the first focusing on annotation of uncurated consumer health questions. It is currently used to develop machine learning-based methods for question understanding. We make the corpus publicly available to stimulate further research on consumer health QA.
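Pairwise agreement on categorical annotations such as question types is commonly summarized with a chance-corrected statistic. A minimal Cohen's kappa sketch on invented labels; the paper's exact agreement measures may differ.

```python
# Minimal sketch of pairwise inter-annotator agreement via Cohen's kappa
# over categorical question-type labels. Toy label sequences.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["treatment", "cause", "treatment", "info", "treatment", "cause"]
b = ["treatment", "cause", "info",      "info", "treatment", "info"]
print(f"{cohens_kappa(a, b):.2f}")  # 0.52
```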
Misirli, Goksel; Cavaliere, Matteo; Waites, William; Pocock, Matthew; Madsen, Curtis; Gilfellon, Owen; Honorato-Zimmer, Ricardo; Zuliani, Paolo; Danos, Vincent; Wipat, Anil
2016-03-15
Biological systems are complex and challenging to model and therefore model reuse is highly desirable. To promote model reuse, models should include both information about the specifics of simulations and the underlying biology in the form of metadata. The availability of computationally tractable metadata is especially important for the effective automated interpretation and processing of models. Metadata are typically represented as machine-readable annotations which enhance programmatic access to information about models. Rule-based languages have emerged as a modelling framework to represent the complexity of biological systems. Annotation approaches have been widely used for reaction-based formalisms such as SBML. However, rule-based languages still lack a rich annotation framework to add semantic information, such as machine-readable descriptions, to the components of a model. We present an annotation framework and guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages. We adapt widely adopted annotation approaches to rule-based models. We initially propose a syntax to store machine-readable annotations and describe a mapping between rule-based modelling entities, such as agents and rules, and their annotations. We then describe an ontology to both annotate these models and capture the information contained therein, and demonstrate annotating these models using examples. Finally, we present a proof-of-concept tool for extracting annotations from a model that can be queried and analyzed in a uniform way. The uniform representation of the annotations can be used to facilitate the creation, analysis, reuse and visualization of rule-based models. Although the examples given use specific implementations, the proposed techniques can be applied to rule-based models in general. The annotation ontology for rule-based models can be found at http://purl.org/rbm/rbmo. The krdf tool and associated executable examples are available at http://purl.org/rbm/rbmo/krdf. Contact: anil.wipat@newcastle.ac.uk or vdanos@inf.ed.ac.uk. © The Author 2015. Published by Oxford University Press.
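The general mechanism, machine-readable annotations attached to model components and extracted as triples for uniform querying, can be sketched as follows. The comment syntax and identifiers below are invented for illustration and are not the paper's rbmo syntax.

```python
# Minimal sketch: structured annotations embedded in the comments of a
# Kappa-style rule-based model, extracted as triples. Invented syntax.
import re

model = """
#^ agent:EGFR rbmo:hasReference uniprot:P00533
#^ rule:r1 rbmo:describes go:0007173
'r1' EGFR(l), EGF(r) -> EGFR(l!1), EGF(r!1) @ 0.001
"""

triples = re.findall(r"#\^\s+(\S+)\s+(\S+)\s+(\S+)", model)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```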
Semantic annotation in biomedicine: the current landscape.
Jovanović, Jelena; Bagheri, Ebrahim
2017-09-22
The abundance and unstructured nature of biomedical texts, be it clinical or research content, impose significant challenges for the effective and efficient use of information and knowledge stored in such texts. Annotation of biomedical documents with machine intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators. Over the last dozen years, the biomedical research community has invested significant efforts in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state of the art biomedical semantic annotators, focusing particularly on general purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today's annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.
The distributed annotation system.
Dowell, R D; Jokerst, R M; Day, A; Eddy, S R; Stein, L
2001-01-01
Currently, most genome annotation is curated by centralized groups with limited resources. Efforts to share annotations transparently among multiple groups have not yet been satisfactory. Here we introduce a concept called the Distributed Annotation System (DAS). DAS allows sequence annotations to be decentralized among multiple third-party annotators and integrated on an as-needed basis by client-side software. The communication between client and servers in DAS is defined by the DAS XML specification. Annotations are displayed in layers, one per server. Any client or server adhering to the DAS XML specification can participate in the system; we describe a simple prototype client and server example. The DAS specification is being used experimentally by Ensembl, WormBase, and the Berkeley Drosophila Genome Project. Continued success will depend on the readiness of the research community to adopt DAS and provide annotations. All components are freely available from the project website http://www.biodas.org/.
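On the client side, integrating a layer means parsing each server's features response. A minimal sketch over a simplified fragment written in the style of the DAS 1.5 XML; this is not a verbatim server reply.

```python
# Minimal sketch of a DAS client step: parse a features response and
# collect one display layer per server. Simplified DAS-1.5-style XML.
import xml.etree.ElementTree as ET

response = """<DASGFF><GFF><SEGMENT id="chr1">
  <FEATURE id="f1"><TYPE id="exon">exon</TYPE>
    <START>100</START><END>200</END></FEATURE>
  <FEATURE id="f2"><TYPE id="repeat">repeat</TYPE>
    <START>450</START><END>520</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""

def parse_layer(server_name, xml_text):
    """One annotation layer: the features reported by a single DAS server."""
    root = ET.fromstring(xml_text)
    return [(server_name, f.get("id"), f.findtext("TYPE"),
             int(f.findtext("START")), int(f.findtext("END")))
            for f in root.iter("FEATURE")]

for feat in parse_layer("example-das-server", response):
    print(feat)
```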
ERIC Educational Resources Information Center
Grimes, George; And Others
A series of four pamphlets which describe the Regional Information System (RIS) of the Michigan-Ohio Regional Educational Laboratory (MOREL), a system designed to provide an effective, systematic methodology for linking users with relevant resources, compose the major portion of this information package. Each publication details an aspect of the…
Hermanowski, Tomasz Roman; Drozdowska, Aleksandra Krystyna; Kowalczyk, Marta
2015-01-01
Objectives In this paper, we emphasised that effective management of health plan beneficiaries' access to reimbursed medicines requires a proper institutional set-up. The main objective was to identify and recommend an institutional framework of integrated pharmaceutical care providing effective, safe and equitable access to medicines. Method The institutional framework of drug policy was derived on the basis of publications obtained by systematic reviews. A comparative analysis concerning adaptation of coordinated pharmaceutical care services in the USA, the UK, Poland, Italy, Denmark and Germany was performed. Results While most European Union Member States promote the implementation of selected e-Health tools, like e-Prescribing, these efforts do not necessarily implement an integrated package. There is no single agent who would manage insured patients' access to medicines and health care in a coordinated manner, thereby increasing the efficiency and safety of drug policy. More attention should be paid by European Union Member States as to how to integrate various e-Health tools to enhance benefits to both individuals and societies. One solution could be to implement an integrated “pharmacy benefit management” model, which is well established in the USA and Canada and provides an integrated package of cost-containment methods, implemented within a transparent institutional framework and powered by strong motivation of the agent. PMID:26528099
RysannMD: A biomedical semantic annotator balancing speed and accuracy.
Cuzzola, John; Jovanović, Jelena; Bagheri, Ebrahim
2017-07-01
Recently, both researchers and practitioners have explored the possibility of semantically annotating large and continuously evolving collections of biomedical texts such as research papers, medical reports, and physician notes in order to enable their efficient and effective management and use in clinical practice or research laboratories. Such annotations can be automatically generated by biomedical semantic annotators - tools that are specifically designed for detecting and disambiguating biomedical concepts mentioned in text. The biomedical community has already presented several solid automated semantic annotators. However, the existing tools are either strong in their disambiguation capacity, i.e., the ability to identify the correct biomedical concept for a given piece of text among several candidate concepts, or they excel in their processing time, i.e., work very efficiently, but none of the semantic annotation tools reported in the literature has both of these qualities. In this paper, we present RysannMD (Ryerson Semantic Annotator for Medical Domain), a biomedical semantic annotation tool that strikes a balance between processing time and performance while disambiguating biomedical terms. In other words, RysannMD provides reasonable disambiguation performance when choosing the right sense for a biomedical term in a given context, and does that in a reasonable time. To examine how RysannMD stands with respect to the state of the art biomedical semantic annotators, we have conducted a series of experiments using standard benchmarking corpora, including both gold and silver standards, and four modern biomedical semantic annotators, namely cTAKES, MetaMap, NOBLE Coder, and Neji. The annotators were compared with respect to the quality of the produced annotations, measured against gold and silver standards using precision, recall, and F1 measure, and speed, i.e., processing time. In the experiments, RysannMD achieved the best median F1 measure across the benchmarking corpora, independent of the standard used (silver/gold), biomedical subdomain, and document size. In terms of the annotation speed, RysannMD scored the second best median processing time across all the experiments. The obtained results indicate that RysannMD offers the best performance among the examined semantic annotators when both quality of annotation and speed are considered simultaneously. Copyright © 2017 Elsevier Inc. All rights reserved.
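The precision/recall/F1 comparison against a gold standard reduces to set arithmetic over annotations; a minimal sketch on invented (document, span, concept) tuples.

```python
# Minimal sketch of precision/recall/F1 for predicted concept annotations
# against a gold standard. Toy annotation sets.
def prf1(gold, predicted):
    """Precision, recall, and F1 over exact-match annotations."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {("doc1", 0, 8, "C0018681"), ("doc1", 15, 21, "C0027497")}
pred = {("doc1", 0, 8, "C0018681"), ("doc1", 30, 36, "C0015967")}
print("P=%.2f R=%.2f F1=%.2f" % prf1(gold, pred))  # P=0.50 R=0.50 F1=0.50
```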
Martínez Barrio, Álvaro; Lagercrantz, Erik; Sperber, Göran O; Blomberg, Jonas; Bongcam-Rudloff, Erik
2009-01-01
Background The Distributed Annotation System (DAS) is a widely used network protocol for sharing biological information. The distributed aspects of the protocol enable the use of various reference and annotation servers for connecting biological sequence data to pertinent annotations in order to depict an integrated view of the data for the final user. Results An annotation server has been devised to provide information about the endogenous retroviruses detected and annotated by a specialized in silico tool called RetroTector. We describe the procedure to implement the DAS 1.5 protocol commands necessary for constructing the DAS annotation server, using our server to exemplify those steps. Data distribution is kept separate from visualization, which is carried out by eBioX, an easy-to-use open source program incorporating multiple bioinformatics utilities. Some well characterized endogenous retroviruses are shown in two different DAS clients. A rapid analysis of areas free from retroviral insertions could be facilitated by our annotations. Conclusion The DAS protocol has proven advantageous for the distribution of endogenous retrovirus data. The distributed nature of the protocol also aids in combining annotation and visualization along a genome in order to enhance the understanding of ERV contribution to its evolution. Reference and annotation servers are used conjointly by eBioX to provide visualization of ERV annotations as well as other data sources. Our DAS data source is listed in the central public DAS service repository. PMID:19534743
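For readers unfamiliar with DAS, the following sketch shows what a DAS 1.5 features query looks like from the client side; the server host and data source name are hypothetical placeholders, so running it requires pointing at a live, registered DAS source:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical DAS 1.5 source; the command format is
# /das/<data-source>/features?segment=<ref>:<start>,<stop>
url = ("http://example.org/das/hg_ervs/"
       "features?segment=chr1:1000000,2000000")

with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# DASGFF responses nest FEATURE elements under GFF/SEGMENT.
for feature in tree.iter("FEATURE"):
    print(feature.get("id"), feature.get("label"))
```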
Nierkens, Vera; Hartman, Marieke A.; Nicolaou, Mary; Vissenberg, Charlotte; Beune, Erik J. A. J.; Hosper, Karen; van Valkengoed, Irene G.; Stronks, Karien
2013-01-01
Background The importance of cultural adaptations in behavioral interventions targeting ethnic minorities in high-income societies is widely recognized. Little is known, however, about the effectiveness of specific cultural adaptations in such interventions. Aim To systematically review the effectiveness of specific cultural adaptations in interventions that target smoking cessation, diet, and/or physical activity and to explore features of such adaptations that may account for their effectiveness. Methods Systematic review using MEDLINE, PsycINFO, Embase, and the Cochrane Central Register of Controlled Trials registers (1997–2009). Inclusion criteria: a) effectiveness study of a lifestyle intervention targeted to ethnic minority populations living in a high-income society; b) interventions included cultural adaptations and a control group that was exposed to the intervention without the cultural adaptation under study; c) primary outcome measures included smoking cessation, diet, or physical activity. Results Out of 44904 hits, we identified 17 studies, all conducted in the United States. In five studies, specific cultural adaptations had a statistically significant effect on primary outcomes. The remaining studies showed no significant effects on primary outcomes, but some presented trends favorable for cultural adaptations. We observed that interventions incorporating a package of cultural adaptations, cultural adaptations that implied higher intensity, and those incorporating family values were more likely to report statistically significant effects. Adaptations in smoking cessation interventions seem to be more effective than adaptations in interventions aimed at diet and physical activity. Conclusion This review indicates that culturally targeted behavioral interventions may be more effective if cultural adaptations are implemented as a package of adaptations, if the adaptation includes the family level, and if the adaptation results in a higher intensity of the intervention. More systematic experiments are needed that aim to gain insight into the best mix of cultural adaptations among diverse populations in various settings, particularly outside the US. PMID:24116000
Building gold standard corpora for medical natural language processing tasks.
Deleger, Louise; Li, Qi; Lingren, Todd; Kaiser, Megan; Molnar, Katalin; Stoutenborough, Laura; Kouril, Michal; Marsolo, Keith; Solti, Imre
2012-01-01
We present the construction of three annotated corpora to serve as gold standards for medical natural language processing (NLP) tasks. Clinical notes from the medical record, clinical trial announcements, and FDA drug labels are annotated. We report high inter-annotator agreements (overall F-measures between 0.8467 and 0.9176) for the annotation of Personal Health Information (PHI) elements for a de-identification task and of medications, diseases/disorders, and signs/symptoms for an information extraction (IE) task. The annotated corpora of clinical trials and FDA labels will be publicly released; to facilitate translational NLP tasks that require cross-corpora interoperability (e.g., clinical trial eligibility screening), their annotation schemas are aligned with a large-scale, NIH-funded clinical text annotation project.
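As a rough illustration of how such agreement figures can be computed, the sketch below scores one annotator against another under an exact-match criterion; the span tuples are hypothetical, and real evaluations typically also report partial-match variants:

```python
# Illustrative sketch (not the authors' code): pairwise inter-annotator
# F-measure, treating one annotator's spans as the reference set.
# Annotations are hypothetical (start, end, label) tuples.

def f_measure(reference, candidate):
    """F1 between two annotation sets under exact-match criteria."""
    tp = len(reference & candidate)
    precision = tp / len(candidate) if candidate else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

annotator_a = {(12, 19, "medication"), (44, 52, "disease")}
annotator_b = {(12, 19, "medication"), (60, 71, "symptom")}
print(f_measure(annotator_a, annotator_b))  # 0.5
```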
Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae
Meng, Shaowu; Brown, Douglas E; Ebbole, Daniel J; Torto-Alalibo, Trudy; Oh, Yeon Yee; Deng, Jixin; Mitchell, Thomas K; Dean, Ralph A
2009-01-01
Background Magnaporthe oryzae, the causal agent of blast disease of rice, is the most destructive pathogen of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6. However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using a standardized vocabulary. We report an overview of the GO annotation for Version 5 of the M. oryzae genome assembly. Methods A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For the similarity-based GO annotation, a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, the biological appropriateness of the functional assignments was manually checked. Results In total, 6,286 proteins received GO term assignment via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for the Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experimentally determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins were assigned to the 3 root terms. The Version 5 GO annotation is publicly queryable via the GO site. Additionally, the genome of M. oryzae is constantly being refined and updated as new information is incorporated; for the latest GO annotation of the Version 6 genome, please visit our website. The preliminary GO annotation of the Version 6 genome is stored in a local MySQL database that is publicly queryable via a user-friendly Adhoc Query System interface. Conclusion Our analysis provides comprehensive and robust GO annotations of the M. oryzae genome assemblies that will provide a solid foundation for further functional interrogation of M. oryzae. PMID:19278556
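The reciprocal best hits criterion mentioned in the Methods can be illustrated with a few lines of code; the mappings below are hypothetical stand-ins for best hits parsed from alignment output (e.g., BLAST tabular files):

```python
# Illustrative sketch (not the authors' pipeline): reciprocal best hits
# between predicted M. oryzae proteins and GO-annotated reference proteins.
# Both dictionaries hold hypothetical best-hit mappings.

best_hit_query_to_ref = {"MGG_00001": "refA", "MGG_00002": "refB", "MGG_00003": "refC"}
best_hit_ref_to_query = {"refA": "MGG_00001", "refB": "MGG_00009", "refC": "MGG_00003"}

# Keep only pairs where each sequence is the other's best hit.
reciprocal_best_hits = {
    (query, ref)
    for query, ref in best_hit_query_to_ref.items()
    if best_hit_ref_to_query.get(ref) == query
}
print(sorted(reciprocal_best_hits))  # [('MGG_00001', 'refA'), ('MGG_00003', 'refC')]
```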
Prabhakaran, Dorairaj; Anand, Shuchi; Watkins, David; Gaziano, Thomas; Wu, Yangfeng; Mbanya, Jean Claude; Nugent, Rachel
2018-01-01
Cardiovascular, respiratory, and related disorders (CVRDs) are the leading causes of adult death worldwide, and substantial inequalities in care of patients with CVRDs exist between countries of high income and countries of low and middle income. Based on current trends, the UN Sustainable Development Goal to reduce premature mortality due to CVRDs by a third by 2030 will be challenging for many countries of low and middle income. We did systematic literature reviews of effectiveness and cost-effectiveness to identify priority interventions. We summarise the key findings and present a costed essential package of interventions to reduce risk of and manage CVRDs. On a population level, we recommend tobacco taxation, bans on trans fats, and compulsory reduction of salt in manufactured food products. We suggest primary health services be strengthened through the establishment of locally endorsed guidelines and ensured availability of essential medications. The policy interventions and health service delivery package we suggest could serve as the cornerstone for the management of CVRDs, and afford substantial financial risk protection for vulnerable households. We estimate that full implementation of the essential package would cost an additional US$21 per person in the average low-income country and $24 in the average lower-middle-income country. The essential package we describe could be a starting place for low-income and middle-income countries developing universal health coverage packages. Interventions could be rolled out as disease burden demands and budgets allow. Our outlined interventions provide a pathway for countries attempting to convert the UN Sustainable Development Goal commitments into tangible action. PMID:29108723
Beijbom, Oscar; Edmunds, Peter J.; Roelfsema, Chris; Smith, Jennifer; Kline, David I.; Neal, Benjamin P.; Dunlap, Matthew J.; Moriarty, Vincent; Fan, Tung-Yung; Tan, Chih-Jui; Chan, Stephen; Treibitz, Tali; Gamst, Anthony; Mitchell, B. Greg; Kriegman, David
2015-01-01
Global climate change and other anthropogenic stressors have heightened the need to rapidly characterize ecological changes in marine benthic communities across large scales. Digital photography enables rapid collection of survey images to meet this need, but the subsequent image annotation is typically a time-consuming, manual task. We investigated the feasibility of using automated point-annotation to expedite cover estimation of the 17 dominant benthic categories from survey images captured at four Pacific coral reefs. Inter- and intra-annotator variability among six human experts was quantified and compared to semi- and fully-automated annotation methods, which are made available at coralnet.ucsd.edu. Our results indicate high expert agreement for identification of coral genera, but lower agreement for algal functional groups, in particular between turf algae and crustose coralline algae. This indicates the need for unequivocal definitions of algal groups, careful training of multiple annotators, and enhanced imaging technology. Semi-automated annotation, where 50% of the annotation decisions were performed automatically, yielded cover estimate errors comparable to those of the human experts. Furthermore, fully-automated annotation yielded rapid, unbiased cover estimates but with increased variance. These results show that automated annotation can increase spatial coverage and decrease time and financial outlay for image-based reef surveys. PMID:26154157
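The underlying cover statistic is simple: the fraction of annotated points falling into each benthic category. A minimal sketch with fabricated point labels:

```python
from collections import Counter

# Illustrative sketch (not the CoralNet code): percent-cover estimation
# from point annotations. The per-point labels are fabricated examples.
point_labels = [
    "Porites", "turf", "CCA", "Porites", "sand",
    "turf", "Porites", "Acropora", "turf", "CCA",
]

counts = Counter(point_labels)
total = len(point_labels)
for category, n in counts.most_common():
    print(f"{category:10s} cover = {100 * n / total:.1f}%")
```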
MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.
Lugli, Gabriele Andrea; Milani, Christian; Mancabelli, Leonardo; van Sinderen, Douwe; Ventura, Marco
2016-04-01
Genome annotation is one of the key actions that must be undertaken in order to decipher the genetic blueprint of organisms. Thus, a correct and reliable annotation is essential in rendering genomic data valuable. Here, we describe a bioinformatics pipeline based on freely available software programs coordinated by a multithreaded script named MEGAnnotator (Multithreaded Enhanced prokaryotic Genome Annotator). This pipeline allows the generation of multiple annotated formats fulfilling the NCBI guidelines for assembled microbial genome submission, based on DNA shotgun sequencing reads; it minimizes manual intervention, reduces waiting times between software program executions, and improves the final quality of both assembly and annotation outputs. MEGAnnotator provides an efficient way to pre-arrange the assembly and annotation work required to process NGS genome sequence data. The script improves the final quality of microbial genome annotation by reducing ambiguous annotations. Moreover, the MEGAnnotator platform allows the user to perform a partial annotation of pre-assembled genomes and includes an option to accomplish metagenomic data set assemblies. The MEGAnnotator platform will be useful for microbiologists interested in genome analyses of bacteria, as well as for those investigating the complexity of microbial communities, who do not possess the necessary skills to prepare their own bioinformatics pipeline. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Miljković, Filip; Kunimoto, Ryo; Bajorath, Jürgen
2017-08-01
We computationally explored small-molecule-based relationships between target proteins from different families. Target annotations of drugs and other bioactive compounds were systematically analyzed on the basis of high-confidence activity data. A total of 286 novel chemical links were established between distantly related or unrelated target proteins. These relationships involved a total of 1859 bioactive compounds, including 147 drugs, and 141 targets. Computational analysis of large amounts of compounds and activity data revealed unexpected relationships between diverse target proteins on the basis of the compounds they share. These relationships are relevant for drug discovery efforts. The target pairs we identified and the associated compound information are made freely available.
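A sketch of the core idea, linking targets through the compounds they share, is given below; the compound and target names are fabricated, and the real analysis adds high-confidence activity filters that are omitted here:

```python
from itertools import combinations

# Illustrative sketch (not the authors' analysis): derive target-target
# links from shared active compounds. Activity data are hypothetical.
compound_targets = {
    "cpd1": {"EGFR", "CA2"},
    "cpd2": {"EGFR", "HDAC1"},
    "cpd3": {"EGFR", "CA2"},
}

# Count how many compounds each target pair shares.
pair_support = {}
for targets in compound_targets.values():
    for pair in combinations(sorted(targets), 2):
        pair_support[pair] = pair_support.get(pair, 0) + 1

print(pair_support)  # {('CA2', 'EGFR'): 2, ('EGFR', 'HDAC1'): 1}
```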
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.
Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario
2011-01-01
Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
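As a much-simplified proxy for this kind of analysis (and emphatically not RepeatNet itself), one can scan unassembled reads for unusually abundant k-mers as seeds for candidate satellite monomers; the reads and parameters below are fabricated:

```python
from collections import Counter

# Toy proxy, far simpler than RepeatNet: flag abundant k-mers in
# unassembled reads as seeds for candidate tandem-repeat monomers.
reads = ["ACGTACGTACGTAA", "TTACGTACGTACGT", "GGGCACGTACGTAC"]
k = 8

kmer_counts = Counter(
    read[i:i + k]
    for read in reads
    for i in range(len(read) - k + 1)
)
for kmer, n in kmer_counts.most_common(3):
    print(kmer, n)
```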
Wei, Qing; Khan, Ishita K; Ding, Ziyun; Yerneni, Satwica; Kihara, Daisuke
2017-03-20
The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that await functional interpretation. A number of function prediction algorithms have been developed and improved to enable fast and automatic function annotation. With its well-defined structure and manual curation, the Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand the relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes GO function analyses in a systematic fashion. NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores, including protein-protein interaction and context-based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes, as well as statistical analysis of enriched functions. We developed NaviGO, which visualizes and analyzes functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo .
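As a toy stand-in for the similarity scores described above (NaviGO's six scores are more elaborate), the sketch below computes a simple Jaccard similarity between the GO annotation sets of two genes; the GO identifiers are arbitrary examples:

```python
# Illustrative sketch: Jaccard similarity between the GO annotation sets
# of two genes. This is not one of NaviGO's scores; it only shows the
# shape of a set-based functional-similarity computation.
def jaccard(go_terms_a, go_terms_b):
    union = go_terms_a | go_terms_b
    return len(go_terms_a & go_terms_b) / len(union) if union else 0.0

gene1 = {"GO:0003677", "GO:0006281", "GO:0005634"}
gene2 = {"GO:0003677", "GO:0005634", "GO:0000724"}
print(f"{jaccard(gene1, gene2):.2f}")  # 0.50
```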
Bindschedler, Laurence V.; Burgis, Timothy A.; Mills, Davinia J. S.; Ho, Jenny T. C.; Cramer, Rainer; Spanu, Pietro D.
2009-01-01
To further our understanding of powdery mildew biology during infection, we undertook a systematic shotgun proteomics analysis of the obligate biotroph Blumeria graminis f. sp. hordei at different stages of development in the host. Moreover we used a proteogenomics approach to feed information into the annotation of the newly sequenced genome. We analyzed and compared the proteomes from three stages of development representing different functions during the plant-dependent vegetative life cycle of this fungus. We identified 441 proteins in ungerminated spores, 775 proteins in epiphytic sporulating hyphae, and 47 proteins from haustoria inside barley leaf epidermal cells and used the data to aid annotation of the B. graminis f. sp. hordei genome. We also compared the differences in the protein complement of these key stages. Although confirming some of the previously reported findings and models derived from the analysis of transcriptome dynamics, our results also suggest that the intracellular haustoria are subject to stress possibly as a result of the plant defense strategy, including the production of reactive oxygen species. In addition, a number of small haustorial proteins with a predicted N-terminal signal peptide for secretion were identified in infected tissues: these represent candidate effector proteins that may play a role in controlling host metabolism and immunity. PMID:19602707
Case Studies in Describing Scientific Research Efforts as Linked Data
NASA Astrophysics Data System (ADS)
Gandara, A.; Villanueva-Rosales, N.; Gates, A.
2013-12-01
The Web hosts a growing number of scientific resources, prompting increased efforts in information management to support the integration and exchange of those resources. Scientists have many options for sharing scientific resources on the Web; however, existing options provide limited support for annotating and relating the resources that result from a scientific research effort. Moreover, there is no systematic approach to documenting scientific research and sharing it on the Web. This research proposes the Collect-Annotate-Refine-Publish (CARP) Methodology as an approach for guiding the documentation of scientific research on the Semantic Web as scientific collections. Scientific collections are structured descriptions of scientific research that make scientific results accessible based on context. In addition, scientific collections enhance the Linked Data data space and can be queried by machines. Three case studies were conducted on research efforts at the Cyber-ShARE Research Center of Excellence in order to assess the effectiveness of the methodology for creating scientific collections. The case studies exposed the challenges and benefits of leveraging the Semantic Web and the Linked Data data space to facilitate access, integration, and processing of Web-accessible scientific resources and research documentation. We present the case study findings and lessons learned in documenting scientific research using CARP.
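A minimal sketch of the publishing step, expressing a research artifact as Linked Data triples with the rdflib library, is shown below; the namespace, class, and resource names are hypothetical rather than CARP's actual vocabulary:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

# Hypothetical namespace for a scientific collection; not CARP's vocabulary.
EX = Namespace("http://example.org/collection/")

g = Graph()
dataset = EX["seismic-survey-2013"]
g.add((dataset, RDF.type, EX.ScientificCollection))
g.add((dataset, DCTERMS.title, Literal("Seismic survey results")))
g.add((dataset, DCTERMS.creator, Literal("Cyber-ShARE Research Center")))

# Serialize the triples as Turtle for publication on the Web.
print(g.serialize(format="turtle"))
```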
dEMBF: A Comprehensive Database of Enzymes of Microalgal Biofuel Feedstock.
Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar; Mishra, Barada Kanta
2016-01-01
Microalgae have attracted wide attention as one of the most versatile renewable feedstocks for production of biofuel. To develop genetically engineered high lipid yielding algal strains, a thorough understanding of the lipid biosynthetic pathway and the underpinning enzymes is essential. In this work, we have systematically mined the genomes of fifteen diverse algal species belonging to Chlorophyta, Heterokontophyta, Rhodophyta, and Haptophyta to identify and annotate the putative enzymes of the lipid metabolic pathway. We have also developed a database, dEMBF (Database of Enzymes of Microalgal Biofuel Feedstock), which catalogues the complete list of identified enzymes along with their computed annotation details, including length, hydrophobicity, amino acid composition, subcellular location, gene ontology, KEGG pathway, orthologous group, Pfam domain, intron-exon organization, transmembrane topology, and secondary/tertiary structural data. Furthermore, to facilitate functional and evolutionary study of these enzymes, a collection of built-in applications for BLAST search, motif identification, and sequence and phylogenetic analysis has been seamlessly integrated into the database. dEMBF is the first database that brings together all enzymes responsible for lipid synthesis from available algal genomes, and it provides an integrative platform for enzyme inquiry and analysis. This database will be extremely useful for algal biofuel research. It can be accessed at http://bbprof.immt.res.in/embf. PMID:26727469
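One of the computed annotations listed above, hydrophobicity, is commonly reported as the Kyte-Doolittle GRAVY score; a self-contained sketch follows (the protein sequence is a made-up example, not a dEMBF entry):

```python
# Illustrative sketch: GRAVY (grand average of hydropathy) using the
# standard Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Mean residue hydropathy over a protein sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

print(f"{gravy('MKTLLILAVVAAALA'):.2f}")  # hypothetical sequence
```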
Dutta, Shuchismita; Dimitropoulos, Dimitris; Feng, Zukang; Persikova, Irina; Sen, Sanchayita; Shao, Chenghua; Westbrook, John; Young, Jasmine; Zhuravleva, Marina A; Kleywegt, Gerard J; Berman, Helen M
2014-01-01
With the accumulation of a large number and variety of molecules in the Protein Data Bank (PDB) comes the need on occasion to review and improve their representation. The Worldwide PDB (wwPDB) partners have periodically updated various aspects of structural data representation to improve the integrity and consistency of the archive. The remediation effort described here was focused on improving the representation of peptide-like inhibitor and antibiotic molecules so that they can be easily identified and analyzed. Peptide-like inhibitors or antibiotics were identified in over 1000 PDB entries, systematically reviewed and represented either as peptides with polymer sequence or as single components. For the majority of the single-component molecules, their peptide-like composition was captured in a new representation, called the subcomponent sequence. A novel concept called “group” was developed for representing complex peptide-like antibiotics and inhibitors that are composed of multiple polymer and nonpolymer components. In addition, a reference dictionary was developed with detailed information about these peptide-like molecules to aid in their annotation, identification and analysis. Based on the experience gained in this remediation, guidelines, procedures, and tools were developed to annotate new depositions containing peptide-like inhibitors and antibiotics accurately and consistently. © 2013 Wiley Periodicals, Inc. Biopolymers 101: 659–668, 2014. PMID:24173824
Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M.; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John
2016-01-01
Objective: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data comprise free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient’s status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do, and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing the lag time to surgery referral. PMID:27257386
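A minimal sketch of the kind of note-classification pipeline described above, using TF-IDF features and a linear classifier from scikit-learn, is shown below; the notes, labels, and test phrase are fabricated stand-ins, not data from the study:

```python
# Minimal sketch (not the authors' system): TF-IDF features from
# clinical-note text plus a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "seizures refractory to two antiepileptic drugs",
    "seizure free on monotherapy for twelve months",
    "frequent focal seizures despite medication changes",
    "well controlled epilepsy, no recent events",
]
labels = [1, 0, 1, 0]  # 1 = potential surgery candidate (hypothetical)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(notes, labels)
print(clf.predict(["drug-resistant seizures despite polytherapy"]))
```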
ASDB: a resource for probing protein functions with small molecules.
Liu, Zhihong; Ding, Peng; Yan, Xin; Zheng, Minghao; Zhou, Huihao; Xu, Yuehua; Du, Yunfei; Gu, Qiong; Xu, Jun
2016-06-01
Identifying chemical probes or seeking scaffolds for a specific biological target is important for protein function studies. Therefore, we create the Annotated Scaffold Database (ASDB), a computer-readable and systematic target-annotated scaffold database, to serve such needs. The scaffolds in ASDB were derived from public databases including ChEMBL, DrugBank and TCMSP, with a scaffold-based classification approach. Each scaffold was assigned an InChIKey as its unique identifier, energy-minimized 3D conformations, and other calculated properties. A scaffold is also associated with drugs, natural products, drug targets and medical indications. The database can be retrieved through text or structure query tools. ASDB collects 333 601 scaffolds, which are associated with 4368 targets. The scaffolds include 3032 scaffolds derived from drugs and 5163 scaffolds derived from natural products. For given scaffolds, scaffold-target networks can be generated from the database to demonstrate the relations of scaffolds and targets. ASDB is freely available at http://www.rcdd.org.cn/asdb/ with the major web browsers. Contact: junxu@biochemomes.com or xujun9@mail.sysu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
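The two scaffold operations this database rests on, deriving a Murcko scaffold and assigning an InChIKey, can be sketched with RDKit (assuming RDKit is installed; this is an illustration, not ASDB's build code):

```python
# Illustrative sketch using RDKit: derive a Murcko scaffold from a
# molecule and assign an InChIKey as its identifier.
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, as an example
scaffold = MurckoScaffold.GetScaffoldForMol(mol)

print(Chem.MolToSmiles(scaffold))    # the ring scaffold, c1ccccc1
print(Chem.MolToInchiKey(scaffold))  # unique identifier for the scaffold
```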
SZGR 2.0: a one-stop shop of schizophrenia candidate genes
Jia, Peilin; Han, Guangchun; Zhao, Junfei; Lu, Pinyi; Zhao, Zhongming
2017-01-01
SZGR 2.0 is a comprehensive resource of candidate variants and genes for schizophrenia, covering genetic, epigenetic, transcriptomic, translational and many other types of evidence. By systematic review and curation of multiple lines of evidence, we included almost all variants and genes that have ever been reported to be associated with schizophrenia. In particular, we collected ∼4200 common variants reported in genome-wide association studies, ∼1000 de novo mutations discovered by large-scale sequencing of family samples, 215 genes spanning rare and replicated copy number variations, 99 genes overlapping with linkage regions, 240 differentially expressed genes, 4651 differentially methylated genes and 49 genes that are antipsychotic drug targets. To facilitate interpretation, we included various functional annotation data, especially brain eQTL, methylation QTL, brain expression featured in a deep categorization of brain areas and developmental stages, and brain-specific promoter and enhancer annotations. Furthermore, we conducted cross-study, cross-data-type and integrative analyses of the multidimensional data deposited in SZGR 2.0, and made the data and results available through a user-friendly interface. In summary, SZGR 2.0 provides a one-stop shop for schizophrenia variants and genes and their function and regulation, an important resource for the schizophrenia and wider mental disease community. SZGR 2.0 is available at https://bioinfo.uth.edu/SZGR/. PMID:27733502
Sma3s: a three-step modular annotator for large sequence datasets.
Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J
2014-08-01
Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms other well-established annotation algorithms in accuracy, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
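The three-module fallback logic can be sketched as a simple control flow; the module functions below are hypothetical stand-ins for Sma3s's actual homologue, orthologue, and enrichment steps:

```python
# Illustrative control flow (not Sma3s's code): annotate each query by the
# first module that yields a result, in decreasing order of confidence.
def annotate_by_close_homologue(seq):
    return None  # e.g., a hit above a strict identity threshold

def annotate_by_orthologue(seq):
    return None  # e.g., a reciprocal best hit against a reference proteome

def annotate_by_enriched_terms(seq):
    return "putative transporter"  # e.g., terms enriched among weaker hits

MODULES = [annotate_by_close_homologue,
           annotate_by_orthologue,
           annotate_by_enriched_terms]

def annotate(seq):
    for module in MODULES:
        result = module(seq)
        if result is not None:
            return result
    return "unannotated"

print(annotate("MKTAYIAKQR"))  # falls through to the third module here
```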
Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.
2012-01-01
Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832
Li, Jieyue; Newberg, Justin Y; Uhlén, Mathias; Lundberg, Emma; Murphy, Robert F
2012-01-01
The Human Protein Atlas contains immunofluorescence images showing subcellular locations for thousands of proteins. These are currently annotated by visual inspection. In this paper, we describe automated approaches to analyze the images and their use to improve annotation. We began by training classifiers to recognize the annotated patterns. By ranking proteins according to the confidence of the classifier, we generated a list of proteins that were strong candidates for reexamination. In parallel, we applied hierarchical clustering to group proteins and identified proteins whose annotations were inconsistent with the remainder of the proteins in their cluster. These proteins were reexamined by the original annotators, and a significant fraction had their annotations changed. The results demonstrate that automated approaches can provide an important complement to visual annotation.
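The clustering step can be illustrated with SciPy: cluster proteins by image-derived feature vectors, then flag members whose annotation disagrees with their cluster's majority. The features and location labels below are fabricated:

```python
# Illustrative sketch of the clustering step (not the authors' pipeline):
# hierarchical clustering of proteins by feature vectors, flagging
# annotations that disagree with the cluster majority.
from collections import Counter
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

features = np.array([[0.1, 0.9], [0.2, 0.8], [0.15, 0.85],
                     [0.9, 0.1], [0.85, 0.2]])
annotations = ["nucleus", "nucleus", "mitochondria", "cytosol", "cytosol"]

clusters = fcluster(linkage(features, method="average"), t=2, criterion="maxclust")
for cluster_id in set(clusters):
    members = [i for i, c in enumerate(clusters) if c == cluster_id]
    majority, _ = Counter(annotations[i] for i in members).most_common(1)[0]
    for i in members:
        if annotations[i] != majority:
            print(f"protein {i}: annotated {annotations[i]!r}, "
                  f"cluster majority {majority!r} -- flag for review")
```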
Bioinformatics approach reveals systematic mechanism underlying lung adenocarcinoma.
Wu, Xiya; Zhang, Wei; Hu, Yunhua; Yi, Xianghua
2015-01-01
The purpose of this work was to explore the systematic molecular mechanism of lung adenocarcinoma and gain a deeper insight into it. Comprehensive bioinformatics methods were applied. Initially, significant differentially expressed genes (DEGs) were identified from the Affymetrix microarray data (GSE27262) deposited in the Gene Expression Omnibus (GEO). Subsequently, gene ontology (GO) analysis was performed using the online Database for Annotation, Visualization and Integrated Discovery (DAVID) software. Finally, significant pathway crosstalk was investigated based on information derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. According to our results, the gene encoding the N-terminal globular domain of type X collagen (COL10A1) and the transmembrane protein 100 (TMEM100) gene were identified as the most significant DEGs in tumor tissue compared with the adjacent normal tissues. The main GO categories were biological process, cellular component and molecular function. In addition, in tumor tissue, significant crosstalk was observed between the non-small cell lung cancer pathway and the inositol phosphate metabolism, focal adhesion, vascular smooth muscle contraction, peroxisome proliferator-activated receptor (PPAR) signaling and calcium signaling pathways. Dysfunctional genes and pathways may play key roles in the progression and development of lung adenocarcinoma. Our data provide a systematic perspective for understanding this mechanism and may be helpful in discovering an effective treatment for lung adenocarcinoma.
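A sketch of the DEG-screening step follows (a generic per-gene test, not necessarily the study's exact statistics), using a Welch t-test between tumor and adjacent-normal samples; all expression values are fabricated:

```python
# Illustrative sketch of DEG screening: per-gene Welch t-test between
# tumor and adjacent-normal expression, keeping genes below a cutoff.
import numpy as np
from scipy import stats

expression = {  # gene -> (tumor samples, normal samples); fabricated values
    "COL10A1": (np.array([8.1, 7.9, 8.4, 8.0]), np.array([3.2, 3.5, 3.1, 3.4])),
    "TMEM100": (np.array([2.1, 2.3, 1.9, 2.2]), np.array([6.5, 6.8, 6.2, 6.6])),
    "GAPDH":   (np.array([5.0, 5.1, 4.9, 5.2]), np.array([5.1, 5.0, 5.2, 4.9])),
}

for gene, (tumor, normal) in expression.items():
    t_stat, p_value = stats.ttest_ind(tumor, normal, equal_var=False)
    if p_value < 0.01:
        print(f"{gene}: p = {p_value:.2e} -> differentially expressed")
```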