Exploring Global Exposure Factors Resources URLs
The dataset is a compilation of hyperlinks (URLs) for resources (databases, compendia, published articles, etc.) useful for exposure assessment specific to consumer product use. This dataset is associated with the following publication: Zaleski, R., P. Egeghy, and P. Hakkinen. Exploring Global Exposure Factors Resources for Use in Consumer Exposure Assessments. International Journal of Environmental Research and Public Health. Molecular Diversity Preservation International, Basel, SWITZERLAND, 13(7): 744, (2016).
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions
Karr, Jonathan R.; Phillips, Nolan C.; Covert, Markus W.
2014-01-01
Mechanistic ‘whole-cell’ models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org Source code repository URL: http://github.com/CovertLab/WholeCellSimDB PMID:25231498
76 FR 44048 - Agency Information Collection Activities: Comment Request
Federal Register 2010, 2011, 2012, 2013, 2014
2011-07-22
... Collection: The GSS is a census of all eligible academic institutions and all departments in science and... Policy Analysis and Research (WebCASPAR) database system. The URL for WebCASPAR is http://caspar.nsf.gov...
ERIC Educational Resources Information Center
Smith, Teresa S.
The Internet is a network of networks which continually accumulates and amasses information, much of which is without organization and evaluation. This study addresses the need for establishing a database of Uniform Resource Locators (URLs), and for collecting, organizing, indexing, and publishing catalogs of URLs. Librarians and information…
The MaizeGDB Genome Browser tutorial: one example of database outreach to biologists via video
Harper, Lisa C.; Schaeffer, Mary L.; Thistle, Jordan; Gardiner, Jack M.; Andorf, Carson M.; Campbell, Darwin A.; Cannon, Ethalinda K.S.; Braun, Bremen L.; Birkett, Scott M.; Lawrence, Carolyn J.; Sen, Taner Z.
2011-01-01
Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases. At MaizeGDB, we have developed a number of video tutorials that demonstrate how to use various tools and explicitly outline the caveats researchers should know to interpret the information available to them. One such popular video currently available is ‘Using the MaizeGDB Genome Browser’, which describes how the maize genome was sequenced and assembled as well as how the sequence can be visualized and interacted with via the MaizeGDB Genome Browser. Database URL: http://www.maizegdb.org/ PMID:21565781
A Concept for Continuous Monitoring that Reduces Redundancy in Information Assurance Processes
2011-09-01
System.out.println("Driver loaded"); String url = "jdbc:postgresql://localhost/IAcontrols"; String user = "postgres"; String pwd... postgres"; Connection DB_mobile_conn = DriverManager.getConnection(url, user, pwd); System.out.println("Database Connect ok...user = "postgres"; String pwd = "postgres"; Connection DB_mobile_conn = DriverManager.getConnection(url, user, pwd); System.out.println
Zhang, Qingzhou; Yang, Bo; Chen, Xujiao; Xu, Jing; Mei, Changlin; Mao, Zhiguo
2014-01-01
We present a bioinformatics database named Renal Gene Expression Database (RGED), which contains comprehensive gene expression data sets from renal disease research. The web-based interface of RGED allows users to query the gene expression profiles in various kidney-related samples, including renal cell lines, human kidney tissues and murine model kidneys. Researchers can explore certain gene profiles and the relationships between genes of interest, and identify biomarkers or even drug targets in kidney diseases. The aim of this work is to provide a user-friendly utility for the renal disease research community to query expression profiles of genes of their own interest without the requirement of advanced computational skills. Availability and implementation: Website is implemented in PHP, R, MySQL and Nginx and freely available from http://rged.wall-eva.net. Database URL: http://rged.wall-eva.net PMID:25252782
CHRONIS: an animal chromosome image database.
Toyabe, Shin-Ichi; Akazawa, Kouhei; Fukushi, Daisuke; Fukui, Kiichi; Ushiki, Tatsuo
2005-01-01
We have constructed a database system named CHRONIS (CHROmosome and Nano-Information System) to collect images of animal chromosomes and related nanotechnological information. CHRONIS enables rapid sharing of information on chromosome research among cell biologists and researchers in other fields via the Internet. CHRONIS is also intended to serve as a liaison tool for researchers who work in different centers. The image database contains more than 3,000 color microscopic images, including karyotypic images obtained from more than 1,000 species of animals. Researchers can browse the contents of the database using a standard World Wide Web interface at the following URL: http://chromosome.med.niigata-u.ac.jp/chronis/servlet/chronisservlet. The system enables users to input new images into the database, to locate images of interest by keyword searches, and to display the images with detailed information. CHRONIS has a wide range of applications, such as searching for appropriate probes for fluorescent in situ hybridization, comparing various kinds of microscopic images of a single species, and finding researchers working in the same field of interest.
Veluraja, Kasinadar; Selvin, Jeyasigamani F A; Venkateshwari, Selvakumar; Priyadarzini, Thanu R K
2010-09-23
The inherent flexibility and lack of strong intramolecular interactions of oligosaccharides demand the use of theoretical methods for their structural elucidation. In spite of the developments of theoretical methods, not much research on glycoinformatics has been done so far compared to bioinformatics research on proteins and nucleic acids. We have developed a three-dimensional structural database for sialic acid-containing carbohydrates (3DSDSCAR). This is an open-access database that provides 3D structural models of a given sialic acid-containing carbohydrate. At present, 3DSDSCAR contains 60 conformational models, belonging to 14 different sialic acid-containing carbohydrates, deduced through 10 ns molecular dynamics (MD) simulations. The database is available at the URL: http://www.3dsdscar.org. Copyright 2010 Elsevier Ltd. All rights reserved.
DAVID-WS: a stateful web service to facilitate gene/protein list analysis
Jiao, Xiaoli; Sherman, Brad T.; Huang, Da Wei; Stephens, Robert; Baseler, Michael W.; Lane, H. Clifford; Lempicki, Richard A.
2012-01-01
Summary: The database for annotation, visualization and integrated discovery (DAVID), which can be freely accessed at http://david.abcc.ncifcrf.gov/, is a web-based online bioinformatics resource that aims to provide tools for the functional interpretation of large lists of genes/proteins. It has been used by researchers from more than 5000 institutes worldwide, with a daily submission rate of ∼1200 gene lists from ∼400 unique researchers, and has been cited by more than 6000 scientific publications. However, the current web interface does not support programmatic access to DAVID, and the uniform resource locator (URL)-based application programming interface (API) has a limit on URL size and is stateless in nature as it uses URL request and response messages to communicate with the server, without keeping any state-related details. DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. Availability: The web service and sample clients (written in Java, Perl, Python and Matlab) are made freely available under the DAVID License at http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html. Contact: xiaoli.jiao@nih.gov; rlempicki@nih.gov PMID:22543366
DAVID-WS: a stateful web service to facilitate gene/protein list analysis.
Jiao, Xiaoli; Sherman, Brad T; Huang, Da Wei; Stephens, Robert; Baseler, Michael W; Lane, H Clifford; Lempicki, Richard A
2012-07-01
The database for annotation, visualization and integrated discovery (DAVID), which can be freely accessed at http://david.abcc.ncifcrf.gov/, is a web-based online bioinformatics resource that aims to provide tools for the functional interpretation of large lists of genes/proteins. It has been used by researchers from more than 5000 institutes worldwide, with a daily submission rate of ∼1200 gene lists from ∼400 unique researchers, and has been cited by more than 6000 scientific publications. However, the current web interface does not support programmatic access to DAVID, and the uniform resource locator (URL)-based application programming interface (API) has a limit on URL size and is stateless in nature as it uses URL request and response messages to communicate with the server, without keeping any state-related details. DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. The web service and sample clients (written in Java, Perl, Python and Matlab) are made freely available under the DAVID License at http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html.
Trends in the production of scientific data analysis resources.
Hennessey, Jason; Georgescu, Constantin; Wren, Jonathan D
2014-01-01
As the amount of scientific data grows, peer-reviewed Scientific Data Analysis Resources (SDARs) such as published software programs, databases and web servers have had a strong impact on the productivity of scientific research. SDARs are typically linked to via an Internet URL, and such URLs have been shown to decay in a time-dependent fashion. What is less clear is whether or not SDAR-producing group size or prior experience in SDAR production correlates with SDAR persistence or whether certain institutions or regions account for a disproportionate number of peer-reviewed resources. We first quantified the current availability of over 26,000 unique URLs published in MEDLINE abstracts/titles over the past 20 years, then extracted authorship, institutional and ZIP code data. We estimated which URLs were SDARs by using keyword proximity analysis. We identified 23,820 non-archival URLs produced between 1996 and 2013, out of which 11,977 were classified as SDARs. Production of SDARs as measured with the Gini coefficient is more widely distributed among institutions (.62) and ZIP codes (.65) than scientific research in general, which tends to be disproportionately clustered within elite institutions (.91) and ZIPs (.96). An estimated one percent of institutions produced 68% of published research whereas the top 1% only accounted for 16% of SDARs. Some labs produced many SDARs (maximum detected = 64), but 74% of SDAR-producing authors have only published one SDAR. Interestingly, decayed SDARs have significantly fewer authors on average (4.33 +/- 3.06) than available SDARs (4.88 +/- 3.59) (p < 8.32 × 10^-4). Approximately 3.4% of URLs, as published, contain errors in their entry/format, including DOIs and links to clinical trials registry numbers. SDAR production is less dependent upon institutional location and resources, and SDAR online persistence does not seem to be a function of infrastructure or expertise.
Yet, SDAR team size correlates positively with SDAR accessibility, suggesting a possible sociological factor involved. While a detectable URL entry error rate of 3.4% is relatively low, it raises the question of whether or not this is a general error rate that extends to additional published entities.
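The Gini coefficients quoted in the abstract above (for example .62 for SDARs across institutions versus .91 for research in general) summarize how unevenly output is spread over producers: 0 means every producer contributes equally, and values near 1 mean a few producers account for almost everything. The abstract does not say which estimator the authors used, so the following is only a minimal sketch of one common discrete formula; the function name gini and the sample counts are illustrative assumptions, not data from the study.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (hypothetical helper).

    Uses the standard discrete formula on ascending-sorted values:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    This is an assumed estimator; the study's exact method is not stated.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # No producers or no output: treat as perfectly equal.
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Made-up per-institution resource counts, for illustration only.
    evenly_spread = [5, 5, 5, 5]
    concentrated = [0, 0, 0, 20]
    print(gini(evenly_spread))
    print(gini(concentrated))
```

With this formula, a sample in which one of n producers holds everything yields (n - 1) / n rather than exactly 1, which is one reason reported values can differ slightly between estimators.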
The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.
Rigden, Daniel J; Fernández, Xosé M
2018-01-04
The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs
2015-01-01
Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies: the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpectedly large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478
PlantCAZyme: a database for plant carbohydrate-active enzymes
Ekstrom, Alexander; Taujale, Rahil; McGinn, Nathan; Yin, Yanbin
2014-01-01
PlantCAZyme is a database built upon dbCAN (database for automated carbohydrate active enzyme annotation), aiming to provide pre-computed sequence and annotation data of carbohydrate active enzymes (CAZymes) to plant carbohydrate and bioenergy research communities. The current version contains data of 43 790 CAZymes of 159 protein families from 35 plants (including angiosperms, gymnosperms, lycophyte and bryophyte mosses) and chlorophyte algae with fully sequenced genomes. Useful features of the database include: (i) a BLAST server and a HMMER server that allow users to search against our pre-computed sequence data for annotation purposes, (ii) a download page to allow batch downloading of data for a specific CAZyme family or species and (iii) protein browse pages to provide easy access to the most comprehensive sequence and annotation data. Database URL: http://cys.bios.niu.edu/plantcazyme/ PMID:25125445
SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss
Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia
2011-01-01
SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make the SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/ PMID:22120661
SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss.
Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia
2011-01-01
SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make the SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/
The Giardia genome project database.
McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L
2000-08-15
The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.
PROFESS: a PROtein Function, Evolution, Structure and Sequence database
Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter
2010-01-01
The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/~profess/ PMID:20624718
Kalium: a database of potassium channel toxins from scorpion venom.
Kuzmenkov, Alexey I; Krylov, Nikolay A; Chugunov, Anton O; Grishin, Eugene V; Vassilevski, Alexander A
2016-01-01
Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource, and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. General achievements of Kalium are a strict and easy regulation of KTx classification based on the unified nomenclature supported by researchers in the field, removal of peptides with partial sequence and entries supported by transcriptomic information only, classification of β-family toxins, and addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server using a one-click option. Molecular masses of mature peptides are calculated and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists, but also of general utility to the scientific community. Database URL: http://kaliumdb.org/. © The Author(s) 2016. Published by Oxford University Press.
The Corvids Literature Database--500 years of ornithological research from a crow's perspective.
Droege, Gabriele; Töpfer, Till
2016-01-01
Corvids (Corvidae) play a major role in ornithological research. Because of their worldwide distribution, diversity and adaptiveness, they have been studied extensively. The aim of the Corvids Literature Database (CLD, http://www.corvids.de/cld) is to record all publications (citation format) on all extant and extinct Crows, Ravens, Jays and Magpies worldwide and tag them with specific keywords making them available for researchers worldwide. The self-maintained project started in 2006 and today comprises 8000 articles, spanning almost 500 years. The CLD covers publications from 164 countries, written in 36 languages and published by 8026 authors in 1503 journals (plus books, theses and other publications). Forty-nine percent of all records are available online as full-text documents or deposited in the physical CLD archive. The CLD contains 442 original corvid descriptions. Here, we present a metadata assessment of articles recorded in the CLD including a gap analysis and prospects for future research. Database URL: http://www.corvids.de/cld. © The Author(s) 2016. Published by Oxford University Press.
How Intrusion Detection Can Improve Software Decoy Applications
2003-03-01
V. DISCUSSION Military history suggests it is best to employ a layered, defense-in...database: alert, postgresql, user=snort dbname=snort # output database: log, unixodbc, user=snort dbname=snort # output database: log, mssql, dbname...Threat Monitoring and Surveillance, James P. Anderson Co., Fort Washington, PA, April 1980. URL http://csrc.nist.gov/publications/history/ande80
CHOmine: an integrated data warehouse for CHO systems biology and modeling
Hanscho, Michael; Ruckerbauer, David E.; Zanghellini, Jürgen; Borth, Nicole
2017-01-01
The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data is distributed over several databases at different institutions. Currently, research is frequently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse connecting data from various databases and links to other ones. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line-specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology. Database URL: http://www.chogenome.org PMID:28605771
Choosing a genome browser for a Model Organism Database: surveying the Maize community
Sen, Taner Z.; Harper, Lisa C.; Schaeffer, Mary L.; Andorf, Carson M.; Seigfried, Trent E.; Campbell, Darwin A.; Lawrence, Carolyn J.
2010-01-01
As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchers’ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchers’ needs. Here, we document the survey’s outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly. Database URL: http://gbrowse.maizegdb.org/ PMID:20627860
MPD: a pathogen genome and metagenome database
Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen
2018-01-01
Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040
VerSeDa: vertebrate secretome database
Cortazar, Ana R.; Oguiza, José A.
2017-01-01
Based on the current tools, de novo secretome (full set of proteins secreted by an organism) prediction is a time consuming bioinformatic task that requires a multifactorial analysis in order to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus, saving an important amount of time spent in the prediction of protein features that can be retrieved from this repository directly. Database URL: VerSeDa is freely available at http://genomics.cicbiogune.es/VerSeDa/index.php PMID:28365718
BioMart Central Portal: an open database network for the biological community
Guberman, Jonathan M.; Ai, J.; Arnaiz, O.; Baran, Joachim; Blake, Andrew; Baldock, Richard; Chelala, Claude; Croft, David; Cros, Anthony; Cutts, Rosalind J.; Di Génova, A.; Forbes, Simon; Fujisawa, T.; Gadaleta, E.; Goodstein, D. M.; Gundem, Gunes; Haggarty, Bernard; Haider, Syed; Hall, Matthew; Harris, Todd; Haw, Robin; Hu, S.; Hubbard, Simon; Hsu, Jack; Iyer, Vivek; Jones, Philip; Katayama, Toshiaki; Kinsella, R.; Kong, Lei; Lawson, Daniel; Liang, Yong; Lopez-Bigas, Nuria; Luo, J.; Lush, Michael; Mason, Jeremy; Moreews, Francois; Ndegwa, Nelson; Oakley, Darren; Perez-Llamas, Christian; Primig, Michael; Rivkin, Elena; Rosanoff, S.; Shepherd, Rebecca; Simon, Reinhard; Skarnes, B.; Smedley, Damian; Sperling, Linda; Spooner, William; Stevenson, Peter; Stone, Kevin; Teague, J.; Wang, Jun; Wang, Jianxin; Whitty, Brett; Wong, D. T.; Wong-Erasmus, Marie; Yao, L.; Youens-Clark, Ken; Yung, Christina; Zhang, Junjun; Kasprzyk, Arek
2011-01-01
BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities. Database URL: http://central.biomart.org. PMID:21930507
Childs, Kevin L; Konganti, Kranti; Buell, C Robin
2012-01-01
Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.
Spectroscopic data for an astronomy database
NASA Technical Reports Server (NTRS)
Parkinson, W. H.; Smith, Peter L.
1995-01-01
Very few of the atomic and molecular data used in analyses of astronomical spectra are currently available in World Wide Web (WWW) databases that are searchable with hypertext browsers. We have begun to rectify this situation by making extensive atomic data files available with simple search procedures. We have also established links to other on-line atomic and molecular databases. All can be accessed from our database homepage with URL: http://cfa-www.harvard.edu/amp/data/amdata.html.
The Corvids Literature Database—500 years of ornithological research from a crow’s perspective
Droege, Gabriele; Töpfer, Till
2016-01-01
Corvids (Corvidae) play a major role in ornithological research. Because of their worldwide distribution, diversity and adaptiveness, they have been studied extensively. The aim of the Corvids Literature Database (CLD, http://www.corvids.de/cld) is to record all publications (citation format) on all extant and extinct Crows, Ravens, Jays and Magpies worldwide and tag them with specific keywords making them available for researchers worldwide. The self-maintained project started in 2006 and today comprises 8000 articles, spanning almost 500 years. The CLD covers publications from 164 countries, written in 36 languages and published by 8026 authors in 1503 journals (plus books, theses and other publications). Forty-nine percent of all records are available online as full-text documents or deposited in the physical CLD archive. The CLD contains 442 original corvid descriptions. Here, we present a metadata assessment of articles recorded in the CLD including a gap analysis and prospects for future research. Database URL: http://www.corvids.de/cld PMID:26868053
Respiratory cancer database: An open access database of respiratory cancer gene and miRNA.
Choubey, Jyotsna; Choudhari, Jyoti Kant; Patel, Ashish; Verma, Mukesh Kumar
2017-01-01
Respiratory cancer database (RespCanDB) is a genomic and proteomic database of cancers of the respiratory organs. It also includes information on medicinal plants used to treat various respiratory cancers, with the structures of their active constituents, as well as pharmacological and chemical information on drugs associated with various respiratory cancers. Data in RespCanDB were manually collected from published research articles and from other databases. The data were integrated using MySQL, a relational database management system, which manages all data in the back end and provides commands to store and retrieve them. The web interface of the database was built in ASP. RespCanDB is expected to contribute to the scientific community's understanding of respiratory cancer biology as well as to the development of new ways of diagnosing and treating respiratory cancer. Currently, the database contains oncogenomic information on lung cancer, laryngeal cancer and nasopharyngeal cancer. Data for other cancers, such as oral and tracheal cancers, will be added in the near future. The URL of RespCanDB is http://ridb.subdic-bioinformatics-nitrr.in/.
SEDIMENT DATA - COMMENCEMENT BAY HYLEBOS WATERWAY - TACOMA, WA - PRE-REMEDIAL DESIGN PROGRAM
Event 1A/1B Data Files URL address: http://www.epa.gov/r10earth/datalib/superfund/hybos1ab.htm. Sediment Chemistry Data (Database Format): HYBOS1AB.EXE is a self-extracting file which expands to the single-value per record .DBF format database file HYBOS1AB.DBF. This file contai...
SoyFN: a knowledge database of soybean functional networks.
Xu, Yungang; Guo, Maozu; Liu, Xiaoyan; Wang, Chunyu; Liu, Yang
2014-01-01
Many databases for soybean genomic analysis have been built and made publicly available, but few of them contain knowledge specifically targeting the omics-level gene-gene, gene-microRNA (miRNA) and miRNA-miRNA interactions. Here, we present SoyFN, a knowledge database of soybean functional gene networks and miRNA functional networks. SoyFN provides user-friendly interfaces to retrieve, visualize, analyze and download the functional networks of soybean genes and miRNAs. In addition, it incorporates much information about KEGG pathways, gene ontology annotations and 3'-UTR sequences as well as many useful tools including SoySearch, ID mapping, Genome Browser, eFP Browser and promoter motif scan. SoyFN is a schema-free database that can be accessed as a Web service from any modern programming language using a simple Hypertext Transfer Protocol call. The Web site is implemented in Java, JavaScript, PHP, HTML and Apache, with all major browsers supported. We anticipate that this database will be useful for members of research communities both in soybean experimental science and bioinformatics. Database URL: http://nclab.hit.edu.cn/SoyFN.
ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.
Zeng, Victor; Extavour, Cassandra G
2012-01-01
The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics. Database URL: asgard.rc.fas.harvard.edu.
The Chinchilla Research Resource Database: resource for an otolaryngology disease model
Shimoyama, Mary; Smith, Jennifer R.; De Pons, Jeff; Tutaj, Marek; Khampang, Pawjai; Hong, Wenzhou; Erbe, Christy B.; Ehrlich, Garth D.; Bakaletz, Lauren O.; Kerschner, Joseph E.
2016-01-01
The long-tailed chinchilla (Chinchilla lanigera) is an established animal model for diseases of the inner and middle ear, among others. In particular, chinchilla is commonly used to study diseases involving viral and bacterial pathogens and polymicrobial infections of the upper respiratory tract and the ear, such as otitis media. The value of the chinchilla as a model for human diseases prompted the sequencing of its genome in 2012 and the more recent development of the Chinchilla Research Resource Database (http://crrd.mcw.edu) to provide investigators with easy access to relevant datasets and software tools to enhance their research. The Chinchilla Research Resource Database contains a complete catalog of genes for chinchilla and, for comparative purposes, human. Chinchilla genes can be viewed in the context of their genomic scaffold positions using the JBrowse genome browser. In contrast to the corresponding records at NCBI, individual gene reports at CRRD include functional annotations for Disease, Gene Ontology (GO) Biological Process, GO Molecular Function, GO Cellular Component and Pathway assigned to chinchilla genes based on annotations from the corresponding human orthologs. Data can be retrieved via keyword and gene-specific searches. Lists of genes with similar functional attributes can be assembled by leveraging the hierarchical structure of the Disease, GO and Pathway vocabularies through the Ontology Search and Browser tool. Such lists can then be further analyzed for commonalities using the Gene Annotator (GA) Tool. All data in the Chinchilla Research Resource Database is freely accessible and downloadable via the CRRD FTP site or using the download functions available in the search and analysis tools. The Chinchilla Research Resource Database is a rich resource for researchers using, or considering the use of, chinchilla as a model for human disease. Database URL: http://crrd.mcw.edu PMID:27173523
MMpI: A Wide Range of Available Compounds of Matrix Metalloproteinase Inhibitors
Muvva, Charuvaka; Patra, Sanjukta; Venkatesan, Subramanian
2016-01-01
Matrix metalloproteinases (MMPs) are a family of zinc-dependent proteinases involved in the regulation of the extracellular signaling and structural matrix environment of cells and tissues. MMPs are considered promising targets for the treatment of many diseases. A database of MMP inhibitors would therefore accelerate research in this area, given the role of MMPs in these diseases and the limitations of first- and second-generation inhibitors. In this communication, we report the development of the MMpI database, a web-accessible resource that contains detailed information on MMP inhibitors, including small molecules, peptides and MMP Drug Leads. The database contains entries for ~3000 inhibitors, including ~72 MMP Drug Leads and ~73 peptide-based inhibitors. It provides the molecular and structural details necessary for drug discovery and development, including physical properties and 2D and 3D structures (mol2 and pdb format files) of MMP inhibitors. Other data fields are hyperlinked to PubChem, ChEMBL, BindingDB, DrugBank, PDB, MEROPS and PubMed. The database can be searched by MMpI ID, IUPAC name, chemical structure and research article title. The MMP inhibitors provided in the MMpI database are optimized using the Python-based Hierarchical Environment for Integrated Xtallography (Phenix) software. The MMpI database is unique: it is the only public database that provides complete information on MMP inhibitors. Database URL: http://clri.res.in/subramanian/databases/mmpi/index.php. PMID:27509041
BioAcoustica: a free and open repository and analysis platform for bioacoustics
Baker, Edward; Price, Ben W.; Rycroft, S. D.; Smith, Vincent S.
2015-01-01
We describe an online open repository and analysis platform, BioAcoustica (http://bio.acousti.ca), for recordings of wildlife sounds. Recordings can be annotated using a crowdsourced approach, allowing voice introductions and sections with extraneous noise to be removed from analyses. This system is based on the Scratchpads virtual research environment, the BioVeL portal and the Taverna workflow management tool, which allows for analysis of recordings using a grid computing service. At present the analyses include spectrograms, oscillograms and dominant frequency analysis. Further analyses can be integrated to meet the needs of specific researchers or projects. Researchers can upload and annotate their recordings to supplement traditional publication. Database URL: http://bio.acousti.ca PMID:26055102
Building an efficient curation workflow for the Arabidopsis literature corpus
Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva
2012-01-01
TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298
GigaDB: promoting data dissemination and reproducibility
Sneddon, Tam P.; Si Zhe, Xiao; Edmunds, Scott C.; Li, Peter; Goodman, Laurie; Hunter, Christopher I.
2014-01-01
Often papers are published where the underlying data supporting the research are not made available because of the limitations of making such large data sets publicly and permanently accessible. Even if the raw data are deposited in public archives, the essential analysis intermediaries, scripts or software are frequently not made available, meaning the science is not reproducible. The GigaScience journal is attempting to address this issue with the associated data storage and dissemination portal, the GigaScience database (GigaDB). Here we present the current version of GigaDB and reveal plans for the next generation of improvements. However, most importantly, we are soliciting responses from you, the users, to ensure that future developments are focused on the data storage and dissemination issues that still need resolving. Database URL: http://www.gigadb.org PMID:24622612
iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence
Turner, Brian; Razick, Sabry; Turinsky, Andrei L.; Vlasblom, James; Crowdy, Edgard K.; Cho, Emerson; Morrison, Kyle; Wodak, Shoshana J.
2010-01-01
We present iRefWeb, a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb enables users to examine aggregated interactions for a protein of interest, and presents various statistical summaries of the data across databases, such as the number of organism-specific interactions, proteins and cited publications. Through links to source databases and supporting evidence, researchers may gauge the reliability of an interaction using simple criteria, such as the detection methods, the scale of the study (high- or low-throughput) or the number of cited publications. Furthermore, iRefWeb compares the information extracted from the same publication by different databases, and offers means to follow up possible inconsistencies. We provide an overview of the consolidated protein–protein interaction landscape and show how it can be automatically cropped to aid the generation of meaningful organism-specific interactomes. iRefWeb can be accessed at: http://wodaklab.org/iRefWeb. Database URL: http://wodaklab.org/iRefWeb/ PMID:20940177
BGDB: a database of bivalent genes
Li, Qingyan; Lian, Shuabin; Dai, Zhiming; Xiang, Qian; Dai, Xianhua
2013-01-01
A bivalent gene is a gene marked with both H3K4me3 and H3K27me3 epigenetic modifications in the same region, and is proposed to play a pivotal role related to pluripotency in embryonic stem (ES) cells. Identifying these bivalent genes and understanding their functions are important for further research on lineage specification and embryo development. So far, a large amount of genome-wide histone modification data has been generated in mouse and human ES cells. These valuable data make it possible to identify bivalent genes, but no comprehensive data repositories or analysis tools are currently available for bivalent genes. In this work, we develop BGDB, the database of bivalent genes. The database contains 6897 bivalent genes in human and mouse ES cells, which were manually collected from the scientific literature. Each entry contains curated information, including genomic context, sequences, gene ontology and other relevant information. The web services of BGDB were implemented with PHP + MySQL + JavaScript and provide diverse query functions. Database URL: http://dailab.sysu.edu.cn/bgdb/ PMID:23894186
Drug-Path: a database for drug-induced pathways
Zeng, Hui; Cui, Qinghua
2015-01-01
Some databases for drug-associated pathways have been built and are publicly available. However, the pathways curated in most of these databases are drug-action or drug-metabolism pathways. In recent years, high-throughput technologies such as microarray and RNA-sequencing have produced many drug-induced gene expression profiles. Interestingly, drug-induced gene expression profiles frequently show distinct patterns, indicating that drugs normally induce the activation or repression of distinct pathways. These pathways can therefore contribute to the study of drug mechanisms and drug repurposing. Here, we present Drug-Path, a database of drug-induced pathways, which was generated by KEGG pathway enrichment analysis for drug-induced upregulated genes and downregulated genes based on drug-induced gene expression datasets in Connectivity Map. Drug-Path provides user-friendly interfaces to retrieve, visualize and download the drug-induced pathway data in the database. In addition, the genes deregulated by a given drug are highlighted in the pathways. All data were organized using SQLite. The web site was implemented using Django, a Python web framework. Finally, we believe that this database will be useful for related research. Database URL: http://www.cuilab.cn/drugpath PMID:26130661
CREDO: a structural interactomics database for drug discovery
Schreyer, Adrian M.; Blundell, Tom L.
2013-01-01
CREDO is a unique relational database storing all pairwise atomic interactions of inter- as well as intra-molecular contacts between small molecules and macromolecules found in experimentally determined structures from the Protein Data Bank. These interactions are integrated with further chemical and biological data. The database implements useful data structures and algorithms such as cheminformatics routines to create a comprehensive analysis platform for drug discovery. The database can be accessed through a web-based interface, downloads of data sets and web services at http://www-cryst.bioc.cam.ac.uk/credo. Database URL: http://www-cryst.bioc.cam.ac.uk/credo PMID:23868908
CBD: a biomarker database for colorectal cancer
Zhang, Xueli; Sun, Xiao-Feng; Cao, Yang; Ye, Benchen; Peng, Qiliang; Liu, Xingyun; Shen, Bairong; Zhang, Hong
2018-01-01
The colorectal cancer (CRC) biomarker database (CBD) was established based on 870 identified CRC biomarkers and their relevant information from 1115 original articles in PubMed published from 1986 to 2017. In this version of the CBD, CRC biomarker data were collected, sorted, displayed and analysed. With its credible contents, the CBD is a powerful and time-saving tool that provides more comprehensive and accurate information for further CRC biomarker research. The CBD was constructed on a MySQL server. HTML, PHP and JavaScript were used to implement the web interface, and Apache was selected as the HTTP server. All of these web operations were implemented under the Windows system. For each biomarker, the CBD provides information categorized into the biological category, source and application of the biomarker; the experimental methods, results, authors and publication resources; and the research region, the average age of the cohort, gender, race, the number of tumours, tumour location and stage. We only collect data from articles with clear and credible results showing that the biomarkers are useful in the diagnosis, treatment or prognosis of CRC. The CBD can also serve as a professional platform where researchers interested in CRC can communicate, exchange research ideas and design further high-quality research in CRC. They can submit their new findings to our database via the submission page and communicate with us in the CBD. Database URL: http://sysbio.suda.edu.cn/CBD/ PMID:29846545
Curation accuracy of model organism databases
Keseler, Ingrid M.; Skrzypek, Marek; Weerasinghe, Deepika; Chen, Albert Y.; Fulcher, Carol; Li, Gene-Wei; Lemmer, Kimberly C.; Mladinich, Katherine M.; Chow, Edmond D.; Sherlock, Gavin; Karp, Peter D.
2014-01-01
Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by Ph.D.-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/ PMID:24923819
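The overall error rate quoted in the abstract above follows directly from the two counts it reports (10 errors among 633 validated facts). A minimal Python sketch of that arithmetic is given here; the function name error_rate is a hypothetical helper for illustration and does not come from the paper. The per-database rates are not recomputed because the abstract does not give the per-database counts.

```python
def error_rate(errors: int, validated: int) -> float:
    """Return the share of validated assertions found to be in error, in percent."""
    if validated <= 0:
        raise ValueError("validated must be a positive count")
    return 100.0 * errors / validated


# Counts taken from the abstract: 10 errors in 633 validated facts.
overall = error_rate(10, 633)
print(f"Overall error rate: {overall:.2f}%")  # prints "Overall error rate: 1.58%"
```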
Criteria for Comparing Children's Web Search Tools.
ERIC Educational Resources Information Center
Kuntz, Jerry
1999-01-01
Presents criteria for evaluating and comparing Web search tools designed for children. Highlights include database size; accountability; categorization; search access methods; help files; spell check; URL searching; links to alternative search services; advertising; privacy policy; and layout and design. (LRW)
StraPep: a structure database of bioactive peptides
Wang, Jian; Yin, Tailang; Xiao, Xuwen; He, Dan; Xue, Zhidong; Jiang, Xinnong; Wang, Yan
2018-01-01
Bioactive peptides, with a variety of biological activities and wide distribution in nature, have attracted great research interest in biological and medical fields, especially in the pharmaceutical industry. The structural information of bioactive peptides is important for the development of peptide-based drugs. Many databases have been developed cataloguing bioactive peptides. However, to our knowledge, a database dedicated to collecting all the bioactive peptides with known structure is not yet available. Thus, we developed StraPep, a structure database of bioactive peptides. StraPep holds 3791 bioactive peptide structures, which belong to 1312 unique bioactive peptide sequences. About 905 out of 1312 (68%) bioactive peptides in StraPep contain disulfide bonds, which is significantly higher than that (21%) of PDB. Interestingly, 150 out of 616 (24%) bioactive peptides with three or more disulfide bonds form a structural motif known as cystine knot, which confers considerable structural stability on proteins and is an attractive scaffold for drug design. Detailed information on each peptide, including the experimental structure, the location of disulfide bonds, secondary structure, classification, post-translational modification and so on, is provided. A wide range of user-friendly tools, such as browsing, sequence and structure-based searching and so on, has been incorporated into StraPep. We hope that this database will be helpful for the research community. Database URL: http://isyslab.info/StraPep PMID:29688386
DOE Office of Scientific and Technical Information (OSTI.GOV)
MacKinnon, Robert J.
2015-10-26
Under the auspices of the International Atomic Energy Agency (IAEA), nationally developed underground research laboratories (URLs) and associated research institutions are being offered for use by other nations. These facilities form an Underground Research Facilities (URF) Network for training in and demonstration of waste disposal technologies and the sharing of knowledge and experience related to geologic repository development, research, and engineering. In order to achieve its objectives, the URF Network regularly sponsors workshops and training events related to the knowledge base that is transferable between existing URL programs and to nations with an interest in developing a new URL. This report describes the role of URLs in the context of a general timeline for repository development. This description includes identification of key phases and activities that contribute to repository development as a repository program evolves from an early research and development phase to later phases such as construction, operations, and closure. This information is cast in the form of a matrix with the entries in this matrix forming the basis of the URF Network roadmap that will be used to identify and plan future workshops and training events.
Pan European Phenological database (PEP725): a single point of access for European data
NASA Astrophysics Data System (ADS)
Templ, Barbara; Koch, Elisabeth; Bolmgren, Kjell; Ungersböck, Markus; Paul, Anita; Scheifinger, Helfried; Rutishauser, This; Busto, Montserrat; Chmielewski, Frank-M.; Hájková, Lenka; Hodzić, Sabina; Kaspar, Frank; Pietragalla, Barbara; Romero-Fresneda, Ramiro; Tolvanen, Anne; Vučetič, Višnja; Zimmermann, Kirsten; Zust, Ana
2018-06-01
The Pan European Phenology (PEP) project is a European infrastructure to promote and facilitate phenological research, education, and environmental monitoring. The main objective is to maintain and develop a Pan European Phenological database (PEP725) with an open, unrestricted data access for science and education. PEP725 is the successor of the database developed through the COST action 725 "Establishing a European phenological data platform for climatological applications" working as a single access point for European-wide plant phenological data. So far, 32 European meteorological services and project partners from across Europe have joined and supplied data collected by volunteers from 1868 to the present for the PEP725 database. Most of the partners actively provide data on a regular basis. The database presently holds almost 12 million records, about 46 growing stages and 265 plant species (including cultivars), and can be accessed via
Cserhati, Matyas F.; Pandey, Sanjit; Beaudoin, James J.; Baccaglini, Lorena; Guda, Chittibabu; Fox, Howard S.
2015-01-01
We herein present the National NeuroAIDS Tissue Consortium-Data Coordinating Center (NNTC-DCC) database, which is the only available database for neuroAIDS studies that contains data in an integrated, standardized form. This database has been created in conjunction with the NNTC, which provides human tissue and biofluid samples to individual researchers to conduct studies focused on neuroAIDS. The database contains experimental datasets from 1206 subjects for the following categories (which are further broken down into subcategories): gene expression, genotype, proteins, endo-exo-chemicals, morphometrics and other (miscellaneous) data. The database also contains a wide variety of downloadable data and metadata for 95 HIV-related studies covering 170 assays from 61 principal investigators. The data represent 76 tissue types, 25 measurement types and 38 technology types, and reach a total of 33 017 407 data points. We used the ISA platform to create the database and develop a searchable web interface for querying the data. A gene search tool is also available, which searches for NCBI GEO datasets associated with selected genes. The database is manually curated with many user-friendly features, and is cross-linked to the NCBI, HUGO and PubMed databases. A free registration is required for qualified users to access the database. Database URL: http://nntc-dcc.unmc.edu PMID:26228431
Recent improvements in the NASA technical report server
NASA Technical Reports Server (NTRS)
Maa, Ming-Hokng; Nelson, Michael L.
1995-01-01
The NASA Technical Report Server (NTRS), a World Wide Web (WWW) report distribution service, has been modified to allow parallel database queries, which decrease user access time by an average factor of 2.3; access from clients behind firewalls and/or proxies that truncate excessively long Uniform Resource Locators (URLs); access to non-Wide Area Information Server (WAIS) databases; and compatibility with the Z39-50.3 protocol.
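The parallel-query idea in the abstract above can be illustrated with a short sketch. This is not the NTRS implementation: the database names, their contents, the `search_one` helper and the simulated delay are all assumptions made for illustration. The sketch only shows why issuing the per-database searches concurrently can shorten the total wait compared with querying one database after another.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stand-ins for remote report databases; names and titles are invented.
DATABASES = {
    "db_a": ["Wind tunnel report", "Rotor noise study"],
    "db_b": ["Rotor blade fatigue", "Orbital debris survey"],
    "db_c": ["Rotor wake model"],
}


def search_one(name, term, delay=0.05):
    """Simulate one slow database lookup and return (database name, matching titles)."""
    time.sleep(delay)  # stands in for network and search latency
    hits = [title for title in DATABASES[name] if term.lower() in title.lower()]
    return name, hits


def search_serial(term):
    """Query the databases one after another; total time grows with their number."""
    return dict(search_one(name, term) for name in DATABASES)


def search_parallel(term):
    """Submit all queries at once; total time is roughly that of the slowest query."""
    with ThreadPoolExecutor(max_workers=len(DATABASES)) as pool:
        futures = [pool.submit(search_one, name, term) for name in DATABASES]
        return dict(future.result() for future in futures)
```

Both functions return the same mapping of database name to hits; only the waiting time differs. Any actual speedup depends on the latency of the real back ends, so the factor reported in the abstract should not be expected from this toy.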
Ozyurt, Ibrahim Burak; Grethe, Jeffrey S.; Martone, Maryann E.; Bandrowski, Anita E.
2016-01-01
The NIF Registry, developed and maintained by the Neuroscience Information Framework, is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists over 13K research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid in curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW: Resource Disambiguator for the (Web). RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor from papers, a resource candidate screen, a resource URL change tracker and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis.
The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking. PMID:26730820
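The abstract above lists a resource content change tracker among the RDW tools. As a rough illustration of what such a tracker has to decide, the sketch below compares two snapshots of page text by hash. This is a deliberately simple substitute, not RDW's method (the abstract mentions learning algorithms and statistical text analysis); the function names and the `example.org` URLs are hypothetical.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 hex digest of the page text with whitespace normalized."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two {url: page_text} snapshots and classify each previously known URL."""
    report = {}
    for url, old_text in previous.items():
        if url not in current:
            report[url] = "missing"  # the resource could not be fetched this time
        elif fingerprint(current[url]) != fingerprint(old_text):
            report[url] = "changed"  # content differs beyond whitespace
        else:
            report[url] = "unchanged"
    return report


# Hypothetical snapshots taken at two different times.
earlier = {
    "http://example.org/db": "Release 1 of the database",
    "http://example.org/tool": "Download the tool here",
    "http://example.org/gone": "Old project page",
}
later = {
    "http://example.org/db": "Release 2 of the database",
    "http://example.org/tool": "Download   the tool here",
}
report = detect_changes(earlier, later)
```

A hash comparison flags any textual edit, including trivial ones such as a changed date stamp, which is one reason a production tracker would likely need something more tolerant than this.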
The Papillomavirus Episteme: a major update to the papillomavirus sequence database.
Van Doorslaer, Koenraad; Li, Zhiwen; Xirasagar, Sandhya; Maes, Piet; Kaminsky, David; Liou, David; Sun, Qiang; Kaur, Ramandeep; Huyen, Yentram; McBride, Alison A
2017-01-04
The Papillomavirus Episteme (PaVE) is a database of curated papillomavirus genomic sequences, accompanied by web-based sequence analysis tools. This update describes the addition of major new features. The papillomavirus genomes within PaVE have been further annotated, and now include the major spliced mRNA transcripts. Viral genes and transcripts can be visualized on both linear and circular genome browsers. Evolutionary relationships among PaVE reference protein sequences can be analysed using multiple sequence alignments and phylogenetic trees. To assist in viral discovery, PaVE offers a typing tool, a simplified algorithm to determine whether a newly sequenced virus is novel. PaVE also now contains an image library containing gross clinical and histopathological images of papillomavirus-infected lesions. Database URL: https://pave.niaid.nih.gov/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases
Buza, Teresia M.; Jack, Sherman W.; Kirunda, Halid; Khaitsa, Margaret L.; Lawrence, Mark L.; Pruett, Stephen; Peterson, Daniel G.
2015-01-01
There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide a wonderful opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides more plausible arguments to achieve better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into the VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu PMID:26581408
Engel, Stacia R.; Cherry, J. Michael
2013-01-01
The first completed eukaryotic genome sequence was that of the yeast Saccharomyces cerevisiae, and the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the original model organism database. SGD remains the authoritative community resource for the S. cerevisiae reference genome sequence and its annotation, and continues to provide comprehensive biological information correlated with S. cerevisiae genes and their products. A diverse set of yeast strains have been sequenced to explore commercial and laboratory applications, and a brief history of those strains is provided. The publication of these new genomes has motivated the creation of new tools, and SGD will annotate and provide comparative analyses of these sequences, correlating changes with variations in strain phenotypes and protein function. We are entering a new era at SGD, as we incorporate these new sequences and make them accessible to the scientific community, all in an effort to continue in our mission of educating researchers and facilitating discovery. Database URL: http://www.yeastgenome.org/ PMID:23487186
Reiser, Leonore; Berardini, Tanya Z; Li, Donghui; Muller, Robert; Strait, Emily M; Li, Qian; Mezheritsky, Yarik; Vetushko, Andrey; Huala, Eva
2016-01-01
Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR), founded the nonprofit organization, Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org. © The Author(s) 2016. Published by Oxford University Press.
PharmDB-K: Integrated Bio-Pharmacological Network Database for Traditional Korean Medicine
Lee, Ji-Hyun; Park, Kyoung Mii; Han, Dong-Jin; Bang, Nam Young; Kim, Do-Hee; Na, Hyeongjin; Lim, Semi; Kim, Tae Bum; Kim, Dae Gyu; Kim, Hyun-Jung; Chung, Yeonseok; Sung, Sang Hyun; Surh, Young-Joon; Kim, Sunghoon; Han, Byung Woo
2015-01-01
Despite the growing attention given to Traditional Medicine (TM) worldwide, there is no well-known, publicly available, integrated bio-pharmacological Traditional Korean Medicine (TKM) database for researchers in drug discovery. In this study, we have constructed PharmDB-K, which offers comprehensive information relating to TKM-associated drugs (compound), disease indication, and protein relationships. To explore the underlying molecular interaction of TKM, we integrated fourteen different databases, six Pharmacopoeias, and literature, and established a massive bio-pharmacological network for TKM and experimentally validated some cases predicted from the PharmDB-K analyses. Currently, PharmDB-K contains information about 262 TKMs, 7,815 drugs, 3,721 diseases, 32,373 proteins, and 1,887 side effects. One of the unique sets of information in PharmDB-K includes 400 indicator compounds used for standardization of herbal medicine. Furthermore, we are operating PharmDB-K via phExplorer (a network visualization software) and BioMart (a data federation framework) for convenient search and analysis of the TKM network. Database URL: http://pharmdb-k.org, http://biomart.i-pharm.org. PMID:26555441
Berardini, Tanya Z.; Li, Donghui; Muller, Robert; Strait, Emily M.; Li, Qian; Mezheritsky, Yarik; Vetushko, Andrey; Huala, Eva
2016-01-01
Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR), founded the nonprofit organization, Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org PMID:26989150
WATERSHED INFORMATION - SURF YOUR WATERSHED
Surf Your Watershed is both a database of URLs to World Wide Web pages associated with the watershed approach of environmental management and a collection of data sets of relevant environmental information that can be queried. It is designed for citizens and decision makers across the count...
The Histone Database: an integrated resource for histones and histone fold-containing proteins
Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David
2011-01-01
Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671
Hayman, G Thomas; Laulederkind, Stanley J F; Smith, Jennifer R; Wang, Shur-Jen; Petri, Victoria; Nigam, Rajni; Tutaj, Marek; De Pons, Jeff; Dwinell, Melinda R; Shimoyama, Mary
2016-01-01
The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu. © The Author(s) 2016. Published by Oxford University Press.
Enhancing the DNA Patent Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walters, LeRoy B.
Final Report on Award No. DE-FG0201ER63171. Principal Investigator: LeRoy B. Walters. February 18, 2008. This project successfully completed its goal of surveying and reporting on the DNA patenting and licensing policies at 30 major U.S. academic institutions. The report of survey results was published in the January 2006 issue of Nature Biotechnology under the title “The Licensing of DNA Patents by US Academic Institutions: An Empirical Survey.” Lori Pressman was the lead author on this feature article. A PDF reprint of the article will be submitted to our Program Officer under separate cover. The project team has continued to update the DNA Patent Database on a weekly basis since the conclusion of the project. The database can be accessed at dnapatents.georgetown.edu. This database provides a valuable research tool for academic researchers, policymakers, and citizens. A report entitled Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health was published in 2006 by the Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation, Board on Science, Technology, and Economic Policy at the National Academies. The report was edited by Stephen A. Merrill and Anne-Marie Mazza. This report employed and then adapted the methodology developed by our research project and quoted our findings at several points. (The full report can be viewed online at the following URL: http://www.nap.edu/openbook.php?record_id=11487&page=R1). My colleagues and I are grateful for the research support of the ELSI program at the U.S. Department of Energy.
IsoPlot: a database for comparison of mRNA isoforms in fruit fly and mosquitoes
Ng, I-Man; Tsai, Shang-Chi
2017-01-01
Alternative splicing (AS), a mechanism by which different forms of mature messenger RNAs (mRNAs) are generated from the same gene, widely occurs in the metazoan genomes. Knowledge about isoform variants and abundance is crucial for understanding the functional context in the molecular diversity of the species. With increasing transcriptome data of model and non-model species, a database for visualization and comparison of AS events with up-to-date information is needed for further research. IsoPlot is a publicly available database with visualization tools for exploration of AS events, including three major species of mosquitoes, Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus, and fruit fly Drosophila melanogaster, the model insect species. IsoPlot includes not only 88,663 annotated transcripts but also 17,037 newly predicted transcripts from massive transcriptome data at different developmental stages of mosquitoes. The web interface enables users to explore the patterns and abundance of isoforms in different experimental conditions as well as cross-species sequence comparison of orthologous transcripts. IsoPlot provides a platform for researchers to access comprehensive information about AS events in mosquitoes and fruit fly. Our database is available on the web via an interactive user interface with an intuitive graphical design, which is applicable for the comparison of complex isoforms within or between species. Database URL: http://isoplot.iis.sinica.edu.tw/ PMID:29220459
LocSigDB: a database of protein localization signals
Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M.; Mohammed, Akram; Guda, Chittibabu
2015-01-01
LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations, primarily in a eukaryotic cell, with brief coverage of bacterial proteins. Proteins must be localized at their appropriate subcellular compartment to perform their desired function. Mislocalization of proteins to unintended locations is a causative factor for many human diseases; therefore, collection of known sorting signals will help support many important areas of biomedical research. By performing an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor such signals. Each signal in LocSigDB is annotated with its localization, source and PubMed references, and is linked to the proteins in the UniProt database that contain the same amino acid pattern as the given signal, along with their organism information. From the LocSigDB webserver, users can download the whole database or browse/search for data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/ PMID:25725059
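The abstract above describes linking each signal to proteins that contain the same amino acid pattern. A minimal sketch of that kind of pattern lookup follows. The signal table here is an assumption made for illustration and is not taken from LocSigDB: it uses two widely known C-terminal motifs (KDEL for endoplasmic reticulum retention and SKL for peroxisomal targeting), and the protein sequences are invented.

```python
import re

# Illustrative signal table; these entries are assumptions for the sketch,
# not records copied from LocSigDB. "$" anchors the motif at the C-terminus.
SIGNALS = {
    "KDEL$": "endoplasmic reticulum",
    "SKL$": "peroxisome",
}


def find_signals(sequence):
    """Return (pattern, location, start index) for every signal found in a sequence."""
    hits = []
    for pattern, location in SIGNALS.items():
        for match in re.finditer(pattern, sequence):
            hits.append((pattern, location, match.start()))
    return hits


# Invented example sequences in one-letter amino acid code.
er_like = find_signals("MAAPLLKDEL")
peroxisome_like = find_signals("MKTAYSKL")
no_signal = find_signals("MKTAY")
```

Expressing signals as regular expressions keeps positional constraints (such as a C-terminal anchor) in the pattern itself, so the same lookup code serves motifs that may occur anywhere in the sequence.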
A comprehensive view of the web-resources related to sericulture
Singh, Deepika; Chetia, Hasnahana; Kabiraj, Debajyoti; Sharma, Swagata; Kumar, Anil; Sharma, Pragya; Deka, Manab; Bora, Utpal
2016-01-01
Recent progress in the field of sequencing and analysis has led to a tremendous spike in data and the development of data science tools. One of the outcomes of this scientific progress is the development of numerous databases which are gaining popularity in all disciplines of biology including sericulture. As economically important organisms, silkworms are studied extensively for their numerous applications in the field of textiles, biomaterials, biomimetics, etc. Similarly, host plants, pests, pathogens, etc. are also being probed to understand the seri-resources more efficiently. These studies have led to the generation of numerous seri-related databases which are extremely helpful for the scientific community. In this article, we have reviewed all the available online resources on silkworm and its related organisms, including databases as well as informative websites. We have studied their basic features and impact on research through citation count analysis, finally discussing the role of emerging sequencing and analysis technologies in the field of seri-data science. As an outcome of this review, a web portal named SeriPort has been created which will act as an index for the various sericulture-related databases and web resources available in cyberspace. Database URL: http://www.seriport.in/ PMID:27307138
Astronomical Software Directory Service
NASA Technical Reports Server (NTRS)
Hanisch, R. J.; Payne, H.; Hayes, J.
1998-01-01
This is the final report on the development of the Astronomical Software Directory Service (ASDS), a distributable, searchable, WWW-based database of software packages and their related documentation. ASDS provides integrated access to 56 astronomical software packages, with more than 16,000 URL's indexed for full-text searching.
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions.
Karr, Jonathan R; Phillips, Nolan C; Covert, Markus W
2014-01-01
Mechanistic 'whole-cell' models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org Source code repository URL: http://github.com/CovertLab/WholeCellSimDB. © The Author(s) 2014. Published by Oxford University Press.
CTGA: the database for genetic disorders in Arab populations.
Tadmouri, Ghazi O; Al Ali, Mahmoud Taleb; Al-Haj Ali, Sarah; Al Khaja, Najib
2006-01-01
The Arabs comprise a genetically heterogeneous group that resulted from the admixture of different populations throughout history. They share many common characteristics responsible for a considerable proportion of perinatal and neonatal mortalities. To this end, the Centre for Arab Genomic Studies (CAGS) launched a pilot project to construct the 'Catalogue of Transmission Genetics in Arabs' (CTGA) database for genetic disorders in Arabs. Information in CTGA is drawn from published research and mined hospital records. The database offers web-based basic and advanced search approaches. In either case, the final search result is a detailed HTML record that includes text-, URL- and graphic-based fields. At present, CTGA hosts entries for 692 phenotypes and 235 related genes described in Arab individuals. Of these, 213 phenotypic descriptions and 22 related genes were observed in the Arab population of the United Arab Emirates (UAE). These results emphasize the role of CTGA as an essential tool to promote scientific research on genetic disorders in the region. The priority of CTGA is to provide timely information on the occurrence of genetic disorders in Arab individuals. It is anticipated that data from Arab countries other than the UAE will be exhaustively searched and incorporated in CTGA (http://www.cags.org.ae).
CTGA: the database for genetic disorders in Arab populations
Tadmouri, Ghazi O.; Ali, Mahmoud Taleb Al; Ali, Sarah Al-Haj; Khaja, Najib Al
2006-01-01
The Arabs comprise a genetically heterogeneous group that resulted from the admixture of different populations throughout history. They share many common characteristics responsible for a considerable proportion of perinatal and neonatal mortalities. To this end, the Centre for Arab Genomic Studies (CAGS) launched a pilot project to construct the ‘Catalogue of Transmission Genetics in Arabs’ (CTGA) database for genetic disorders in Arabs. Information in CTGA is drawn from published research and mined hospital records. The database offers web-based basic and advanced search approaches. In either case, the final search result is a detailed HTML record that includes text-, URL- and graphic-based fields. At present, CTGA hosts entries for 692 phenotypes and 235 related genes described in Arab individuals. Of these, 213 phenotypic descriptions and 22 related genes were observed in the Arab population of the United Arab Emirates (UAE). These results emphasize the role of CTGA as an essential tool to promote scientific research on genetic disorders in the region. The priority of CTGA is to provide timely information on the occurrence of genetic disorders in Arab individuals. It is anticipated that data from Arab countries other than the UAE will be exhaustively searched and incorporated in CTGA (http://www.cags.org.ae). PMID:16381941
OReFiL: an online resource finder for life sciences.
Yamamoto, Yasunori; Takagi, Toshihisa
2007-08-06
Many online resources for the life sciences have been developed and introduced in peer-reviewed papers recently, ranging from databases and web applications to data-analysis software. Some have been introduced in special journal issues or websites with a search function, but others remain scattered throughout the Internet and in the published literature. The searchable resources on these sites are collected and maintained manually and are therefore of higher quality than automatically updated sites, but also require more time and effort. We developed an online resource search system called OReFiL to address these issues. We developed a crawler to gather all of the web pages whose URLs appear in MEDLINE abstracts and full-text papers on the BioMed Central open-access journals. The URLs were extracted using regular expressions and rules based on our heuristic knowledge. We then indexed the online resources to facilitate their retrieval and comparison by researchers. Because every online resource has at least one PubMed ID, we can easily acquire its summary with Medical Subject Headings (MeSH) terms and confirm its credibility through reference to the corresponding PubMed entry. In addition, because OReFiL automatically extracts URLs and updates the index, minimal time and effort is needed to maintain the system. We developed OReFiL, a search system for online life science resources, which is freely available. The system's distinctive features include the ability to return up-to-date query-relevant online resources introduced in peer-reviewed papers; the ability to search using free words, MeSH terms, or author names; easy verification of each hit following links to the corresponding PubMed entry or to papers citing the URL through the search systems of BioMed Central, Scirus, HighWire Press, or Google Scholar; and quick confirmation of the existence of an online resource web page.
OReFiL: an online resource finder for life sciences
Yamamoto, Yasunori; Takagi, Toshihisa
2007-01-01
Background: Many online resources for the life sciences have been developed and introduced in peer-reviewed papers recently, ranging from databases and web applications to data-analysis software. Some have been introduced in special journal issues or websites with a search function, but others remain scattered throughout the Internet and in the published literature. The searchable resources on these sites are collected and maintained manually and are therefore of higher quality than automatically updated sites, but also require more time and effort. Description: We developed an online resource search system called OReFiL to address these issues. We developed a crawler to gather all of the web pages whose URLs appear in MEDLINE abstracts and full-text papers on the BioMed Central open-access journals. The URLs were extracted using regular expressions and rules based on our heuristic knowledge. We then indexed the online resources to facilitate their retrieval and comparison by researchers. Because every online resource has at least one PubMed ID, we can easily acquire its summary with Medical Subject Headings (MeSH) terms and confirm its credibility through reference to the corresponding PubMed entry. In addition, because OReFiL automatically extracts URLs and updates the index, minimal time and effort is needed to maintain the system. Conclusion: We developed OReFiL, a search system for online life science resources, which is freely available. The system's distinctive features include the ability to return up-to-date query-relevant online resources introduced in peer-reviewed papers; the ability to search using free words, MeSH terms, or author names; easy verification of each hit following links to the corresponding PubMed entry or to papers citing the URL through the search systems of BioMed Central, Scirus, HighWire Press, or Google Scholar; and quick confirmation of the existence of an online resource web page.
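The abstract above says URLs were extracted from abstracts with regular expressions plus heuristic rules, and that each resource is tied to at least one PubMed ID. The sketch below illustrates that general idea only. The pattern, the trailing-punctuation rule and the function names are assumptions, not OReFiL's actual expressions or heuristics; the two sample snippets reuse database URLs and PMIDs that appear elsewhere in this listing.

```python
import re

# Assumed pattern: a URL starts with http://, https:// or www. and runs until
# whitespace, a quote, an angle bracket or a parenthesis.
URL_PATTERN = re.compile(r'(?:https?://|www\.)[^\s<>"()]+', re.IGNORECASE)


def extract_urls(text):
    """Return the distinct URLs in text, in order of first appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Heuristic rule: sentence punctuation glued to the end is not part of the URL.
        url = match.group(0).rstrip(".,;:")
        if url not in urls:
            urls.append(url)
    return urls


def index_by_pmid(records):
    """Map each extracted URL to the sorted list of PubMed IDs whose text mentions it."""
    index = {}
    for pmid, text in records.items():
        for url in extract_urls(text):
            index.setdefault(url, []).append(pmid)
    return {url: sorted(pmids) for url, pmids in index.items()}


# Snippets adapted from records in this listing.
records = {
    "29220459": "Database URL: http://isoplot.iis.sinica.edu.tw/ PMID:29220459",
    "22025671": "can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671",
}
url_index = index_by_pmid(records)
```

Keeping the PubMed ID alongside each URL is what lets a reader trace a hit back to the paper that introduced the resource, which is the verification step the abstract emphasizes.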
Citations to Web pages in scientific articles: the permanence of archived references.
Thorp, Andrea W; Schriger, David L
2011-02-01
We validate the use of archiving Internet references by comparing the accessibility of published uniform resource locators (URLs) with corresponding archived URLs over time. We scanned the "Articles in Press" section in Annals of Emergency Medicine from March 2009 through June 2010 for Internet references in research articles. If an Internet reference produced the authors' expected content, the Web page was archived with WebCite (http://www.webcitation.org). Because the archived Web page does not change, we compared it with the original URL to determine whether the original Web page had changed. We attempted to access each original URL and archived Web site URL at 3-month intervals from the time of online publication during an 18-month study period. Once a URL no longer existed or failed to contain the original authors' expected content, it was excluded from further study. The number of original URLs and archived URLs that remained accessible over time was totaled and compared. A total of 121 articles were reviewed and 144 Internet references were found within 55 articles. Of the original URLs, 15% (21/144; 95% confidence interval [CI] 9% to 21%) were inaccessible at publication. During the 18-month observation period, there was no loss of archived URLs (apart from the 4% [5/123; 95% CI 2% to 9%] that could not be archived), whereas 35% (49/139) of the original URLs were lost (46% loss; 95% CI 33% to 61% by the Kaplan-Meier method; difference between curves P<.0001, log rank test). Archiving a referenced Web page at publication can help preserve the authors' expected information. Copyright © 2010 American College of Emergency Physicians. Published by Mosby, Inc. All rights reserved.
Internet-Based Laboratory Activities Designed for Studying the Sun with Satellites
NASA Astrophysics Data System (ADS)
Slater, T. F.
1998-12-01
Yohkoh Public Outreach Project (YPOP) is a collaborative industry, university, and K-16 project bringing fascinating and dynamic images of the Sun to the public in real-time. Partners have developed an extensive public access and educational WWW site containing more than 100 pages of vibrant images with current information that focuses on movies of the X-ray output of our Sun taken by the Yohkoh Satellite. More than 5 Gb of images and movies are available on the WWW site from the Yohkoh satellite, a joint project of the Institute for Space and Astronautical Sciences (ISAS) and NASA. Using a movie theater motif, the site was created by teams working at Lockheed Martin Advanced Technology Center, Palo Alto, CA in the Solar and Astrophysics Research Group, the Montana State University Solar Physics Research Group, and the Montana State University Conceptual Astronomy and Physics Education Research Group with funding from the NASA Learning Technology Project (LTP) program (NASA LTP SK30G4410R). The Yohkoh Movie Theater Internet Site is found at URL: http://www.lmsal.com/YPOP/ and mirrored at URL: http://solar.physics.montana.edu/YPOP/. In addition to being able to request automated movies for any dates in a 5 Gb on-line database, the user can view automatically updated daily images and movies of our Sun over the last 72 hours. Master science teachers working with the NASA funded Yohkoh Public Outreach Project have developed nine technology-based on-line lessons for K-16 classrooms. These interdisciplinary science, mathematics, and technology lessons integrate Internet resources, real-time images of the Sun, and extensive NASA image databases. Instructors are able to freely access each of the classroom-ready activities. The activities require students to use scientific inquiry skills and manage electronic information to solve problems consistent with the emphasis of the NRC National Science Education Standards.
LymPHOS 2.0: an update of a phosphosite database of primary human T cells
Nguyen, Tien Dung; Vidal-Cortes, Oriol; Gallardo, Oscar; Abian, Joaquin; Carrascal, Montserrat
2015-01-01
LymPHOS is a web-oriented database containing peptide and protein sequences and spectrometric information on the phosphoproteome of primary human T-Lymphocytes. Current release 2.0 contains 15 566 phosphorylation sites from 8273 unique phosphopeptides and 4937 proteins, which correspond to a 45-fold increase over the original database description. It now includes quantitative data on phosphorylation changes after time-dependent treatment with activators of the TCR-mediated signal transduction pathway. Sequence data quality has also been improved with the use of multiple search engines for database searching. LymPHOS can be publicly accessed at http://www.lymphos.org. Database URL: http://www.lymphos.org. PMID:26708986
RefPrimeCouch—a reference gene primer CouchApp
Silbermann, Jascha; Wernicke, Catrin; Pospisil, Heike; Frohme, Marcus
2013-01-01
To support a quantitative real-time polymerase chain reaction standardization project, a new reference gene database application was required. The new database application was built with the explicit goal of simplifying not only the development process but also making the user interface more responsive and intuitive. To this end, CouchDB was used as the backend with a lightweight dynamic user interface implemented client-side as a one-page web application. Data entry and curation processes were streamlined using an OpenRefine-based workflow. The new RefPrimeCouch database application provides its data online under an Open Database License. Database URL: http://hpclife.th-wildau.de:5984/rpc/_design/rpc/view.html PMID:24368831
CyanoClust: comparative genome resources of cyanobacteria and plastids.
Sasaki, Naobumi V; Sato, Naoki
2010-01-01
Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.
Optimizing the NASA Technical Report Server
NASA Technical Reports Server (NTRS)
Nelson, Michael L.; Maa, Ming-Hokng
1996-01-01
The NASA Technical Report Server (NTRS), a World Wide Web report distribution service for NASA technical publications, is modified for performance enhancement, greater protocol support, and human interface optimization. Results include: parallel database queries, significantly decreasing user access times by an average factor of 2.3; access from clients behind firewalls and/or proxies which truncate excessively long Uniform Resource Locators (URLs); access to non-Wide Area Information Server (WAIS) databases and compatibility with the Z39-50.3 protocol; and a streamlined user interface.
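The parallel database queries mentioned above can be illustrated with a small sketch. It is not the NTRS implementation: `query_database` is a hypothetical stand-in that only sleeps and returns a string, and the database names are made up. The point is that issuing the searches concurrently makes the total wait close to the slowest single query rather than the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_database(name, term, delay=0.05):
    """Hypothetical stand-in for one remote database search."""
    time.sleep(delay)  # simulates network and search latency
    return f"{name}: results for {term!r}"


def search_all(databases, term):
    """Query every database concurrently; results keep the input order."""
    workers = max(1, len(databases))  # ThreadPoolExecutor rejects 0 workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda db: query_database(db, term), databases))


for line in search_all(["db_a", "db_b", "db_c"], "airfoil"):
    print(line)
```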
MicRhoDE: a curated database for the analysis of microbial rhodopsin diversity and evolution
Boeuf, Dominique; Audic, Stéphane; Brillet-Guéguen, Loraine; Caron, Christophe; Jeanthon, Christian
2015-01-01
Microbial rhodopsins are a diverse group of photoactive transmembrane proteins found in all three domains of life and in viruses. Today, microbial rhodopsin research is a flourishing research field in which new understandings of rhodopsin diversity, function and evolution are contributing to broader microbiological and molecular knowledge. Here, we describe MicRhoDE, a comprehensive, high-quality and freely accessible database that facilitates analysis of the diversity and evolution of microbial rhodopsins. Rhodopsin sequences isolated from a vast array of marine and terrestrial environments were manually collected and curated. To each rhodopsin sequence are associated related metadata, including predicted spectral tuning of the protein, putative activity and function, taxonomy for sequences that can be linked to a 16S rRNA gene, sampling date and location, and supporting literature. The database currently covers 7857 aligned sequences from more than 450 environmental samples or organisms. Based on a robust phylogenetic analysis, we introduce an operational classification system with multiple phylogenetic levels ranging from superclusters to species-level operational taxonomic units. An integrated pipeline for online sequence alignment and phylogenetic tree construction is also provided. With a user-friendly interface and integrated online bioinformatics tools, this unique resource should be highly valuable for upcoming studies of the biogeography, diversity, distribution and evolution of microbial rhodopsins. Database URL: http://micrhode.sb-roscoff.fr. PMID:26286928
Analysis and visualization of Arabidopsis thaliana GWAS using web 2.0 technologies.
Huang, Yu S; Horton, Matthew; Vilhjálmsson, Bjarni J; Seren, Umit; Meng, Dazhe; Meyer, Christopher; Ali Amer, Muhammad; Borevitz, Justin O; Bergelson, Joy; Nordborg, Magnus
2011-01-01
With large-scale genomic data becoming the norm in biological studies, the storing, integrating, viewing and searching of such data have become a major challenge. In this article, we describe the development of an Arabidopsis thaliana database that hosts the geographic information and genetic polymorphism data for over 6000 accessions and genome-wide association study (GWAS) results for 107 phenotypes representing the largest collection of Arabidopsis polymorphism data and GWAS results to date. Taking advantage of a series of the latest web 2.0 technologies, such as Ajax (Asynchronous JavaScript and XML), GWT (Google-Web-Toolkit), MVC (Model-View-Controller) web framework and Object Relationship Mapper, we have created a web-based application (web app) for the database, that offers an integrated and dynamic view of geographic information, genetic polymorphism and GWAS results. Essential search functionalities are incorporated into the web app to aid reverse genetics research. The database and its web app have proven to be a valuable resource to the Arabidopsis community. The whole framework serves as an example of how biological data, especially GWAS, can be presented and accessed through the web. In the end, we illustrate the potential to gain new insights through the web app by two examples, showcasing how it can be used to facilitate forward and reverse genetics research. Database URL: http://arabidopsis.usc.edu/
Milc, Justyna; Sala, Antonio; Bergamaschi, Sonia; Pecchioni, Nicola
2011-01-01
The CEREALAB database aims to store genotypic and phenotypic data obtained by the CEREALAB project and to integrate them with already existing data sources in order to create a tool for plant breeders and geneticists. The database can help them in unravelling the genetics of economically important phenotypic traits; in identifying and choosing molecular markers associated to key traits; and in choosing the desired parentals for breeding programs. The database is divided into three sub-schemas corresponding to the species of interest: wheat, barley and rice; each sub-schema is then divided into two sub-ontologies, regarding genotypic and phenotypic data, respectively. Database URL: http://www.cerealab.unimore.it/jws/cerealab.jnlp PMID:21247929
Lu, Zhiyong
2012-01-01
Today’s biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assisted curation can improve efficiency, but few text-mining systems have been formally evaluated in this regard. Through participation in the interactive text-mining track of the BioCreative 2012 workshop, we developed PubTator, a PubMed-like system that assists with two specific human curation tasks: document triage and bioconcept annotation. On the basis of evaluation results from two external user groups, we find that the accuracy of PubTator-assisted curation is comparable with that of manual curation and that PubTator can significantly increase human curatorial speed. These encouraging findings warrant further investigation with a larger number of publications to be annotated. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/ PMID:23160414
Directly e-mailing authors of newly published papers encourages community curation
Bunt, Stephanie M.; Grumbling, Gary B.; Field, Helen I.; Marygold, Steven J.; Brown, Nicholas H.; Millburn, Gillian H.
2012-01-01
Much of the data within Model Organism Databases (MODs) comes from manual curation of the primary research literature. Given limited funding and an increasing density of published material, a significant challenge facing all MODs is how to efficiently and effectively prioritize the most relevant research papers for detailed curation. Here, we report recent improvements to the triaging process used by FlyBase. We describe an automated method to directly e-mail corresponding authors of new papers, requesting that they list the genes studied and indicate (‘flag’) the types of data described in the paper using an online tool. Based on the author-assigned flags, papers are then prioritized for detailed curation and channelled to appropriate curator teams for full data extraction. The overall response rate has been 44% and the flagging of data types by authors is sufficiently accurate for effective prioritization of papers. In summary, we have established a sustainable community curation program, with the result that FlyBase curators now spend less time triaging and can devote more effort to the specialized task of detailed data extraction. Database URL: http://flybase.org/ PMID:22554788
The UniProtKB guide to the human proteome
Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole
2016-01-01
Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org PMID:26896845
TIPdb-3D: the three-dimensional structure database of phytochemicals from Taiwan indigenous plants
Tung, Chun-Wei; Lin, Ying-Chi; Chang, Hsun-Shuo; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng
2014-01-01
The rich indigenous and endemic plants in Taiwan serve as a resourceful bank for biologically active phytochemicals. Based on our TIPdb database curating bioactive phytochemicals from Taiwan indigenous plants, this study presents a three-dimensional (3D) chemical structure database named TIPdb-3D to support the discovery of novel pharmacologically active compounds. The Merck Molecular Force Field (MMFF94) was used to generate 3D structures of phytochemicals in TIPdb. The 3D structures could facilitate the analysis of 3D quantitative structure–activity relationship, the exploration of chemical space and the identification of potential pharmacologically active compounds using protein–ligand docking. Database URL: http://cwtung.kmu.edu.tw/tipdb. PMID:24930145
ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases.
Buza, Teresia M; Jack, Sherman W; Kirunda, Halid; Khaitsa, Margaret L; Lawrence, Mark L; Pruett, Stephen; Peterson, Daniel G
2015-01-01
There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide a wonderful opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides more plausible arguments to achieve better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into the VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu. © The Author(s) 2015. Published by Oxford University Press.
Pierneef, Rian; Cronje, Louis; Bezuidt, Oliver; Reva, Oleg N.
2015-01-01
The Predicted Genomic Islands database (Pre_GI) is a comprehensive repository of prokaryotic genomic islands (islands, GIs) freely accessible at http://pregi.bi.up.ac.za/index.php. Pre_GI, Version 2015, catalogues 26 744 islands identified in 2407 bacterial/archaeal chromosomes and plasmids. It provides an easy-to-use interface which allows users the ability to query against the database with a variety of fields, parameters and associations. Pre_GI is constructed to be a web-resource for the analysis of ontological roads between islands and cartographic analysis of the global fluxes of mobile genetic elements through bacterial and archaeal taxonomic borders. Comparison of newly identified islands against Pre_GI presents an alternative avenue to identify their ontology, origin and relative time of acquisition. Pre_GI aims to aid research on horizontal transfer events and materials through providing data and tools for holistic investigation of migration of genes through ecological niches and taxonomic boundaries. Database URL: http://pregi.bi.up.ac.za/index.php, Version 2015 PMID:26200753
NASA Astrophysics Data System (ADS)
Jarboe, N.; Minnett, R.; Constable, C.; Koppers, A. A.; Tauxe, L.
2013-12-01
The Magnetics Information Consortium (MagIC) is dedicated to supporting the paleomagnetic, geomagnetic, and rock magnetic communities through the development and maintenance of an online database (http://earthref.org/MAGIC/), data upload and quality control, searches, data downloads, and visualization tools. While MagIC has completed importing some of the IAGA paleomagnetic databases (TRANS, PINT, PSVRL, GPMDB) and continues to import others (ARCHEO, MAGST and SECVR), further individual data uploading from the community contributes a wealth of easily accessible, rich datasets. Previously, uploading data to the MagIC database required an Excel spreadsheet on either a Mac or PC. The new method of uploading data utilizes an HTML5 web interface where the only computer requirement is a modern browser. This web interface will highlight all errors discovered in the dataset at once, replacing the iterative error-checking process of the previous Excel spreadsheet data checker. As a web service, the data upload software will always be available to the community in its most up-to-date and bug-free version. The filtering search mechanism of the MagIC database has been changed to a more intuitive system where the data from each contribution is displayed in tables similar to how the data is uploaded (http://earthref.org/MAGIC/search/). Searches themselves can be saved as a permanent URL, if desired. The saved search URL could then be used as a citation in a publication. When appropriate, plots (equal area, Zijderveld, ARAI, demagnetization, etc.) are associated with the data to give the user a quicker understanding of the underlying dataset. The MagIC database will continue to evolve to meet the needs of the paleomagnetic, geomagnetic, and rock magnetic communities.
Mouse IDGenes: a reference database for genetic interactions in the developing mouse brain
Matthes, Michaela; Preusse, Martin; Zhang, Jingzhong; Schechter, Julia; Mayer, Daniela; Lentes, Bernd; Theis, Fabian; Prakash, Nilima; Wurst, Wolfgang; Trümbach, Dietrich
2014-01-01
The study of developmental processes in the mouse and other vertebrates includes the understanding of patterning along the anterior–posterior, dorsal–ventral and medial–lateral axes. Specifically, neural development is also of great clinical relevance because several human neuropsychiatric disorders such as schizophrenia, autism disorders or drug addiction and also brain malformations are thought to have neurodevelopmental origins, i.e. pathogenesis initiates during childhood and adolescence. Impacts during early neurodevelopment might also predispose to late-onset neurodegenerative disorders, such as Parkinson’s disease. The neural tube develops from its precursor tissue, the neural plate, in a patterning process that is determined by compartmentalization into morphogenetic units, the action of local signaling centers and a well-defined and locally restricted expression of genes and their interactions. While public databases provide gene expression data with spatio-temporal resolution, they usually neglect the genetic interactions that govern neural development. Here, we introduce Mouse IDGenes, a reference database for genetic interactions in the developing mouse brain. The database is highly curated and offers detailed information about gene expressions and the genetic interactions at the developing mid-/hindbrain boundary. To showcase the predictive power of interaction data, we infer new Wnt/β-catenin target genes by machine learning and validate one of them experimentally. The database is updated regularly. Moreover, it can easily be extended by the research community. Mouse IDGenes will contribute as an important resource to the research on mouse brain development, not exclusively by offering data retrieval, but also by allowing data input. Database URL: http://mouseidgenes.helmholtz-muenchen.de. PMID:25145340
yStreX: yeast stress expression database
Wanichthanarak, Kwanjeera; Nookaew, Intawat; Petranovic, Dina
2014-01-01
Over the past decade, genome-wide expression analyses have often been used to study how expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying or integrated bioinformatics platform that would permit efficient and rapid use of all these existing data remains an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses; and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.com PMID:25024351
TMDB: a literature-curated database for small molecular compounds found from tea.
Yue, Yi; Chu, Gang-Xiu; Liu, Xue-Shi; Tang, Xing; Wang, Wei; Liu, Guang-Jin; Yang, Tao; Ling, Tie-Jun; Wang, Xiao-Gang; Zhang, Zheng-Zhu; Xia, Tao; Wan, Xiao-Chun; Bao, Guan-Hu
2014-09-16
Tea is one of the most consumed beverages worldwide. The health effects of tea are attributed to a wealth of different chemical components from tea. Thousands of studies on the chemical constituents of tea have been reported. However, data from these individual reports have not been collected into a single database. The lack of a curated database of related information limits research in this field, and thus a cohesive database system should be constructed for data deposit and further application. The Tea Metabolome database (TMDB), a manually curated and web-accessible database, was developed to provide detailed, searchable descriptions of small molecular compounds found in Camellia spp., especially in the plant Camellia sinensis, and compounds in its manufactured products (different kinds of tea infusion). TMDB is currently the most complete and comprehensive curated collection of tea compounds data in the world. It contains records for more than 1393 constituents found in tea with information gathered from 364 published books, journal articles, and electronic databases. It also contains experimental 1H NMR and 13C NMR data collected from the purified reference compounds or collected from other database resources such as HMDB. The TMDB interface allows users to retrieve tea compound entries by keyword search using compound name, formula, occurrence, and CAS registry number. Each entry in the TMDB contains an average of 24 separate data fields including its original plant species, compound structure, formula, molecular weight, name, CAS registry number, compound types, compound uses including health benefits, reference literature, NMR, MS data, and the corresponding ID from databases such as HMDB and PubMed. Users can also contribute novel regulatory entries by using a web-based submission page. The TMDB database is freely accessible from the URL of http://pcsb.ahau.edu.cn:8080/TCDB/index.jsp.
The TMDB is designed to address the broad needs of tea biochemists, natural products chemists, nutritionists, and members of the tea-related research community. The TMDB database provides a solid platform for collection, standardization, and searching of compound information found in tea. As such, this database will be a comprehensive repository for the tea biochemistry and tea health research community.
Schroedinger’s code: Source code availability and transparency in astrophysics
NASA Astrophysics Data System (ADS)
Ryan, PW; Allen, Alice; Teuben, Peter
2018-01-01
Astronomers use software for their research, but how many of the codes they use are available as source code? We examined a sample of 166 papers from 2015 for clearly identified software use, then searched for source code for the software packages mentioned in these research papers. We categorized the software to indicate whether source code is available for download and whether there are restrictions to accessing it, and if source code was not available, whether some other form of the software, such as a binary, was. Over 40% of the source code for the software used in our sample was not available for download. As URLs have often been used as proxy citations for software, we also extracted URLs from one journal’s 2015 research articles, removed those from certain long-term, reliable domains, and tested the remainder to determine what percentage of these URLs were still accessible in September and October, 2017.
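The URL procedure described above (drop links on long-term, reliable domains, test the rest, report the share still accessible) can be sketched as follows. This is an illustration under stated assumptions, not the authors' tooling: the `TRUSTED_DOMAINS` list, both function names, and the example URLs are hypothetical, and the actual network check is left out so that the example stays self-contained.

```python
from urllib.parse import urlparse

# Hypothetical exclusion list; the study does not name its reliable domains here.
TRUSTED_DOMAINS = {"doi.org", "arxiv.org"}


def needs_checking(url, trusted=TRUSTED_DOMAINS):
    """True if the URL is not on a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return not any(host == d or host.endswith("." + d) for d in trusted)


def percent_accessible(results):
    """results maps each tested URL to True (reachable) or False (not reachable)."""
    if not results:
        return 0.0
    return 100.0 * sum(results.values()) / len(results)


urls = [
    "http://dx.doi.org/10.1000/example",
    "http://www.example.edu/~someone/code/",
    "https://arxiv.org/abs/0000.00000",
]
to_test = [u for u in urls if needs_checking(u)]
print(to_test)
```

In practice each URL in `to_test` would be requested over the network and the outcomes passed to `percent_accessible`.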
PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome
Sarika; Arora, Vasu; Iquebal, M. A.; Rai, Anil; Kumar, Dinesh
2013-01-01
Molecular markers play a significant role in crop improvement for desirable characteristics, such as high yield, resistance to disease and others that will benefit the crop in the long term. Pigeonpea (Cajanus cajan L.) is a legume recently sequenced by a global consortium led by ICRISAT (Hyderabad, India) and has been analysed for gene prediction, synteny maps, markers, etc. We present PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer designing tool for the pigeonpea genome, based on chromosome-wise as well as location-wise search of primers. A total of 123 387 Short Tandem Repeats (STRs) were extracted from the pigeonpea genome, available in the public domain, using the MIcroSAtellite tool (MISA). The database is an online relational database based on ‘three-tier architecture’ that catalogues information of microsatellites in MySQL, and a user-friendly interface is developed using PHP. Search for STRs may be customized by limiting their location on the chromosome as well as the number of markers in that range. This is a novel approach and has not been implemented in any of the existing marker databases. This database has been further appended with Primer3 for primer designing of selected markers with left and right flankings of size up to 500 bp. This will enable researchers to select markers of choice at desired intervals over the chromosome. Furthermore, one can use individual STRs of a targeted region over the chromosome to narrow down the location of a gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, markers’ search based on characteristics and location of STRs is expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/ PMID:23396298
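To make the notion of extracting Short Tandem Repeats from a sequence concrete, here is a minimal regular-expression sketch. It is not MISA and not the PIPEMicroDB pipeline; the function name `find_strs`, its parameters, and the thresholds are assumptions chosen for illustration.

```python
import re


def find_strs(sequence, unit_len=2, min_repeats=5):
    """Find tandem repeats of a unit of unit_len bases repeated at least
    min_repeats times. Returns (start, end, unit, repeat_count) tuples with
    1-based inclusive coordinates."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        # Skip units such as "AA", which are really single-base repeats.
        if unit_len > 1 and len(set(unit)) == 1:
            continue
        repeats = len(match.group(0)) // unit_len
        hits.append((match.start() + 1, match.end(), unit, repeats))
    return hits


sequence = "GGC" + "AT" * 6 + "CCGT"
print(find_strs(sequence))
```

Restricting the search to a chromosomal interval, as the abstract describes, would amount to slicing `sequence` before the call and offsetting the reported coordinates.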
DPTEdb, an integrative database of transposable elements in dioecious plants.
Li, Shu-Fen; Zhang, Guo-Jun; Zhang, Xue-Jin; Yuan, Jin-Hong; Deng, Chuan-Liang; Gu, Lian-Feng; Gao, Wu-Jun
2016-01-01
Dioecious plants usually harbor 'young' sex chromosomes, providing an opportunity to study the early stages of sex chromosome evolution. Transposable elements (TEs) are mobile DNA elements frequently found in plants and are suggested to play important roles in plant sex chromosome evolution. The genomes of several dioecious plants have been sequenced, offering an opportunity to annotate and mine the TE data. However, comprehensive and unified annotation of TEs in these dioecious plants is still lacking. In this study, we constructed a dioecious plant transposable element database (DPTEdb). DPTEdb is a specific, comprehensive and unified relational database and web interface. We used a combination of de novo, structure-based and homology-based approaches to identify TEs from the genome assemblies of previously published data, as well as our own. The database currently integrates eight dioecious plant species and a total of 31 340 TEs along with classification information. DPTEdb provides user-friendly web interfaces to browse, search and download the TE sequences in the database. Users can also use tools, including BLAST, GetORF, HMMER, Cut sequence and JBrowse, to analyze TE data. Given the role of TEs in plant sex chromosome evolution, the database will contribute to the investigation of TEs in structural, functional and evolutionary dynamics of the genome of dioecious plants. In addition, the database will supplement the research of sex diversification and sex chromosome evolution of dioecious plants. Database URL: http://genedenovoweb.ticp.net:81/DPTEdb/index.php. © The Author(s) 2016. Published by Oxford University Press.
miRSponge: a manually curated database for experimentally supported miRNA sponges and ceRNAs
Wang, Peng; Zhi, Hui; Zhang, Yunpeng; Liu, Yue; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Ning, Shangwei; Li, Xia
2015-01-01
In this study, we describe miRSponge, a manually curated database, which aims at providing an experimentally supported resource for microRNA (miRNA) sponges. Recent evidence suggests that miRNAs are themselves regulated by competing endogenous RNAs (ceRNAs) or ‘miRNA sponges’ that contain miRNA binding sites. These competitive molecules can sequester miRNAs to prevent them interacting with their natural targets to play critical roles in various biological and pathological processes. It has become increasingly important to develop a high quality database to record and store ceRNA data to support future studies. To this end, we have established the experimentally supported miRSponge database that contains data on 599 miRNA-sponge interactions and 463 ceRNA relationships from 11 species following manual curating from nearly 1200 published articles. Database classes include endogenously generated molecules including coding genes, pseudogenes, long non-coding RNAs and circular RNAs, along with exogenously introduced molecules including viral RNAs and artificial engineered sponges. Approximately 70% of the interactions were identified experimentally in disease states. miRSponge provides a user-friendly interface for convenient browsing, retrieval and downloading of dataset. A submission page is also included to allow researchers to submit newly validated miRNA sponge data. Database URL: http://www.bio-bigdata.net/miRSponge. PMID:26424084
TIPdb-3D: the three-dimensional structure database of phytochemicals from Taiwan indigenous plants.
Tung, Chun-Wei; Lin, Ying-Chi; Chang, Hsun-Shuo; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng
2014-01-01
The rich indigenous and endemic plants in Taiwan serve as a resourceful bank for biologically active phytochemicals. Based on our TIPdb database curating bioactive phytochemicals from Taiwan indigenous plants, this study presents a three-dimensional (3D) chemical structure database named TIPdb-3D to support the discovery of novel pharmacologically active compounds. The Merck Molecular Force Field (MMFF94) was used to generate 3D structures of phytochemicals in TIPdb. The 3D structures could facilitate the analysis of 3D quantitative structure-activity relationship, the exploration of chemical space and the identification of potential pharmacologically active compounds using protein-ligand docking. Database URL: http://cwtung.kmu.edu.tw/tipdb. © The Author(s) 2014. Published by Oxford University Press.
Névéol, Aurélie; Wilbur, W. John; Lu, Zhiyong
2012-01-01
High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/ PMID:22685160
Divide and Recombine for Large Complex Data
2017-12-01
Empirical Methods in Natural Language Processing , October 2014 Keywords Enter keywords for the publication. URL Enter the URL...low-latency data processing systems. Declarative Languages for Interactive Visualization: The Reactive Vega Stack Another thread of XDATA research...for array processing operations embedded in the R programming language . Vector virtual machines work well for long vectors. One of the most
Fischer, Steve; Aurrecoechea, Cristina; Brunk, Brian P.; Gao, Xin; Harb, Omar S.; Kraemer, Eileen T.; Pennington, Cary; Treatman, Charles; Kissinger, Jessica C.; Roos, David S.; Stoeckert, Christian J.
2011-01-01
Web sites associated with the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) have recently introduced a graphical user interface, the Strategies WDK, intended to make advanced searching and set and interval operations easy and accessible to all users. With a design guided by usability studies, the system helps motivate researchers to perform dynamic computational experiments and explore relationships across data sets. For example, PlasmoDB users seeking novel therapeutic targets may wish to locate putative enzymes that distinguish pathogens from their hosts, and that are expressed during appropriate developmental stages. When a researcher runs one of the approximately 100 searches available on the site, the search is presented as a first step in a strategy. The strategy is extended by running additional searches, which are combined with set operators (union, intersect or minus), or genomic interval operators (overlap, contains). A graphical display uses Venn diagrams to make the strategy’s flow obvious. The interface facilitates interactive adjustment of the component searches with changes propagating forward through the strategy. Users may save their strategies, creating protocols that can be shared with colleagues. The strategy system has now been deployed on all EuPathDB databases, and successfully deployed by other projects. The Strategies WDK uses a configurable MVC architecture that is compatible with most genomics and biological warehouse databases, and is available for download at code.google.com/p/strategies-wdk. Database URL: www.eupathdb.org PMID:21705364
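The way a strategy chains searches with set operators can be sketched in a few lines of Python. This is a minimal illustration of the union, intersect and minus operators named in the abstract, not the Strategies WDK implementation; the function names and the gene identifiers are hypothetical, and genomic interval operators are left out.

```python
def combine(left, right, operator):
    """Combine two search results (sets of record IDs) with a set operator."""
    if operator == "union":
        return left | right
    if operator == "intersect":
        return left & right
    if operator == "minus":
        return left - right
    raise ValueError(f"unknown operator: {operator}")


def run_strategy(first_result, steps):
    """Apply each (operator, search_result) step to the running result, in order."""
    result = set(first_result)
    for operator, step_result in steps:
        result = combine(result, set(step_result), operator)
    return result


# Hypothetical search results, for illustration only.
putative_enzymes = {"g1", "g2", "g3", "g4"}
has_host_homolog = {"g2", "g5"}
expressed_in_stage = {"g1", "g2", "g3"}

targets = run_strategy(
    putative_enzymes,
    [("minus", has_host_homolog), ("intersect", expressed_in_stage)],
)
```

Because each step depends only on the running result and one new search, changing an early search and re-running the list reproduces the forward propagation of changes that the abstract describes.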
OliveNet™: a comprehensive library of compounds from Olea europaea
Bonvino, Natalie P; Liang, Julia; McCord, Elizabeth D; Zafiris, Elena; Benetti, Natalia; Ray, Nancy B; Hung, Andrew; Boskou, Dimitrios
2018-01-01
Accumulated epidemiological, clinical and experimental evidence has indicated the beneficial health effects of the Mediterranean diet, which is typified by the consumption of virgin olive oil (VOO) as a main source of dietary fat. At the cellular level, compounds derived from various olive (Olea europaea) matrices have demonstrated potent antioxidant and anti-inflammatory effects, which are thought to account, at least in part, for their biological effects. Research efforts are expanding into the characterization of compounds derived from Olea europaea; however, the considerable diversity and complexity of the vast array of chemical compounds have made their precise identification and quantification challenging. As such, only a relatively small subset of olive-derived compounds has been explored for their biological activity and potential health effects to date. Although there is adequate information describing the identification or isolation of olive-derived compounds, these are not easily searchable, especially when attempting to acquire chemical or biological properties. Therefore, we have created the OliveNet™ database containing a comprehensive catalogue of compounds identified from matrices of the olive, including the fruit, leaf and VOO, as well as in the wastewater and pomace accrued during oil production. From a total of 752 compounds, chemical analysis was sufficient for 676 individual compounds, which have been included in the database. The database is curated and comprehensively referenced, containing information for the 676 compounds, which are divided into 13 main classes and 47 subclasses. Importantly, with respect to current research trends, the database includes 222 olive phenolics, which are divided into 13 subclasses. To our knowledge, OliveNet™ is currently the only curated open access database with a comprehensive collection of compounds associated with Olea europaea. Database URL: https://www.mccordresearch.com.au PMID:29688352
Earthquake-induced ground failures in Italy from a reviewed database
NASA Astrophysics Data System (ADS)
Martino, S.; Prestininzi, A.; Romeo, R. W.
2013-05-01
A database (Italian acronym CEDIT) of earthquake-induced ground failures in Italy is presented, and the related content is analysed. The catalogue collects data regarding landslides, liquefaction, ground cracks, surface faulting and ground-level changes triggered by earthquakes of Mercalli intensity 8 or greater that occurred in the last millennium in Italy. As of January 2013, the CEDIT database has been available online for public use (URL: http://www.ceri.uniroma1.it/cn/index.do?id=230&page=55) and is presently hosted by the website of the Research Centre for Geological Risks (CERI) of the "Sapienza" University of Rome. Summary statistics of the database content indicate that 14% of the Italian municipalities have experienced at least one earthquake-induced ground failure and that landslides are the most common ground effects (approximately 45%), followed by ground cracks (32%) and liquefaction (18%). The relationships between ground effects and earthquake parameters such as seismic source energy (earthquake magnitude and epicentral intensity), local conditions (site intensity) and source-to-site distances are also analysed. The analysis indicates that liquefaction, surface faulting and ground-level changes are much more dependent on the earthquake source energy (i.e. magnitude) than landslides and ground cracks. In contrast, the latter effects are triggered at lower site intensities and greater epicentral distances than the other environmental effects.
Wiley, Laura K.; Sivley, R. Michael; Bush, William S.
2013-01-01
Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist PMID:23894185
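The nested containment list (NCList) idea behind this work can be shown with a small in-memory Python sketch. It is not the MyNCList MySQL implementation; the class and function names are assumptions, and intervals are taken to be half-open (start, end) pairs with start < end. Intervals contained in another interval are pushed into that interval's sublist, so within any one sublist both starts and ends are increasing and a binary search finds the first candidate.

```python
from bisect import bisect_right


class NCList:
    """One level of a nested containment list over half-open (start, end) intervals."""

    def __init__(self):
        self.starts = []
        self.ends = []
        self.children = []  # children[i] holds the intervals contained in interval i

    def add(self, start, end):
        """Append an interval to this level and return its (empty) child level."""
        self.starts.append(start)
        self.ends.append(end)
        child = NCList()
        self.children.append(child)
        return child

    def overlapping(self, qstart, qend):
        """Return every stored interval that overlaps the query [qstart, qend)."""
        hits = []
        # Ends are increasing within a level, so binary search finds the
        # first interval whose end lies beyond the query start.
        i = bisect_right(self.ends, qstart)
        while i < len(self.starts) and self.starts[i] < qend:
            hits.append((self.starts[i], self.ends[i]))
            hits.extend(self.children[i].overlapping(qstart, qend))
            i += 1
        return hits


def build_nclist(intervals):
    """Build the nested structure from an iterable of (start, end) pairs."""
    root = NCList()
    stack = []  # (end, child level) for the chain of containers currently open
    # Sort by start, then by descending end, so containers precede their contents.
    for start, end in sorted(set(intervals), key=lambda iv: (iv[0], -iv[1])):
        while stack and end > stack[-1][0]:
            stack.pop()  # this container does not contain the new interval
        parent = stack[-1][1] if stack else root
        stack.append((end, parent.add(start, end)))
    return root


# Hypothetical annotation ranges, for illustration only.
annotations = [(1, 10), (2, 5), (3, 4), (6, 8), (12, 15), (14, 20)]
index = build_nclist(annotations)
found = sorted(index.overlapping(4, 7))
```

A query skips every sublist whose container does not overlap the query range, which is what makes range lookups fast when annotations are heavily nested.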
The NCBI BioCollections Database
Sharma, Shobha; Ciufo, Stacy; Starchenko, Elena; Darji, Dakshesh; Chlumsky, Larry; Karsch-Mizrachi, Ilene
2018-01-01
The rapidly growing set of GenBank submissions includes sequences that are derived from vouchered specimens. These are associated with culture collections, museums, herbaria and other natural history collections, both living and preserved. Correct identification of the specimens studied, along with a method to associate the sample with its institution, is critical to the outcome of related studies and analyses. The National Center for Biotechnology Information BioCollections Database was established to allow the association of specimen vouchers and related sequence records to their home institutions. This process also allows cross-linking from the home institution for quick identification of all records originating from each collection. Database URL: https://www.ncbi.nlm.nih.gov/biocollections PMID:29688360
PmiRExAt: plant miRNA expression atlas database and web applications
Gurjar, Anoop Kishor Singh; Panwar, Abhijeet Singh; Gupta, Rajinder; Mantri, Shrikant S.
2016-01-01
High-throughput small RNA (sRNA) sequencing technology enables an entirely new perspective for plant microRNA (miRNA) research and has immense potential to unravel regulatory networks. Novel insights gained through data mining in the rich, publicly available resource of sRNA data will help in designing biotechnology-based approaches for crop improvement to enhance plant yield and nutritional value. Bioinformatics resources enabling meta-analysis of miRNA expression across multiple plant species are still evolving. Here, we report PmiRExAt, a new online database resource that provides a plant miRNA expression atlas. The web-based repository comprises miRNA expression profiles and a query tool for 1859 wheat, 2330 rice and 283 maize miRNAs. The database interface offers open and easy access to miRNA expression profiles and helps in identifying tissue-preferential, differential and constitutively expressed miRNAs. A feature enabling expression study of conserved miRNAs across multiple species is also implemented. A custom expression analysis feature enables expression analysis of novel miRNAs in a total of 117 datasets. New sRNA datasets can also be uploaded for analysing miRNA expression profiles for 73 plant species. The PmiRExAt application program interface, a Simple Object Access Protocol (SOAP) web service, allows other programmers to remotely invoke the methods written for programmatic search operations on the PmiRExAt database. Database URL: http://pmirexat.nabi.res.in. PMID:27081157
CoReCG: a comprehensive database of genes associated with colon-rectal cancer
Agarwal, Rahul; Kumar, Binayak; Jayadev, Msk; Raghav, Dhwani; Singh, Ashutosh
2016-01-01
Cancer of the large intestine is commonly referred to as colorectal cancer, which is also the third most prevalent neoplasm across the globe. Although much work is being carried out to understand the mechanism of carcinogenesis and advancement of this disease, fewer studies have been performed to collate the scattered information on alterations in tumorigenic cells, such as genes, mutations, expression changes, epigenetic alterations, post-translational modifications and genetic heterogeneity. Earlier findings were mostly focused on understanding the etiology of colorectal carcinogenesis, but less emphasis was given to a comprehensive review of the existing findings of individual studies, which can provide better diagnostics based on the markers suggested in discrete studies. The Colon Rectal Cancer Gene Database (CoReCG) contains information on 2056 colon-rectal cancer genes involved in distinct colorectal cancer stages, sourced from published literature, with an effective knowledge-based information retrieval system. Additionally, an interactive web interface enriched with various browsing sections and augmented with an advanced search facility for querying the database is provided for user-friendly browsing; online tools for sequence similarity searches and a knowledge-based schema ensure a researcher-friendly information retrieval mechanism. CoReCG is expected to be a single-point source for identification of colorectal cancer-related genes, thereby helping with the improvement of classification, diagnosis and treatment of human cancers. Database URL: lms.snu.edu.in/corecg PMID:27114494
Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources.
Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu
2015-01-01
The Brassica database (BRAD) was built initially to help users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released since its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences, are also provided. Furthermore, genome and annotation information has been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.
McIlroy, Simon Jon; Kirkegaard, Rasmus Hansen; McIlroy, Bianca; Nierychlo, Marta; Kristensen, Jannie Munk; Karst, Søren Michael; Albertsen, Mads
2017-01-01
Wastewater is increasingly viewed as a resource, with anaerobic digester technology being routinely implemented for biogas production. Characterising the microbial communities involved in wastewater treatment facilities and their anaerobic digesters is considered key to their optimal design and operation. Amplicon sequencing of the 16S rRNA gene allows high-throughput monitoring of these systems. The MiDAS field guide is a public resource providing amplicon sequencing protocols and an ecosystem-specific taxonomic database optimized for use with wastewater treatment facility samples. The curated taxonomy endeavours to provide a genus-level-classification for abundant phylotypes and the online field guide links this identity to published information regarding their ecology, function and distribution. This article describes the expansion of the database resources to cover the organisms of the anaerobic digester systems fed primary sludge and surplus activated sludge. The updated database includes descriptions of the abundant genus-level-taxa in influent wastewater, activated sludge and anaerobic digesters. Abundance information is also included to allow assessment of the role of emigration in the ecology of each phylotype. MiDAS is intended as a collaborative resource for the progression of research into the ecology of wastewater treatment, by providing a public repository for knowledge that is accessible to all interested in these biotechnologically important systems. Database URL: http://www.midasfieldguide.org PMID:28365734
ERIC Educational Resources Information Center
Hammond, Carol, Ed.
This document contains three papers presented at the 1995 Arizona Library Association conference. Papers include: (1) "ERLs and URLs: ASU Libraries Database Delivery Through Web Technology" (Dennis Brunning & Philip Konomos), which illustrates how and why the libraries at Arizona State University developed a world wide web server and…
MobiDB-lite: fast and highly specific consensus prediction of intrinsic disorder in proteins.
Necci, Marco; Piovesan, Damiano; Dosztányi, Zsuzsanna; Tosatto, Silvio C E
2017-05-01
Intrinsic disorder (ID) is established as an important feature of protein sequences. Its use in proteome annotation is however hampered by the availability of many methods with similar performance at the single residue level, which have mostly not been optimized to predict long ID regions of size comparable to domains. Here, we have focused on providing a single consensus-based prediction, MobiDB-lite, optimized for highly specific (i.e. few false positive) predictions of long disorder. The method uses eight different predictors to derive a consensus which is then filtered for spurious short predictions. Consensus prediction is shown to outperform the single methods when annotating long ID regions. MobiDB-lite can be useful in large-scale annotation scenarios and has indeed already been integrated in the MobiDB, DisProt and InterPro databases. MobiDB-lite is available as part of the MobiDB database from URL: http://mobidb.bio.unipd.it/. An executable can be downloaded from URL: http://protein.bio.unipd.it/mobidblite/. Contact: silvio.tosatto@unipd.it. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
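The two-stage idea in this abstract, a per-residue consensus followed by removal of short predictions, can be sketched in Python as follows. This is only an illustration under stated assumptions: the vote threshold, the minimum region length, the 'D'/'S' encoding and the function name are hypothetical, and the real MobiDB-lite combines eight predictors with its own parameters, which the abstract does not give.

```python
def consensus_disorder(predictions, min_votes, min_length):
    """Return consensus disordered regions as half-open (start, end) index pairs.

    predictions: equal-length strings, one per predictor, where 'D' marks a
    residue predicted as disordered and 'S' a residue predicted as structured.
    min_votes and min_length are illustrative thresholds, not MobiDB-lite's.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence")

    # Stage 1: a residue is disordered if enough predictors agree.
    consensus = [
        sum(p[i] == "D" for p in predictions) >= min_votes
        for i in range(length)
    ]

    # Stage 2: keep only runs of consensus disorder that are long enough.
    regions = []
    run_start = None
    for i, disordered in enumerate(consensus + [False]):  # sentinel closes last run
        if disordered and run_start is None:
            run_start = i
        elif not disordered and run_start is not None:
            if i - run_start >= min_length:
                regions.append((run_start, i))
            run_start = None
    return regions


# Three hypothetical predictor outputs for a 10-residue sequence.
outputs = ["DDSSDDDDSS", "DSSSDDDDDS", "SDSSSDDDSD"]
long_regions = consensus_disorder(outputs, min_votes=2, min_length=3)
```

Here the two-residue consensus run at the start of the sequence is discarded as a spurious short prediction, while the four-residue run is kept.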
CCDST: A free Canadian climate data scraping tool
NASA Astrophysics Data System (ADS)
Bonifacio, Charmaine; Barchyn, Thomas E.; Hugenholtz, Chris H.; Kienzle, Stefan W.
2015-02-01
In this paper we present a new software tool that automatically fetches, downloads and consolidates climate data from a Web database where the data are contained on multiple Web pages. The tool is called the Canadian Climate Data Scraping Tool (CCDST) and was developed to enhance access and simplify analysis of climate data from Canada's National Climate Data and Information Archive (NCDIA). The CCDST deconstructs a URL for a particular climate station in the NCDIA and then iteratively modifies the date parameters to download large volumes of data, remove individual file headers, and merge data files into one output file. This automated sequence enhances access to climate data by substantially reducing the time needed to manually download data from multiple Web pages. To this end, we present a case study of the temporal dynamics of blowing snow events that resulted in ~3.1 weeks time savings. Without the CCDST, the time involved in manually downloading climate data limits access and restrains researchers and students from exploring climate trends. The tool is coded as a Microsoft Excel macro and is available to researchers and students for free. The main concept and structure of the tool can be modified for other Web databases hosting geophysical data.
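The URL-rewriting and merging steps described here can be sketched in Python. The actual tool is a Microsoft Excel macro, so this is a re-expression of the idea rather than its code; the query parameter names (`StationID`, `Year`, `Month`), the example.com address and the function names are hypothetical, and the download step itself is left out so the sketch runs without network access.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def monthly_urls(station_url, year, months=range(1, 13)):
    """Rewrite the date parameters of a station URL, one URL per month.

    The parameter names Year and Month are assumptions for illustration.
    """
    parts = urlsplit(station_url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    urls = []
    for month in months:
        params.update({"Year": str(year), "Month": str(month)})
        urls.append(urlunsplit(parts._replace(query=urlencode(params))))
    return urls


def merge_files(texts, header_lines=1):
    """Merge downloaded files, keeping the header of the first file only."""
    merged = []
    for index, text in enumerate(texts):
        lines = text.splitlines()
        merged.extend(lines if index == 0 else lines[header_lines:])
    return "\n".join(merged)


# Hypothetical station URL and file contents, for illustration only.
urls = monthly_urls(
    "https://example.com/climate?StationID=123&Year=2000&Month=1",
    year=2014,
    months=[1, 2],
)
combined = merge_files(["date,temp\n2014-01-01,-5\n", "date,temp\n2014-02-01,-3\n"])
```

Keeping one header row in the merged output is a design choice of this sketch; the abstract says only that individual file headers are removed before merging.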
Kirmitzoglou, Ioannis; Promponas, Vasilis J
2015-07-01
Local compositionally biased and low complexity regions (LCRs) in amino acid sequences initially attracted the interest of researchers due to their implication in generating artifacts in sequence database searches. There is accumulating evidence of the biological significance of LCRs both in physiological and in pathological situations. Nonetheless, LCR-related algorithms and tools have not gained wide appreciation across the research community, partly due to the fact that only a handful of user-friendly software tools are currently freely available. We developed LCR-eXXXplorer, an extensible online platform attempting to fill this gap. LCR-eXXXplorer offers tools for displaying LCRs from the UniProt/SwissProt knowledgebase, in combination with other relevant protein features, predicted or experimentally verified. Moreover, users may perform powerful queries against a custom designed sequence/LCR-centric database. We anticipate that LCR-eXXXplorer will be a useful starting point in research efforts for the elucidation of the structure, function and evolution of proteins with LCRs. LCR-eXXXplorer is freely available at the URL http://repeat.biol.ucy.ac.cy/lcr-exxxplorer. Contact: vprobon@ucy.ac.cy. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Can we replace curation with information extraction software?
Karp, Peter D
2016-01-01
Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current information extraction programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs. © The Author(s) 2016. Published by Oxford University Press.
dbCPG: A web resource for cancer predisposition genes.
Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng
2016-06-21
Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.
HerDing: herb recommendation system to treat diseases using genes and chemicals
Choi, Wonjun; Choi, Chan-Hun; Kim, Young Ran; Kim, Seon-Jong; Na, Chang-Su; Lee, Hyunju
2016-01-01
In recent years, herbs have been researched for new drug candidates because they have a long empirical history of treating diseases and are relatively free from side effects. Studies to scientifically prove the medical efficacy of herbs for target diseases often spend a considerable amount of time and effort in choosing candidate herbs and in performing experiments to measure changes of marker genes when treating herbs. A computational approach to recommend herbs for treating diseases might be helpful to promote efficiency in the early stage of such studies. Although several databases related to traditional Chinese medicine have been already developed, there is no specialized Web tool yet recommending herbs to treat diseases based on disease-related genes. Therefore, we developed a novel search engine, HerDing, focused on retrieving candidate herb-related information with user search terms (a list of genes, a disease name, a chemical name or an herb name). HerDing was built by integrating public databases and by applying a text-mining method. The HerDing website is free and open to all users, and there is no login requirement. Database URL: http://combio.gist.ac.kr/herding PMID:26980517
TISSUES 2.0: an integrative web resource on mammalian tissue expression
Palasca, Oana; Santos, Alberto; Stolte, Christian; Gorodkin, Jan; Jensen, Lars Juhl
2018-01-01
Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems (model organisms) into more complex ones, such as human. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring produces a confidence score assigned to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and identified that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or organisms compared. Database URL: http://tissues.jensenlab.org/ PMID:29617745
Bhawna; Bonthala, V.S.; Gajula, MNV Prasad
2016-01-01
The common bean [Phaseolus vulgaris (L.)] is one of the essential proteinaceous vegetables grown in developing countries. However, its production is challenged by low yields caused by numerous biotic and abiotic stress conditions. Regulatory transcription factors (TFs) are a key component of the genome and are the most significant targets for producing stress-tolerant crops; hence, functional genomic studies of these TFs are important. Therefore, we have constructed a web-accessible TF database for P. vulgaris, called PvTFDB, which contains 2370 putative TF gene models in 49 TF families. This database provides comprehensive information for each identified TF, including sequence data, functional annotation, SSRs with their primer sets, protein physical properties, chromosomal location, phylogeny, tissue-specific gene expression data, orthologues, cis-regulatory elements and gene ontology (GO) assignment. Altogether, this information can be used to expedite functional genomic studies of a specific TF or TFs of interest. The objectives of this database are to support functional genomics studies of common bean TFs and to help identify the regulatory mechanisms underlying various stress responses, easing breeding strategies for variety production, through search interfaces (by gene ID and functional annotation) and browsing interfaces (by family and by chromosome). This database will also serve as a promising central repository for researchers as well as breeders who are working towards improvement of legume crops. In addition, this database provides unrestricted public access, and the user can freely download all data present in the database. Database URL: http://www.multiomics.in/PvTFDB/ PMID:27465131
Saunders, Rebecca E; Instrell, Rachael; Rispoli, Rossella; Jiang, Ming; Howell, Michael
2013-01-01
High-throughput screening (HTS) uses technologies such as RNA interference to generate loss-of-function phenotypes on a genomic scale. As these technologies become more popular, many research institutes have established core facilities of expertise to deal with the challenges of large-scale HTS experiments. As the efforts of core facility screening projects come to fruition, focus has shifted towards managing the results of these experiments and making them available in a useful format that can be further mined for phenotypic discovery. The HTS-DB database provides a public view of data from screening projects undertaken by the HTS core facility at the CRUK London Research Institute. All projects and screens are described with comprehensive assay protocols, and datasets are provided with complete descriptions of analysis techniques. This format allows users to browse and search data from large-scale studies in an informative and intuitive way. It also provides a repository for additional measurements obtained from screens that were not the focus of the project, such as cell viability, and groups these data so that it can provide a gene-centric summary across several different cell lines and conditions. All datasets from our screens that can be made available can be viewed interactively and mined for further hit lists. We believe that in this format, the database provides researchers with rapid access to results of large-scale experiments that might facilitate their understanding of genes/compounds identified in their own research. DATABASE URL: http://hts.cancerresearchuk.org/db/public.
Photographs of the Sea floor Offshore of New York and New Jersey
Butman, Bradford; Gutierrez, Benjamin T.; Buchholtz ten Brink, Marilyn R.; Schwab, William S.; Blackwood, Dann S.; Mecray, Ellen L.; Middleton, Tammie J.
2003-01-01
This DVD-ROM contains photographs of the sea floor and sediment texture data collected as part of studies carried out by the U.S. Geological Survey (USGS) in the New York Bight (Figure 1a (PDF format)). The studies were designed to map the sea floor (Butman, 1998, URL: http://pubs.usgs.gov/fs/fs133-98/) and to develop an understanding of the transport and long-term fate of sediments and associated contaminants in the region (Mecray and others, 1999, URL: http://pubs.usgs.gov/fs/fs114-99/). The data were collected on four research cruises carried out between 1996 and 2000 (Appendix I). The images and texture data were collected to provide direct observations of the sea floor geology and to aid in the interpretation of backscatter intensity data obtained from sidescan sonar and multibeam surveys of the sea floor. Preliminary descriptions of the sea floor geology in this region may be found in Schwab and others (2000, URL: http://pubs.usgs.gov/of/of00-295/; 2003), Butman and others (1998, URL: http://pubs.usgs.gov/of/of98-616/), and Butman and others (2002, URL: http://pubs.usgs.gov/of/of00-503/). Schwab and others (2000, URL: http://pubs.usgs.gov/of/of00-295/; 2003) have identified 11 geologic units in New York Bight (Figure 2 (PDF format)). These units identify areas of active sediment transport, extensive anthropogenic influence on the sea floor, and various geologic units. Butman and others (2003) and Harris and others (in press) present the results of a moored array experiment carried out in the Hudson Shelf Valley to investigate the transport of sediments during winter. Summaries of these and other studies may be found at USGS studies in the New York Bight (URL: http://woodshole.er.usgs.gov/project-pages/newyork/). This DVD-ROM contains digital images of bottom still photographs, images digitized from videos, sediment grain-size analysis results, and short QuickTime movies from video transects.
The data are presented in tabular form and in an ESRI (Environmental Systems Research Institute, URL: http://www.esri.com) ArcView project where the image and sample locations may be viewed superimposed on maps showing side-scan sonar and/or multibeam backscatter intensity and bottom topography.
NALDB: nucleic acid ligand database for small molecules targeting nucleic acid
Kumar Mishra, Subodh; Kumar, Amit
2016-01-01
The nucleic acid ligand database (NALDB) is a unique database that provides detailed information about the experimental data of small molecules that were reported to target several types of nucleic acid structures. NALDB is the first ligand database that contains ligand information for all types of nucleic acids. NALDB contains more than 3500 ligand entries with detailed pharmacokinetic and pharmacodynamic information such as target name, target sequence, ligand 2D/3D structure, SMILES, molecular formula, molecular weight, net formal charge, AlogP, number of rings, number of hydrogen bond donors and acceptors, and potential energy, along with their Ki, Kd and IC50 values. Having all these details on a single platform should be helpful for the development and improvement of novel ligands targeting nucleic acids, which could serve as potential targets in different diseases including cancers and neurological disorders. With a maximum of 255 conformers for each ligand entry, our database is a multi-conformer database and can facilitate the virtual screening process. NALDB provides powerful web-based search tools that make database searching efficient and simple, with options for text as well as structure queries. NALDB also provides a multi-dimensional advanced search tool that can screen the database molecules on the basis of ligand molecular properties supplied by database users. A 3D structure visualization tool has also been included for 3D structure representation of ligands. NALDB offers inclusive pharmacological information and a structurally flexible set of small molecules with their three-dimensional conformers that can accelerate virtual screening and other modeling processes and eventually complement nucleic acid-based drug discovery research. NALDB can be routinely updated and is freely available at bsbe.iiti.ac.in/bsbe/naldb/HOME.php. Database URL: http://bsbe.iiti.ac.in/bsbe/naldb/HOME.php PMID:26896846
NASA Astrophysics Data System (ADS)
Yabuuchi, Satoshi; Kunimaru, Takanori; Kishi, Atsuyasu; Komatsu, Mitsuru
The Japan Atomic Energy Agency has been conducting the Horonobe Underground Research Laboratory (URL) project in Horonobe, Hokkaido, as a part of the research and development program on geological disposal of high-level radioactive waste. Pore water pressure and water content around a horizontal drift in the URL have been monitored for over 18 months, starting before the drift excavation began. During the drift excavation, both pore water pressure and water content decreased. Pore water pressure has remained positive, although it continued to decrease at a gradually diminishing rate after excavation. Water content began to increase about 6 months after the completion of the excavation and began to fall again about 5 months later. An unsaturated zone containing gases that had been dissolved in groundwater may have formed around the horizontal drift.
The plant phenological online database (PPODB): an online database for long-term phenological data
NASA Astrophysics Data System (ADS)
Dierenbach, Jonas; Badeck, Franz-W.; Schaber, Jörg
2013-09-01
We present an online database that provides unrestricted and free access to over 16 million plant phenological observations from over 8,000 stations in Central Europe between the years 1880 and 2009. Unique features are (1) a flexible and unrestricted access to a full-fledged database, allowing for a wide range of individual queries and data retrieval, (2) historical data for Germany before 1951 ranging back to 1880, and (3) more than 480 curated long-term time series covering more than 100 years for individual phenological phases and plants combined over Natural Regions in Germany. Time series for single stations or Natural Regions can be accessed through a user-friendly graphical geo-referenced interface. The joint databases made available with the plant phenological database PPODB render accessible an important data source for further analyses of long-term changes in phenology. The database can be accessed via
Liu, Zhi-Ping; Wu, Canglin; Miao, Hongyu; Wu, Hulin
2015-01-01
Transcriptional and post-transcriptional regulation of gene expression is of fundamental importance to numerous biological processes. Nowadays, an increasing amount of gene regulatory relationships have been documented in various databases and literature. However, to more efficiently exploit such knowledge for biomedical research and applications, it is necessary to construct a genome-wide regulatory network database to integrate the information on gene regulatory relationships that are widely scattered in many different places. Therefore, in this work, we build a knowledge-based database, named ‘RegNetwork’, of gene regulatory networks for human and mouse by collecting and integrating the documented regulatory interactions among transcription factors (TFs), microRNAs (miRNAs) and target genes from 25 selected databases. Moreover, we also inferred and incorporated potential regulatory relationships based on transcription factor binding site (TFBS) motifs into RegNetwork. As a result, RegNetwork contains a comprehensive set of experimentally observed or predicted transcriptional and post-transcriptional regulatory relationships, and the database framework is flexibly designed for potential extensions to include gene regulatory networks for other organisms in the future. Based on RegNetwork, we characterized the statistical and topological properties of genome-wide regulatory networks for human and mouse, we also extracted and interpreted simple yet important network motifs that involve the interplays between TF-miRNA and their targets. In summary, RegNetwork provides an integrated resource on the prior information for gene regulatory relationships, and it enables us to further investigate context-specific transcriptional and post-transcriptional regulatory interactions based on domain-specific experimental data. Database URL: http://www.regnetworkweb.org PMID:26424082
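The TF-miRNA-target motifs mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain list of directed (regulator, target) pairs and looks for feed-forward loops in which a TF regulates a miRNA and both regulate the same gene. The function name, the edge-list layout and the node names are hypothetical and are not taken from RegNetwork itself.

```python
def find_tf_mirna_ffls(edges, tfs, mirnas):
    """Return sorted (tf, mirna, gene) feed-forward loops.

    edges  : iterable of (regulator, target) pairs (assumed layout)
    tfs    : set of transcription factor names
    mirnas : set of miRNA names
    """
    mirnas = set(mirnas)
    # Adjacency map: regulator -> set of direct targets
    targets_of = {}
    for src, dst in edges:
        targets_of.setdefault(src, set()).add(dst)

    loops = []
    for tf in sorted(tfs):
        tf_targets = targets_of.get(tf, set())
        # miRNAs directly regulated by this TF
        for mir in sorted(tf_targets & mirnas):
            # Genes regulated by both the TF and the miRNA
            shared = tf_targets & targets_of.get(mir, set())
            for gene in sorted(shared - mirnas - {tf}):
                loops.append((tf, mir, gene))
    return loops


# Toy network with made-up node names, for illustration only
toy_edges = [
    ("TF1", "miR-a"),
    ("TF1", "geneX"),
    ("miR-a", "geneX"),
    ("TF1", "geneY"),
]
print(find_tf_mirna_ffls(toy_edges, {"TF1"}, {"miR-a"}))
```

Only geneX closes a loop here, because geneY is regulated by the TF but not by the miRNA.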
McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta
2016-01-01
BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only of the resource itself, but also of its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address) in the last year, thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org. © The Author(s) 2016. Published by Oxford University Press.
An environmental database for Venice and tidal zones
NASA Astrophysics Data System (ADS)
Macaluso, L.; Fant, S.; Marani, A.; Scalvini, G.; Zane, O.
2003-04-01
The natural environment is a complex, highly variable and physically non-reproducible system (neither in the laboratory nor in a confined territory). Environmental experimental studies are thus necessarily based on field measurements distributed in time and space. Only extensive data collections can provide the representative samples of the system behavior which are essential for scientific advancement. The assimilation of large data collections into accessible archives must necessarily be implemented in electronic databases. In the case of tidal environments in general, and of the Venice lagoon in particular, it is useful to establish a database, freely accessible to the scientific community, documenting the dynamics of such systems and their response to anthropic pressures and climatic variability. At the Istituto Veneto di Scienze, Lettere ed Arti in Venice (Italy) two internet environmental databases have been developed: one collects detailed information on the Venice lagoon; the other co-ordinates the research consortium of the "TIDE" EU RTD project, which covers three different tidal areas: Venice Lagoon (Italy), Morecambe Bay (England), and Forth Estuary (Scotland). The archives may be accessed through the URL: www.istitutoveneto.it. The first one is freely available and open to anyone interested. It is continuously updated and has been structured to promote documentation concerning the Venetian environment and to disseminate this information for educational purposes (see "Dissemination" section). The second one is supplied by scientists and engineers working on this tidal system for various purposes (scientific, management, conservation, etc.); it is aimed at interested researchers and grows with their own contributions.
Both intend to promote scientific communication, to contribute to the realization of a distributed information system collecting homogeneous themes, and to initiate the interconnection among databases regarding different kinds of environment.
Hyam, Roger; Hagedorn, Gregor; Chagnoux, Simon; Röpert, Dominik; Casino, Ana; Droege, Gabi; Glöckler, Falko; Gödderz, Karsten; Groom, Quentin; Hoffmann, Jana; Holleman, Ayco; Kempa, Matúš; Koivula, Hanna; Marhold, Karol; Nicolson, Nicky; Smith, Vincent S.; Triebel, Dagmar
2017-01-01
With biodiversity research activities being increasingly shifted to the web, the need for a system of persistent and stable identifiers for physical collection objects becomes increasingly pressing. The Consortium of European Taxonomic Facilities agreed on a common system of HTTP-URI-based stable identifiers which is now rolled out to its member organizations. The system follows Linked Open Data principles and implements redirection mechanisms to human-readable and machine-readable representations of specimens facilitating seamless integration into the growing semantic web. The implementation of stable identifiers across collection organizations is supported with open source provider software scripts, best practices documentations and recommendations for RDF metadata elements facilitating harmonized access to collection information in web portals. Database URL: http://cetaf.org/cetaf-stable-identifiers PMID:28365724
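The redirection from one stable identifier to human-readable and machine-readable representations, described in the abstract above, can be sketched as simple content negotiation on the HTTP Accept header. This is a minimal illustration, not the CETAF provider software: the function name, the example.org URL patterns and the choice of a 303 status code are assumptions made here.

```python
from typing import Tuple


def choose_redirect(specimen_id: str, accept_header: str) -> Tuple[int, str]:
    """Pick a redirect target for a stable-identifier request.

    Returns (HTTP status, Location URL). The URL patterns below are
    hypothetical placeholders, not real CETAF endpoints.
    """
    targets = {
        "application/rdf+xml": f"https://example.org/data/{specimen_id}.rdf",
        "text/html": f"https://example.org/page/{specimen_id}.html",
    }

    # Parse "type/subtype;q=0.9, ..." into (quality, media_type) pairs
    prefs = []
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        quality = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        if media:
            prefs.append((quality, media))

    # Highest quality first; the stable sort keeps header order on ties
    prefs.sort(key=lambda p: -p[0])
    for quality, media in prefs:
        if quality > 0 and media in targets:
            return 303, targets[media]

    # Nothing matched (for example "*/*"): fall back to the web page
    return 303, targets["text/html"]


print(choose_redirect("B100", "application/rdf+xml"))
print(choose_redirect("B100", "text/html,application/rdf+xml;q=0.9"))
```

A browser asking for text/html is sent to the specimen page, while a harvester asking for RDF is sent to the metadata document, so one identifier serves both audiences.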
Web usage mining at an academic health sciences library: an exploratory study.
Bracke, Paul J
2004-10-01
This paper explores the potential of multinomial logistic regression analysis to perform Web usage mining for an academic health sciences library Website. Usage of database-driven resource gateway pages was logged for a six-month period, including information about users' network addresses, referring uniform resource locators (URLs), and types of resource accessed. It was found that referring URL did vary significantly by two factors: whether a user was on-campus and what type of resource was accessed. Although the data available for analysis are limited by the nature of the Web and concerns for privacy, this method demonstrates the potential for gaining insight into Web usage that supplements Web log analysis. It can be used to improve the design of static and dynamic Websites today and could be used in the design of more advanced Web systems in the future.
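For readers unfamiliar with the technique named in the abstract above, the following is a minimal sketch of multinomial logistic regression fitted by batch gradient descent. It is not the study's model or data: the two binary predictors (on-campus, database-type resource), the three referrer classes and all values are invented purely to show the mechanics.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, x):
    """Class probabilities for one feature vector x."""
    row = list(x) + [1.0]  # trailing 1.0 is the intercept term
    return softmax([sum(w * v for w, v in zip(wk, row)) for wk in W])


def fit_multinomial_logit(X, y, n_classes, lr=0.5, epochs=500):
    """Fit one weight vector per class by batch gradient descent."""
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            row = list(xi) + [1.0]
            probs = predict_proba(W, xi)
            for k in range(n_classes):
                # Gradient of cross-entropy: predicted minus observed
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j, v in enumerate(row):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_features + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


# Invented toy data: [on_campus, resource_is_database] -> referrer class
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [0, 0, 1, 1, 2, 2]
W = fit_multinomial_logit(X, y, n_classes=3)
print([round(p, 3) for p in predict_proba(W, [1, 0])])
```

In practice a statistics package would also report coefficient standard errors and significance tests, which this sketch omits.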
Das, Sankha Subhra; Saha, Pritam
2018-01-01
MicroRNAs (miRNAs) are well known as key regulators of diverse biological pathways. A body of experimental evidence has shown that abnormal miRNA expression profiles are responsible for various pathophysiological conditions by modulating genes in disease-associated pathways. In spite of the rapid increase in research data confirming such associations, scientists still do not have access to a consolidated database offering these miRNA-pathway association details for critical diseases. We have developed miRwayDB, a database providing comprehensive information on experimentally validated miRNA-pathway associations in various pathophysiological conditions, utilizing data collected from published literature. To the best of our knowledge, it is the first database that provides information about experimentally validated miRNA-mediated pathway dysregulation as seen specifically in critical human diseases and hence indicative of a cause-and-effect relationship in most cases. The current version of miRwayDB collects an exhaustive list of miRNA-pathway association entries for 76 critical disease conditions by reviewing 663 published articles. Each database entry contains complete information on the name of the pathophysiological condition, associated miRNA(s), experimental sample type(s), regulation pattern (up/down) of miRNA, pathway association(s), targeted member of dysregulated pathway(s) and a brief description. In addition, miRwayDB provides miRNA, gene and pathway scores to evaluate the role of miRNA-regulated pathways in various pathophysiological conditions. The database can also be used for other biomedical approaches such as validation of computational analysis, integrated analysis and prediction of computational models. It also offers a submission page to submit novel data from recently published studies. We believe that miRwayDB will be a useful tool for the miRNA research community. Database URL: http://www.mirway.iitkgp.ac.in PMID:29688364
Multi-source and ontology-based retrieval engine for maize mutant phenotypes
Green, Jason M.; Harnsomburana, Jaturon; Schaeffer, Mary L.; Lawrence, Carolyn J.; Shyu, Chi-Ren
2011-01-01
Model Organism Databases, including the various plant genome databases, collect and enable access to massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc., as well as textual descriptions of many of these entities. While a variety of basic browsing and search capabilities are available to allow researchers to query and peruse the names and attributes of phenotypic data, next-generation search mechanisms that allow querying and ranking of text descriptions are much less common. In addition, the plant community needs an innovative way to leverage the existing links in these databases to search groups of text descriptions simultaneously. Furthermore, though much time and effort have been devoted to the development of plant-related ontologies, the knowledge embedded in these ontologies remains largely unused in available plant search mechanisms. Addressing these issues, we have developed a unique search engine for mutant phenotypes from MaizeGDB. This advanced search mechanism integrates various text description sources in MaizeGDB to aid a user in retrieving desired mutant phenotype information. Currently, descriptions of mutant phenotypes, loci and gene products are utilized collectively for each search, though expansion of the search mechanism to include other sources is straightforward. The retrieval engine, to our knowledge, is the first engine to exploit the content and structure of available domain ontologies, currently the Plant and Gene Ontologies, to expand and enrich retrieval results in major plant genomic databases. Database URL: http://www.PhenomicsWorld.org/QBTA.php PMID:21558151
Allie: a database and a search service of abbreviations and long forms.
Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa
2011-01-01
Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader's expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/.
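The lookup behaviour described in the abstract above, where one abbreviation returns all of its candidate long forms with bibliographic data, can be sketched with a nested dictionary. This is a minimal illustration and not Allie's implementation: the record layout, the function names and the sample records (including their titles and years) are invented for the example.

```python
from collections import defaultdict


def build_index(records):
    """Index (abbreviation, long_form, title, year) tuples.

    Result: abbreviation -> long form -> list of (title, year).
    Long forms are lower-cased so case variants are grouped together.
    """
    index = defaultdict(lambda: defaultdict(list))
    for abbr, long_form, title, year in records:
        index[abbr][long_form.lower()].append((title, year))
    return index


def lookup(index, query):
    """Return (long_form, references) pairs, most frequent form first."""
    hits = index.get(query, {})
    return sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))


# Made-up sample records, for illustration only
sample = [
    ("PCR", "polymerase chain reaction", "Example title A", 2001),
    ("PCR", "Polymerase chain reaction", "Example title B", 2005),
    ("PCR", "pathological complete response", "Example title C", 2009),
]
index = build_index(sample)
for long_form, refs in lookup(index, "PCR"):
    print(long_form, len(refs))
```

Ranking by how often a pair was seen is one simple way to surface the most likely expansion of a polysemous abbreviation first.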
Fujimura, Tomomi; Umemura, Hiroyuki
2018-01-15
The present study describes the development and validation of a facial expression database comprising five different horizontal face angles in dynamic and static presentations. The database includes twelve expression types portrayed by eight Japanese models. This database was inspired by the dimensional and categorical model of emotions: surprise, fear, sadness, anger with open mouth, anger with closed mouth, disgust with open mouth, disgust with closed mouth, excitement, happiness, relaxation, sleepiness, and neutral (static only). The expressions were validated using emotion classification and Affect Grid rating tasks [Russell, Weiss, & Mendelsohn, 1989. Affect Grid: A single-item scale of pleasure and arousal. Journal of Personality and Social Psychology, 57(3), 493-502]. The results indicate that most of the expressions were recognised as the intended emotions and could systematically represent affective valence and arousal. Furthermore, face angle and facial motion information influenced emotion classification and valence and arousal ratings. Our database will be available online at the following URL. https://www.dh.aist.go.jp/database/face2017/ .
Martin, Tiphaine; Sherman, David J; Durrens, Pascal
2011-01-01
The Génolevures online database (URL: http://www.genolevures.org) stores and provides the data and results obtained by the Génolevures Consortium through several campaigns of genome annotation of the yeasts in the Saccharomycotina subphylum (hemiascomycetes). This database is dedicated to large-scale comparison of these genomes, storing not only the different chromosomal elements detected in the sequences, but also the logical relations between them. The database is divided into a public part, accessible to anyone through Internet, and a private part where the Consortium members make genome annotations with our Magus annotation system; this system is used to annotate several related genomes in parallel. The public database is widely consulted and offers structured data, organized using a REST web site architecture that allows for automated requests. The implementation of the database, as well as its associated tools and methods, is evolving to cope with the influx of genome sequences produced by Next Generation Sequencing (NGS). Copyright © 2011 Académie des sciences. Published by Elsevier SAS. All rights reserved.
Zerbino, Daniel R.; Johnson, Nathan; Juetteman, Thomas; Sheppard, Dan; Wilder, Steven P.; Lavidas, Ilias; Nuhn, Michael; Perry, Emily; Raffaillac-Desfosses, Quentin; Sobral, Daniel; Keefe, Damian; Gräf, Stefan; Ahmed, Ikhlak; Kinsella, Rhoda; Pritchard, Bethan; Brent, Simon; Amode, Ridwan; Parker, Anne; Trevanion, Steven; Birney, Ewan; Dunham, Ian; Flicek, Paul
2016-01-01
New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org PMID:26888907
Nakagawa, So; Takahashi, Mahoko Ueda
2016-01-01
In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp PMID:27242033 © The Author(s) 2016. Published by Oxford University Press.
Standardized description of scientific evidence using the Evidence Ontology (ECO)
Chibucos, Marcus C.; Mungall, Christopher J.; Balakrishnan, Rama; Christie, Karen R.; Huntley, Rachael P.; White, Owen; Blake, Judith A.; Lewis, Suzanna E.; Giglio, Michelle
2014-01-01
The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on such as those made by curators versus those automatically computed or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources. Database URL: Evidence Ontology Web site: http://evidenceontology.org PMID:25052702
NASA Astrophysics Data System (ADS)
Sharma, Om Prakash; Kumar, Muthuvel Suresh
2016-01-01
Lymphatic filariasis (Lf) is one of the oldest and most debilitating tropical diseases. Millions of people suffer from this prevalent disease. It is estimated to infect over 120 million people in at least 80 nations throughout the tropical and subtropical regions. More than one billion people are at risk of being affected by this life-threatening disease. Several studies have suggested emerging limitations of, and resistance towards, the available drugs and therapeutic targets for Lf. Therefore, better medicines and drug targets are in demand. We took the initiative to identify the essential proteins of the Wolbachia endosymbiont of Brugia malayi (wBm), which are indispensable for its survival and non-homologous to human host proteins. In the current study, we used a proteome subtractive approach to screen possible therapeutic targets for wBm. In addition, the literature was mined extensively for potential drug targets, drugs, epitopes, crystal structures, and expressed sequence tag (EST) sequences for filariasis-causing nematodes. Data obtained from our study are presented in a user-friendly database named FiloBase. We hope that the information stored in this database may be used for further research and drug development against filariasis. URL: http://filobase.bicpu.edu.in.
Predicting structured metadata from unstructured metadata
Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier
2016-01-01
Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is an accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation model (LDA) to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms can be most accurately predicted using the TF-IDF approach, followed by LDA, both outperforming the majority vote baseline. While some accuracy is lost by the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall, this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. Database URL: http://www.yeastgenome.org/ PMID:28637268
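To make the TF-IDF approach and the majority vote baseline concrete, the following is a minimal pure-Python sketch. The toy descriptions, the "organism" labels and the nearest-neighbour classifier by cosine similarity are all assumptions chosen for brevity; the abstract does not state which classifiers the authors trained, and this is not their framework.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Inverse document frequency per term, with a +1 offset so that terms
    occurring in every document still keep a nonzero weight."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def vectorize(doc, idf):
    """Turn a token list into a sparse TF-IDF vector; unseen terms are dropped."""
    counts = Counter(term for term in doc if term in idf)
    total = sum(counts.values()) or 1
    return {term: (c / total) * idf[term] for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(query, train_docs, labels, idf):
    """Label of the training description most similar to the query (1-NN)."""
    q = vectorize(query, idf)
    vecs = [vectorize(doc, idf) for doc in train_docs]
    best = max(range(len(vecs)), key=lambda i: cosine(q, vecs[i]))
    return labels[best]


def majority_baseline(labels):
    """Baseline that always predicts the most common training label."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical free-text sample descriptions and a structured field to predict.
TRAIN = [
    ("total rna from human liver tissue", "Homo sapiens"),
    ("human breast tumor biopsy rna", "Homo sapiens"),
    ("human blood sample", "Homo sapiens"),
    ("mouse liver rna wild type", "Mus musculus"),
    ("mouse embryonic fibroblast culture", "Mus musculus"),
]
DOCS = [text.split() for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
IDF = fit_idf(DOCS)
```

On this toy set, `predict("mouse brain rna".split(), DOCS, LABELS, IDF)` picks the mouse label, while `majority_baseline(LABELS)` always answers with the most frequent label regardless of the text.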
dbCPG: A web resource for cancer predisposition genes
Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng
2016-01-01
Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes. PMID:27192119
SilkPathDB: a comprehensive resource for the study of silkworm pathogens
Pan, Guo-Qing; Vossbrinck, Charles R.; Xu, Jin-Shan; Li, Chun-Feng; Chen, Jie; Long, Meng-Xian; Yang, Ming; Xu, Xiao-Fei; Xu, Chen; Debrunner-Vossbrinck, Bettina A.
2017-01-01
Silkworm pathogens have heavily impeded the development of the sericultural industry and play important roles in lepidopteran ecology; some of them are used as biological insecticides. Rapid advances in studies on the omics of silkworm pathogens have produced a large amount of data, which need to be brought together centrally in a coherent and systematic manner. This will facilitate the reuse of these data for further analysis. We have collected genomic data for 86 silkworm pathogens from 4 taxa (fungi, microsporidia, bacteria and viruses) and from 4 lepidopteran hosts, and developed the open-access Silkworm Pathogen Database (SilkPathDB) to make this information readily available. The implementation of SilkPathDB involves integrating Drupal and GBrowse as a graphic interface for a Chado relational database which houses all of the datasets involved. The genomes have been assembled and annotated for comparative purposes and allow the search and analysis of homologous sequences, transposable elements, protein subcellular locations, including secreted proteins, and gene ontology. We believe that SilkPathDB will aid researchers in identifying silkworm parasites and in understanding the mechanisms of silkworm infections and the developmental ecology of silkworm parasites (gene expression) and their hosts. Database URL: http://silkpathdb.swu.edu.cn PMID:28365723
CADB: Conformation Angles DataBase of proteins
Sheik, S. S.; Ananthalakshmi, P.; Bhargavi, G. Ramya; Sekar, K.
2003-01-01
Conformation Angles DataBase (CADB) provides an online resource to access data on conformation angles (both main-chain and side-chain) of protein structures in two data sets corresponding to 25% and 90% sequence identity between any two proteins, available in the Protein Data Bank. In addition, the database contains the necessary crystallographic parameters. The package has several flexible options and display facilities to visualize the main-chain and side-chain conformation angles for a particular amino acid residue. The package can also be used to study the interrelationship between the main-chain and side-chain conformation angles. A web-based JAVA graphics interface has been deployed to display the information of interest to the user on the client machine. The database is updated at regular intervals and can be accessed over the World Wide Web at the following URL: http://144.16.71.148/cadb/. PMID:12520049
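A conformation angle of the kind stored in such a database is a torsion (dihedral) angle defined by four consecutive atoms. The sketch below shows one common way to compute it from Cartesian coordinates; the function name and coordinates are illustrative assumptions, and the abstract does not describe how CADB itself derives its values.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], about the bond p1 -> p2."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a main-chain angle such as phi, the four points would be the coordinates of C(i-1), N(i), CA(i) and C(i); side-chain angles use the corresponding side-chain atoms in the same way.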
Using STOQS and stoqstoolbox for in situ Measurement Data Access in Matlab
NASA Astrophysics Data System (ADS)
López-Castejón, F.; Schlining, B.; McCann, M. P.
2012-12-01
This poster presents the stoqstoolbox, an extension to Matlab that simplifies the loading of in situ measurement data directly from STOQS databases. STOQS (Spatial Temporal Oceanographic Query System) is a geospatial database tool designed to provide efficient access to data following the CF-NetCDF Discrete Samples Geometries convention. Data are loaded from CF-NetCDF files into a STOQS database where indexes are created on depth, spatial coordinates and other parameters, e.g. platform type. STOQS provides consistent, simple and efficient methods to query for data. For example, we can request all measurements with a standard_name of sea_water_temperature between two times and from between two depths. Data access is simpler because the data are retrieved by parameter irrespective of platform or mission file names. Access is more efficient because data are retrieved via the index on depth and only the requested data are retrieved from the database and transferred into the Matlab workspace. Applications in the stoqstoolbox query the STOQS database via an HTTP REST application programming interface; they follow the Data Access Object pattern, enabling highly customizable query construction. Data are loaded into Matlab structures that clearly indicate latitude, longitude, depth, measurement data value, and platform name. The stoqstoolbox is designed to be used in concert with other tools, such as nctoolbox, which can load data from any OPeNDAP data source. With these two toolboxes a user can easily work with in situ and other gridded data, such as from numerical models and remote sensing platforms. In order to show the capability of stoqstoolbox we will show an example of model validation using data collected during the May-June 2012 field experiment conducted by the Monterey Bay Aquarium Research Institute (MBARI) in Monterey Bay, California. The data are available from the STOQS server at http://odss.mbari.org/canon/stoqs_may2012/query/. 
Over 14 million data points of 18 parameters from 6 platforms measured over a 3-week period are available on this server. The model used for comparison is the Regional Ocean Modeling System developed by Jet Propulsion Laboratory for the Monterey Bay. The model output is loaded into Matlab using nctoolbox from the JPL server at http://ourocean.jpl.nasa.gov:8080/thredds/dodsC/MBNowcast. Model validation with in situ measurements can be difficult because of different file formats and because data may be spread across individual data systems for each platform. With stoqstoolbox the researcher must know only the URL of the STOQS server and the OPeNDAP URL of the model output. With selected depth and time constraints a user's Matlab program searches for all in situ measurements available for the same time, depth and variable of the model. STOQS and stoqstoolbox are open source software projects supported by MBARI and the David and Lucile Packard Foundation. For more information please see http://code.google.com/p/stoqs.
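The example request mentioned above (all sea_water_temperature measurements between two times and two depths) amounts to composing a constrained HTTP query. The sketch below builds such a URL in Python rather than Matlab. The base URL is the one quoted in the abstract; the query parameter names are hypothetical placeholders, not the documented STOQS REST interface, which should be checked before use.

```python
from urllib.parse import urlencode

# Server URL as given in the abstract.
BASE_URL = "http://odss.mbari.org/canon/stoqs_may2012/query/"


def build_query_url(standard_name, start, end, min_depth, max_depth,
                    base_url=BASE_URL):
    """Compose a query URL constrained by parameter, time window and depth range.

    The parameter names below are assumptions made for illustration only.
    """
    params = {
        "standard_name": standard_name,
        "start_time": start,
        "end_time": end,
        "min_depth": min_depth,
        "max_depth": max_depth,
    }
    return base_url + "?" + urlencode(params)
```

The point of the design is that the request names the measured quantity and the constraints, never a platform or mission file, so the same call shape works across all platforms on the server.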
NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.
Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin; Senger, Philipp
2015-01-01
Neurodegenerative diseases are chronic debilitating conditions, characterized by progressive loss of neurons, that represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and curated ArrayExpress, enable researchers to conduct integrative meta-analysis, increasing the power to detect differentially regulated genes in disease and explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance with defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed in the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in the neurodegeneration context. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. 
The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in collaboration with domain disease experts. We elucidate the step-by-step guidelines used to critically prioritize studies from public archives and their metadata curation and discuss the key challenges encountered. Curated metadata for Alzheimer's disease gene expression studies are available for download. Database URL: www.scai.fraunhofer.de/NeuroTransDB.html PMID:26475471 © The Author(s) 2015. Published by Oxford University Press.
Kumari, Sangita; Pundhir, Sachin; Priya, Piyush; Jeena, Ganga; Punetha, Ankita; Chawla, Konika; Firdos Jafaree, Zohra; Mondal, Subhasish; Yadav, Gitanjali
2014-01-01
Plant essential oils are complex mixtures of volatile organic compounds, which play indispensable roles in the environment, for the plant itself, as well as for humans. The potential biological information stored in essential oil composition data can provide an insight into the silent language of plants, and the roles of these chemical emissions in defense, communication and pollinator attraction. In order to decipher volatile profile patterns from a global perspective, we have developed the ESSential OIL DataBase (EssOilDB), a continually updated, freely available electronic database designed to provide a knowledge resource for plant essential oils that enables one to address a multitude of queries on volatile profiles of native, invasive, normal or stressed plants, across taxonomic clades, geographical locations and several other biotic and abiotic influences. To our knowledge, EssOilDB is the only database in the public domain providing an opportunity for context-based scientific research on volatile patterns in plants. EssOilDB presently contains 123 041 essential oil records spanning a century of published reports on volatile profiles, with data from 92 plant taxonomic families, spread across diverse geographical locations all over the globe. We hope that this huge repository of VOCs will facilitate unraveling of the true significance of volatiles in plants, along with creating potential avenues for industrial applications of essential oils. We also illustrate the use of this database in terpene biology and show how EssOilDB can be used to complement data from computational genomics to gain insights into the diversity and variability of terpenoids in the plant kingdom. 
EssOilDB would serve as a valuable information resource, for students and researchers in plant biology, in the design and discovery of new odor profiles, as well as for entrepreneurs—the potential for generating consumer specific scents being one of the most attractive and interesting topics in the cosmetic industry. Database URL: http://nipgr.res.in/Essoildb/ PMID:25534749
The Halophile protein database.
Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj
2014-01-01
Halophilic archaea/bacteria adapt to different salt concentrations, namely extreme, moderate and low. These types of adaptations may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is a comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains 59 897 protein properties extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through the link below. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.
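Two of the listed properties have simple closed forms and can be sketched directly: Gravy is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index (as defined by Ikai) weights the mole percentages of Ala, Val, Ile and Leu. The code below uses these standard definitions; the abstract does not say which tools HProtDB used, so this is an illustration rather than the database's own pipeline.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    n = len(sequence)

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / n

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

Both functions expect an upper-case one-letter sequence containing only standard residues; anything else raises a `KeyError` in `gravy` rather than being silently ignored.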
ProDaMa: an open source Python library to generate protein structure datasets.
Armano, Giuliano; Manconi, Andrea
2009-10-02
The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.
Johnson, Kate; Church, Stan
2006-01-01
The following talk was an invited presentation given at the National Association of Abandoned Mine Lands Programs meeting in Billings, Montana on Sept. 25, 2006. The objective of the talk was to outline the scope of the U.S. Geological Survey research, past, present and future, in the area of abandoned mine research. Two large Professional Papers have come out of our AML studies: Nimick, D.A., Church, S.E., and Finger, S.E., eds., 2004, Integrated investigations of environmental effects of historical mining in the Basin and Boulder mining districts, Boulder River watershed, Jefferson County, Montana: U.S. Geological Survey Professional Paper 1652, 524 p., 2 plates, 1 DVD, URL: http://pubs.er.usgs.gov/usgspubs/pp/pp1652 Church, S.E., von Guerard, Paul, and Finger, S.E., eds., 2006, Integrated Investigations of Environmental Effects of Historical Mining in the Animas River Watershed, San Juan County, Colorado: U.S. Geological Survey Professional Paper 1651, 1,096 p., 6 plates, 1 DVD (in press). Additional publications and links can be found on the USGS AML website at URL: http://amli.usgs.gov/ or are accessible from the USGS Mineral Resource Program website at URL: http://minerals.usgs.gov/.
What’s in a URL? Genre Classification from URLs
2012-01-01
webpages with access to the content of a document and feature extraction from URLs alone. Feature Extraction from Webpages Stylistic and structural...2010). Character n-grams (sequence of n characters) are attractive because of their simplicity and because they encapsulate both lexical and stylistic...report might be stylistic. Feature Extraction from URLs The syntactic characteristics of URLs have been fairly stable over the years. URL terms are
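The character n-gram features mentioned in this excerpt can be sketched in a few lines. The function name, the lower-casing step and the default of n = 3 are assumptions for illustration; the excerpt does not state the settings used in the paper.

```python
from collections import Counter


def char_ngrams(url, n=3):
    """Count every contiguous run of n characters in a (lower-cased) URL."""
    text = url.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

The resulting counts can be used directly as a sparse feature vector for a genre classifier; a URL shorter than n characters simply yields an empty counter.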
Manasa, Justen; Lessells, Richard; Rossouw, Theresa; Naidu, Kevindra; Van Vuuren, Cloete; Goedhals, Dominique; van Zyl, Gert; Bester, Armand; Skingsley, Andrew; Stott, Katharine; Danaviah, Siva; Chetty, Terusha; Singh, Lavanya; Moodley, Pravi; Iwuji, Collins; McGrath, Nuala; Seebregts, Christopher J.; de Oliveira, Tulio
2014-01-01
Substantial amounts of data have been generated from patient management and academic exercises designed to better understand the human immunodeficiency virus (HIV) epidemic and design interventions to control it. A number of specialized databases have been designed to manage huge data sets from HIV cohort, vaccine, host genomic and drug resistance studies. Besides databases from cohort studies, most of the online databases contain limited curated data and are thus sequence repositories. HIV drug resistance has been shown to have a great potential to derail the progress made thus far through antiretroviral therapy. Thus, a lot of resources have been invested in generating drug resistance data for patient management and surveillance purposes. Unfortunately, most of the data currently available relate to subtype B even though >60% of the epidemic is caused by HIV-1 subtype C. A consortium of clinicians, scientists, public health experts and policy makers working in southern Africa came together and formed a network, the Southern African Treatment and Resistance Network (SATuRN), with the aim of increasing curated HIV-1 subtype C and tuberculosis drug resistance data. This article describes the HIV-1 data curation process using the SATuRN Rega database. The data curation is a manual and time-consuming process done by clinical, laboratory and data curation specialists. Access to the highly curated data sets is through applications that are reviewed by the SATuRN executive committee. Examples of research outputs from the analysis of the curated data include trends in the level of transmitted drug resistance in South Africa, analysis of the levels of acquired resistance among patients failing therapy and factors associated with the absence of genotypic evidence of drug resistance among patients failing therapy. All these studies have been important for informing first- and second-line therapy. 
This database is a free password-protected open source database available on www.bioafrica.net. Database URL: http://www.bioafrica.net/regadb/ PMID:24504151
NALDB: nucleic acid ligand database for small molecules targeting nucleic acid.
Kumar Mishra, Subodh; Kumar, Amit
2016-01-01
The nucleic acid ligand database (NALDB) is a unique database that provides detailed information about the experimental data of small molecules that were reported to target several types of nucleic acid structures. NALDB is the first ligand database that contains ligand information for all types of nucleic acids. NALDB contains more than 3500 ligand entries with detailed pharmacokinetic and pharmacodynamic information such as target name, target sequence, ligand 2D/3D structure, SMILES, molecular formula, molecular weight, net formal charge, AlogP, number of rings, number of hydrogen bond donors and acceptors, and potential energy, along with their Ki, Kd and IC50 values. Having all these details on a single platform would be helpful for the development and improvement of novel ligands targeting nucleic acids that could serve as potential targets in different diseases including cancers and neurological disorders. With a maximum of 255 conformers for each ligand entry, our database is a multi-conformer database and can facilitate the virtual screening process. NALDB provides powerful web-based search tools that make database searching efficient and simple, with options for text as well as structure queries. NALDB also provides a multi-dimensional advanced search tool which can screen the database molecules on the basis of molecular properties of the ligand provided by database users. A 3D structure visualization tool has also been included for 3D structure representation of ligands. NALDB offers inclusive pharmacological information and a structurally flexible set of small molecules with their three-dimensional conformers that can accelerate virtual screening and other modeling processes and eventually complement nucleic acid-based drug discovery research. NALDB can be routinely updated and is freely available at bsbe.iiti.ac.in/bsbe/naldb/HOME.php. Database URL: http://bsbe.iiti.ac.in/bsbe/naldb/HOME.php. © The Author(s) 2016. Published by Oxford University Press.
Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)
Weber, Griffin; Mendis, Michael; Gainer, Vivian; Chueh, Henry C; Churchill, Susanne; Kohane, Isaac
2010-01-01
Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org). Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age, a software suite to construct and integrate the modern clinical research chart. i2b2 software may be used by an enterprise's research community to find sets of interesting patients from electronic patient medical record data, while preserving patient privacy through a query tool interface. Project-specific mini-databases (“data marts”) can be created from these sets to make highly detailed data available on these specific patients to the investigators on the i2b2 platform, as reviewed and restricted by the Institutional Review Board. The current version of this software has been released into the public domain and is available at the URL: http://www.i2b2.org/software. PMID:20190053
Integrating diverse databases into an unified analysis framework: a Galaxy approach
Blankenberg, Daniel; Coraor, Nathan; Von Kuster, Gregory; Taylor, James; Nekrutenko, Anton
2011-01-01
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org PMID:21531983
Braun, Bremen L.; Schott, David A.; Portwood, II, John L.; Schaeffer, Mary L.; Harper, Lisa C.; Gardiner, Jack M.; Cannon, Ethalinda K.; Andorf, Carson M.
2017-01-01
The Maize Genetics and Genomics Database (MaizeGDB) team prepared a survey to identify breeders’ needs for visualizing pedigrees, diversity data and haplotypes in order to prioritize tool development and curation efforts at MaizeGDB. The survey was distributed to the maize research community on behalf of the Maize Genetics Executive Committee in Summer 2015. The survey garnered 48 responses from maize researchers, of which more than half were self-identified as breeders. The survey showed that the maize researchers considered their top priorities for visualization to be: (i) displaying single nucleotide polymorphisms in a given region for a given list of lines, (ii) showing haplotypes for a given list of lines and (iii) presenting pedigree relationships visually. The survey also asked which populations would be most useful to display. The following two populations topped the list: (i) 3000 publicly available maize inbred lines used in Romay et al. (Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol, 2013;14:R55) and (ii) maize lines with expired Plant Variety Protection Act (ex-PVP) certificates. Driven by this strong stakeholder input, MaizeGDB staff are currently working in four areas to improve its interface and web-based tools: (i) presenting immediate progenies of currently available stocks at the MaizeGDB Stock pages, (ii) displaying the most recent ex-PVP lines described in the Germplasm Resources Information Network (GRIN) on the MaizeGDB Stock pages, (iii) developing network views of pedigree relationships and (iv) visualizing genotypes from SNP-based diversity datasets. These survey results can help other biological databases to direct their efforts according to user preferences as they serve similar types of data sets for their communities. Database URL: https://www.maizegdb.org PMID:28605768
Allie: a database and a search service of abbreviations and long forms
Yamamoto, Yasunori; Yamaguchi, Atsuko; Bono, Hidemasa; Takagi, Toshihisa
2011-01-01
Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced, Allie has an automatic update system. In addition, the database of abbreviations and their long forms with their corresponding PubMed IDs is constructed and updated weekly. Database URL: The Allie service is available at http://allie.dbcls.jp/. PMID:21498548
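The lookup behaviour described in this abstract can be pictured with a minimal Python sketch. Everything here is a hypothetical illustration: the in-memory records, the field layout and the `build_index` and `lookup` helpers are assumptions, not Allie's actual schema or API, and the PubMed IDs are placeholders.

```python
from collections import defaultdict

# Hypothetical (abbreviation, long form, research field, PubMed ID) records.
# Placeholders for illustration only, not taken from the Allie database.
RECORDS = [
    ("PCR", "polymerase chain reaction", "molecular biology", "PMID-A"),
    ("PCR", "polymerase chain reaction", "molecular biology", "PMID-B"),
    ("PCR", "pathologic complete response", "oncology", "PMID-C"),
]

def build_index(records):
    """Group long forms by abbreviation, keeping the supporting PubMed IDs."""
    index = defaultdict(lambda: defaultdict(list))
    for abbr, long_form, field, pmid in records:
        index[abbr][(long_form, field)].append(pmid)
    return index

def lookup(index, query):
    """Return candidate long forms for a query, most frequently seen first."""
    candidates = index.get(query, {})
    ranked = sorted(candidates.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(long_form, field, pmids) for (long_form, field), pmids in ranked]

index = build_index(RECORDS)
results = lookup(index, "PCR")
```

A polysemous query such as "PCR" returns every candidate long form together with its research field and supporting records, which is the kind of context the abstract says helps readers outside their own expertise.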
Bookshelf: a simple curation system for the storage of biomolecular simulation data
Vohra, Shabana; Hall, Benjamin A.; Holdbrook, Daniel A.; Khalid, Syma; Biggin, Philip C.
2010-01-01
Molecular dynamics simulations can now routinely generate data sets of several hundreds of gigabytes in size. Generating these data has become easier over recent years and the rate of data production is likely to increase rapidly in the near future. One major problem associated with this vast amount of data is how to store it in a way that it can be easily retrieved at a later date. The obvious answer to this problem is a database. However, a key issue in the development and maintenance of such a database is its sustainability, which in turn depends on the ease of the deposition and retrieval process. Encouraging users to care about metadata is difficult and thus the success of any storage system will ultimately depend on how well it is used by end-users. In this respect we suggest that even a minimal amount of metadata, if stored in a sensible fashion, is useful, if only at the level of individual research groups. We discuss here a simple database system, which we call ‘Bookshelf’, that uses Python in conjunction with a MySQL database to provide an extremely simple system for curating and keeping track of molecular simulation data. It provides a user-friendly, scriptable solution to a common problem amongst biomolecular simulation laboratories: the storage, logging and subsequent retrieval of large numbers of simulations. Download URL: http://sbcb.bioch.ox.ac.uk/bookshelf/ PMID:21169341
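The idea of logging a minimal amount of metadata per simulation can be sketched as follows. This is not Bookshelf's code: the standard-library `sqlite3` module is swapped in for the MySQL back end named in the abstract so the example runs without a server, and the table layout, the `deposit` and `find` helpers, and the file paths are all hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for the MySQL back end named in the abstract;
# the table layout below is a hypothetical minimal metadata schema.
def open_shelf(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS simulations (
               id INTEGER PRIMARY KEY,
               owner TEXT NOT NULL,
               system TEXT NOT NULL,
               length_ns REAL,
               data_path TEXT NOT NULL
           )"""
    )
    return conn

def deposit(conn, owner, system, length_ns, data_path):
    """Log one simulation with a minimal set of metadata and return its id."""
    cur = conn.execute(
        "INSERT INTO simulations (owner, system, length_ns, data_path) "
        "VALUES (?, ?, ?, ?)",
        (owner, system, length_ns, data_path),
    )
    conn.commit()
    return cur.lastrowid

def find(conn, system_keyword):
    """Return (id, data_path) for simulations whose system description matches."""
    rows = conn.execute(
        "SELECT id, data_path FROM simulations WHERE system LIKE ? ORDER BY id",
        (f"%{system_keyword}%",),
    )
    return rows.fetchall()

conn = open_shelf()
first_id = deposit(conn, "alice", "OmpA in DPPC bilayer", 100.0, "/data/sims/ompa_run1")
deposit(conn, "bob", "lysozyme in water", 50.0, "/data/sims/lyso_run1")
hits = find(conn, "OmpA")
```

Only the file location and a few descriptive fields are stored, so the trajectory data itself stays on disk and the database remains small.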
Xu, Yanjun; Yang, Haixiu; Wu, Tan; Dong, Qun; Sun, Zeguo; Shang, Desi; Li, Feng; Xu, Yingqi; Su, Fei; Liu, Siyao
2017-01-01
BioM2MetDisease is a manually curated database that aims to provide a comprehensive and experimentally supported resource of associations between metabolic diseases and various biomolecules. Recently, metabolic diseases such as diabetes have become one of the leading threats to people’s health. Metabolic diseases are associated with alterations of multiple types of biomolecules such as miRNAs and metabolites. An integrated and high-quality data source that collects metabolic disease-associated biomolecules is essential for exploring the underlying molecular mechanisms and discovering novel therapeutics. Here, we developed the BioM2MetDisease database, which currently documents 2681 entries of relationships between 1147 biomolecules (miRNAs, metabolites and small molecules/drugs) and 78 metabolic diseases across 14 species. Each entry includes biomolecule category, species, biomolecule name, disease name, dysregulation pattern, experimental technique, a brief description of metabolic disease-biomolecule relationships, the reference, additional annotation information, etc. BioM2MetDisease provides a user-friendly interface to explore and retrieve all data conveniently. A submission page is also offered for researchers to submit new associations between biomolecules and metabolic diseases. BioM2MetDisease provides a comprehensive resource for studying how biological molecules act in metabolic diseases, and it is helpful for understanding the molecular mechanisms and developing novel therapeutics for metabolic diseases. Database URL: http://www.bio-bigdata.com/BioM2MetDisease/ PMID:28605773
The Annotation-enriched non-redundant patent sequence databases
Li, Weizhong; Kondratowicz, Bartosz; McWilliam, Hamish; Nauche, Stephane; Lopez, Rodrigo
2013-01-01
The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Annotation from the source entries in these databases is merged and enhanced with additional information from the patent literature and biological context. Corrections in patent publication numbers, kind-codes and patent equivalents significantly improve the data quality. Data are available through various user interfaces including web browser, downloads via FTP, SRS, Dbfetch and EBI-Search. Sequence similarity/homology searches against the databases are available using BLAST, FASTA and PSI-Search. In this article, we describe the data collection and annotation and also outline major changes and improvements introduced since 2009. Apart from data growth, these changes include additional annotation for singleton clusters, the identifier versioning for tracking entry change and the entry mappings between the two-level databases. Database URL: http://www.ebi.ac.uk/patentdata/nr/ PMID:23396323
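The two-level clustering described in this abstract can be illustrated with a short Python sketch. It rests on two assumptions that the abstract does not confirm: that level-1 identity can be approximated by exact string equality, and that level-2 clusters subdivide level-1 clusters by patent family. The entry identifiers, sequences and family labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical entries: (entry id, sequence, patent family id).
ENTRIES = [
    ("E1", "ACGTACGT", "FAM1"),
    ("E2", "ACGTACGT", "FAM1"),
    ("E3", "ACGTACGT", "FAM2"),
    ("E4", "TTGACCAA", "FAM2"),
]

def cluster_level1(entries):
    """Level 1: group entries whose sequences are identical (exact match here)."""
    clusters = defaultdict(list)
    for entry_id, seq, family in entries:
        clusters[seq].append((entry_id, family))
    return dict(clusters)

def cluster_level2(level1):
    """Level 2 (assumed): split each level-1 cluster by patent family."""
    level2 = {}
    for seq, members in level1.items():
        for entry_id, family in members:
            level2.setdefault((seq, family), []).append(entry_id)
    return level2

level1 = cluster_level1(ENTRIES)
level2 = cluster_level2(level1)
```

In this toy data, the sequence shared by E1, E2 and E3 forms one level-1 cluster but two level-2 clusters, because E3 belongs to a different patent family.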
FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events.
Korla, Praveen Kumar; Cheng, Jack; Huang, Chien-Hung; Tsai, Jeffrey J P; Liu, Yu-Hsuan; Kurubanjerdjit, Nilubon; Hsieh, Wen-Tsong; Chen, Huey-Yi; Ng, Ka-Lok
2015-01-01
Chromosomal translocation (CT) is of enormous clinical interest because this disorder is associated with various major solid tumors and leukemia. A tumor-specific fusion gene event may occur when a translocation joins two separate genes. Currently, various CT databases provide information about fusion genes and their genomic elements. However, no database of the roles of fusion genes, in terms of essential functional and regulatory elements in oncogenesis, is available. FARE-CAFE is a unique combination of CTs, fusion proteins, protein domains, domain-domain interactions, protein-protein interactions, transcription factors and microRNAs, with subsequent experimental information, which cannot be found in any other CT database. Genomic DNA information including, for example, manually collected exact locations of the first and second break points, sequences and karyotypes of fusion genes are included. FARE-CAFE will substantially facilitate the cancer biologist's mission of elucidating the pathogenesis of various types of cancer. This database will ultimately help to develop 'novel' therapeutic approaches. Database URL: http://ppi.bioinfo.asia.edu.tw/FARE-CAFE. © The Author(s) 2015. Published by Oxford University Press.
FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events
Korla, Praveen Kumar; Cheng, Jack; Huang, Chien-Hung; Tsai, Jeffrey J. P.; Liu, Yu-Hsuan; Kurubanjerdjit, Nilubon; Hsieh, Wen-Tsong; Chen, Huey-Yi; Ng, Ka-Lok
2015-01-01
Chromosomal translocation (CT) is of enormous clinical interest because this disorder is associated with various major solid tumors and leukemia. A tumor-specific fusion gene event may occur when a translocation joins two separate genes. Currently, various CT databases provide information about fusion genes and their genomic elements. However, no database of the roles of fusion genes, in terms of essential functional and regulatory elements in oncogenesis, is available. FARE-CAFE is a unique combination of CTs, fusion proteins, protein domains, domain–domain interactions, protein–protein interactions, transcription factors and microRNAs, with subsequent experimental information, which cannot be found in any other CT database. Genomic DNA information including, for example, manually collected exact locations of the first and second break points, sequences and karyotypes of fusion genes are included. FARE-CAFE will substantially facilitate the cancer biologist’s mission of elucidating the pathogenesis of various types of cancer. This database will ultimately help to develop ‘novel’ therapeutic approaches. Database URL: http://ppi.bioinfo.asia.edu.tw/FARE-CAFE PMID:26384373
Phenol-Explorer: an online comprehensive database on polyphenol contents in foods
Neveu, V.; Perez-Jiménez, J.; Vos, F.; Crespy, V.; du Chaffaut, L.; Mennen, L.; Knox, C.; Eisner, R.; Cruz, J.; Wishart, D.; Scalbert, A.
2010-01-01
A number of databases on the plant metabolome describe the chemistry and biosynthesis of plant chemicals. However, no such database is specifically focused on foods and more precisely on polyphenols, one of the major classes of phytochemicals. As antioxidants, polyphenols influence human health and may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, some cancers or type 2 diabetes. To determine polyphenol intake in populations and study their association with health, it is essential to have detailed information on their content in foods. However, this information is not easily collected due to the variety of their chemical structures and the variability of their content in a given food. Phenol-Explorer is the first comprehensive web-based database on polyphenol content in foods. It contains more than 37 000 original data points collected from 638 scientific articles published in peer-reviewed journals. The quality of these data has been evaluated before they were aggregated to produce final representative mean content values for 502 polyphenols in 452 foods. The web interface allows various queries on the aggregated data to identify foods containing a given polyphenol or polyphenols present in a given food. For each mean content value, it is possible to trace all original content values and their literature sources. Phenol-Explorer is a major step forward in the development of databases on food constituents and the food metabolome. It should help researchers to better understand the role of phytochemicals in the technical and nutritional quality of food, and food manufacturers to develop tailor-made healthy foods. Database URL: http://www.phenol-explorer.eu PMID:20428313
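The aggregation step described here, in which original data points become mean content values that remain traceable to their sources, can be sketched in Python. The data points, units and source labels are placeholders, and the `aggregate` and `foods_containing` helpers are hypothetical names, not part of Phenol-Explorer.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data points: (food, polyphenol, content in mg/100 g, source).
# Values and source labels are placeholders, not Phenol-Explorer data.
DATA_POINTS = [
    ("apple", "quercetin", 2.0, "ref-1"),
    ("apple", "quercetin", 4.0, "ref-2"),
    ("tea", "catechin", 10.0, "ref-3"),
]

def aggregate(points):
    """Compute a mean content per (food, polyphenol) and keep its sources."""
    grouped = defaultdict(list)
    for food, compound, value, source in points:
        grouped[(food, compound)].append((value, source))
    return {
        key: {
            "mean": mean(value for value, _ in observations),
            "n": len(observations),
            "sources": [source for _, source in observations],
        }
        for key, observations in grouped.items()
    }

def foods_containing(summary, compound):
    """List the foods for which a mean content of the compound exists."""
    return sorted(food for (food, c) in summary if c == compound)

summary = aggregate(DATA_POINTS)
```

Keeping the source list next to each mean is what makes it possible to trace an aggregated value back to the original measurements.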
GBIS: the information system of the German Genebank
Oppermann, Markus; Weise, Stephan; Dittmann, Claudia; Knüpffer, Helmut
2015-01-01
The German Federal ex situ Genebank of Agricultural and Horticultural Crop Species is the largest collection of its kind in the countries of the European Union and amongst the 10 largest collections worldwide. Besides its enormous scientific value as a safeguard of plant biodiversity, the plant genetic resources it maintains are also of high importance to breeders as a source of new impulses. The complex processes of managing such a collection are supported by the Genebank Information System (GBIS). GBIS is an important source of information for researchers and plant breeders, e.g. for identifying appropriate germplasm for breeding purposes. In addition, access to genebank material as a sovereign task is also of high interest to the general public. Moreover, GBIS acts as a data source for global information systems, such as the Global Biodiversity Information Facility (GBIF) or the European Search Catalogue for Plant Genetic Resources (EURISCO). Database URL: http://gbis.ipk-gatersleben.de/ PMID:25953079
HPIDB 2.0: a curated database for host–pathogen interactions
Ammari, Mais G.; Gresham, Cathy R.; McCarthy, Fiona M.; Nanduri, Bindu
2016-01-01
Identification and analysis of host–pathogen interactions (HPI) is essential to study infectious diseases. However, HPI data are sparse in existing molecular interaction databases, especially for agricultural host–pathogen systems. Therefore, resources that annotate, predict and display the HPI that underpin infectious diseases are critical for developing novel intervention strategies. HPIDB 2.0 (http://www.agbase.msstate.edu/hpi/main.html) is a resource for HPI data, and contains 45,238 manually curated entries in the current release. Since the first description of the database in 2010, multiple enhancements to HPIDB data and interface services have been made, which are described here. Notably, HPIDB 2.0 now provides targeted biocuration of molecular interaction data. As a member of the International Molecular Exchange consortium, annotations provided by HPIDB 2.0 curators meet community standards to provide detailed contextual experimental information and facilitate data sharing. Moreover, HPIDB 2.0 provides access to rapidly available community annotations that capture minimum molecular interaction information to address immediate researcher needs for HPI network analysis. In addition to curation, HPIDB 2.0 integrates HPI from existing external sources and contains tools to infer additional HPI where annotated data are scarce. Compared to other interaction databases, our data collection approach ensures HPIDB 2.0 users access the most comprehensive HPI data from a wide range of pathogens and their hosts (594 pathogen and 70 host species, as of February 2016). Improvements also include enhanced search capacity, addition of Gene Ontology functional information, and implementation of network visualization. The changes made to HPIDB 2.0 content and interface ensure that users, especially agricultural researchers, are able to easily access and analyse high quality, comprehensive HPI data. All HPIDB 2.0 data are updated regularly, are publicly available for direct download, and are disseminated to other molecular interaction resources. Database URL: http://www.agbase.msstate.edu/hpi/main.html PMID:27374121
Benchmarking distributed data warehouse solutions for storing genomic variant information
Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.
2017-01-01
Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not been sufficiently explored so far in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
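The effect of pre-aggregation on query shape can be shown with a small Python sketch. It uses the standard-library `sqlite3` module as a single-node stand-in, so it illustrates only how a pre-aggregated, denormalized table removes the join from a query; it does not reproduce the distributed engines or the timings reported in the abstract. The schema, table names and rows are hypothetical.

```python
import sqlite3

# Assumption: a toy normalized schema; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, phenotype TEXT);
CREATE TABLE variants (sample_id INTEGER, chrom TEXT, pos INTEGER, alt TEXT);
INSERT INTO samples VALUES (1, 'case'), (2, 'case'), (3, 'control');
INSERT INTO variants VALUES
  (1, '1', 100, 'A'), (2, '1', 100, 'A'), (3, '1', 100, 'A'), (1, '2', 500, 'T');
""")

# Normalized form: every carrier count needs a join between the two tables.
JOIN_QUERY = """
SELECT v.chrom AS chrom, v.pos AS pos, v.alt AS alt,
       s.phenotype AS phenotype, COUNT(*) AS carriers
FROM variants v JOIN samples s ON s.sample_id = v.sample_id
GROUP BY v.chrom, v.pos, v.alt, s.phenotype
ORDER BY chrom, pos, phenotype
"""
joined = conn.execute(JOIN_QUERY).fetchall()

# Pre-aggregated, denormalized form: the join is paid once at load time.
conn.execute("CREATE TABLE variant_counts AS " + JOIN_QUERY)
preagg = conn.execute(
    "SELECT chrom, pos, alt, phenotype, carriers FROM variant_counts "
    "ORDER BY chrom, pos, phenotype"
).fetchall()
```

Both queries return the same rows, but the second reads a single table, which is the property the abstract credits with reducing distributed join operations.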
FirebrowseR: an R client to the Broad Institute’s Firehose Pipeline
Deng, Mario; Brägelmann, Johannes; Kryukov, Ivan; Saraiva-Agostinho, Nuno; Perner, Sven
2017-01-01
With its Firebrowse service (http://firebrowse.org/) the Broad Institute is making large-scale multi-platform omics data analysis results publicly available through a Representational State Transfer (REST) Application Programming Interface (API). Querying this database through an API client from an arbitrary programming environment is an essential task, allowing other developers and researchers to focus on their analysis and avoid data wrangling. Hence, as a first result, we developed a workflow to automatically generate, test and deploy such clients for rapid response to API changes. Its underlying infrastructure, a combination of free and publicly available web services, facilitates the development of API clients. It decouples changes in server software from the client software by reacting to changes in the RESTful service and removing direct dependencies on a specific implementation of an API. As a second result, FirebrowseR, an R client to the Broad Institute’s RESTful Firehose Pipeline, is provided as a working example, which is built by means of the presented workflow. The package’s features are demonstrated by an example analysis of cancer gene expression data. Database URL: https://github.com/mariodeng/ PMID:28062517
Rice SNP-seek database update: new SNPs, indels, and queries.
Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai
2017-01-04
We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
The care pathway: concepts and theories: an introduction.
Schrijvers, Guus; van Hoorn, Arjan; Huiskes, Nicolette
2012-01-01
This article addresses first the definition of a (care) pathway, and then follows a description of theories since the 1950s. It ends with a discussion of theoretical advantages and disadvantages of care pathways for patients and professionals. The objective of this paper is to provide a theoretical base for empirical studies on care pathways. The knowledge for this chapter is based on several books on pathways, which we found by searching in the digital encyclopedia Wikipedia. Although this is not usual in scientific publications, this method was used because books are not searchable by databases such as PubMed. From 2005, we performed a literature search on PubMed and other literature databases with the keywords integrated care pathway, clinical pathway, critical pathway, theory, research, and evaluation. One of the inspirational sources was the website of the European Pathway Association (EPA) and its journal, the International Journal of Care Pathways. The authors visited several sites for this paper. These are mentioned as illustrations of a concept or theory. Most of them have English websites with more information. The URLs of these websites are not mentioned in this paper as a reference, because the content of them changes fast, sometimes every day.
The care pathway: concepts and theories: an introduction
Schrijvers, Guus; van Hoorn, Arjan; Huiskes, Nicolette
2012-01-01
This article addresses first the definition of a (care) pathway, and then follows a description of theories since the 1950s. It ends with a discussion of theoretical advantages and disadvantages of care pathways for patients and professionals. The objective of this paper is to provide a theoretical base for empirical studies on care pathways. The knowledge for this chapter is based on several books on pathways, which we found by searching in the digital encyclopedia Wikipedia. Although this is not usual in scientific publications, this method was used because books are not searchable by databases such as PubMed. From 2005, we performed a literature search on PubMed and other literature databases with the keywords integrated care pathway, clinical pathway, critical pathway, theory, research, and evaluation. One of the inspirational sources was the website of the European Pathway Association (EPA) and its journal, the International Journal of Care Pathways. The authors visited several sites for this paper. These are mentioned as illustrations of a concept or theory. Most of them have English websites with more information. The URLs of these websites are not mentioned in this paper as a reference, because the content of them changes fast, sometimes every day. PMID:23593066
FirebrowseR: an R client to the Broad Institute's Firehose Pipeline.
Deng, Mario; Brägelmann, Johannes; Kryukov, Ivan; Saraiva-Agostinho, Nuno; Perner, Sven
2017-01-01
With its Firebrowse service (http://firebrowse.org/) the Broad Institute is making large-scale multi-platform omics data analysis results publicly available through a Representational State Transfer (REST) Application Programmable Interface (API). Querying this database through an API client from an arbitrary programming environment is an essential task, allowing other developers and researchers to focus on their analysis and avoid data wrangling. Hence, as a first result, we developed a workflow to automatically generate, test and deploy such clients for rapid response to API changes. Its underlying infrastructure, a combination of free and publicly available web services, facilitates the development of API clients. It decouples changes in server software from the client software by reacting to changes in the RESTful service and removing direct dependencies on a specific implementation of an API. As a second result, FirebrowseR, an R client to the Broad Institute's RESTful Firehose Pipeline, is provided as a working example, which is built by the means of the presented workflow. The package's features are demonstrated by an example analysis of cancer gene expression data. Database URL: https://github.com/mariodeng/. © The Author(s) 2017. Published by Oxford University Press.
Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A
2017-04-01
Functional sites define the diversity of protein functions and are the central object of research of the structural and functional organization of proteins. The mechanisms underlying protein functional sites emergence and their variability during evolution are distinguished by duplication, shuffling, insertion and deletion of the exons in genes. The study of the correlation between a site structure and exon structure serves as the basis for the in-depth understanding of sites organization. In this regard, the development of programming resources that allow the realization of the mutual projection of exon structure of genes and primary and tertiary structures of encoded proteins is still a relevant problem. Previously, we developed the SitEx system that provides information about protein and gene sequences with mapped exon borders and protein functional sites amino acid positions. The database included information on proteins with known 3D structure. However, data with respect to orthologs was not available. Therefore, we added the projection of sites positions to the exon structures of orthologs in SitEx 2.0. We implemented a search through the database using site conservation variability and site discontinuity through exon structure. Inclusion of the information on orthologs expands the possibilities of SitEx usage for solving problems regarding the analysis of the structural and functional organization of proteins. Database URL: http://www-bionet.sscc.ru/sitex/.
NASA Astrophysics Data System (ADS)
Tsvetkov, M. K.; Stavrev, K. Y.; Tsvetkova, K. P.; Semkov, E. H.; Mutatov, A. S.
The Wide-Field Plate Database (WFPDB) and the possibilities for its application as a research tool in observational astronomy are presented. Currently the WFPDB comprises the descriptive data for 400 000 archival wide field photographic plates obtained with 77 instruments, from a total of 1 850 000 photographs stored in 269 astronomical archives all over the world since the end of last century. The WFPDB is already accessible for the astronomical community, now only in batch mode through user requests sent by e-mail. We are working on on-line interactive access to the data via the Internet from Sofia and in parallel from the Centre de Donnees Astronomiques de Strasbourg. (Initial information can be found on World Wide Web homepage URL http://www.wfpa.acad.bg.) The WFPDB may be useful in studies of a variety of astronomical objects and phenomena, and especially for long-term investigations of variable objects and for multi-wavelength research. We have analysed the data in the WFPDB in order to derive the overall characteristics of the totality of wide-field observations, such as the sky coverage, the distributions by observation time and date, by spectral band, and by object type. We have also examined the totality of wide-field observations from the point of view of their quality, availability and digitisation. The usefulness of the WFPDB is demonstrated by the results of identification and investigation of the photometrical behaviour of optical analogues of gamma-ray bursts.
URS DataBase: universe of RNA structures and their motifs.
Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail
2016-01-01
The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA-protein hydrogen bonds. URSDB employs a new original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/. © The Author(s) 2016. Published by Oxford University Press.
URS DataBase: universe of RNA structures and their motifs
Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail
2016-01-01
The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA–protein hydrogen bonds. URSDB employs a new original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/ PMID:27242032
TRedD—A database for tandem repeats over the edit distance
Sokol, Dina; Atagun, Firat
2010-01-01
A ‘tandem repeat’ in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats are common in the genomes of both eukaryotic and prokaryotic organisms. They are significant markers for human identity testing, disease diagnosis, sequence homology and population studies. In this article, we describe a new database, TRedD, which contains the tandem repeats found in the human genome. The database is publicly available online, and the software for locating the repeats is also freely available. The definition of tandem repeats used by TRedD is a new and innovative definition based upon the concept of ‘evolutive tandem repeats’. In addition, we have developed a tool, called TandemGraph, to graphically depict the repeats occurring in a sequence. This tool can be coupled with any repeat finding software, and it should greatly facilitate analysis of results. Database URL: http://tandem.sci.brooklyn.cuny.edu/ PMID:20624712
DDRprot: a database of DNA damage response-related proteins.
Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M
2016-01-01
The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. If involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from where it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.
dbHiMo: a web-based epigenomics platform for histone-modifying enzymes
Choi, Jaeyoung; Kim, Ki-Tae; Huh, Aram; Kwon, Seomun; Hong, Changyoung; Asiegbu, Fred O.; Jeon, Junhyun; Lee, Yong-Hwan
2015-01-01
Over the past two decades, epigenetics has evolved into a key concept for understanding regulation of gene expression. Among many epigenetic mechanisms, covalent modifications such as acetylation and methylation of lysine residues on core histones emerged as a major mechanism in epigenetic regulation. Here, we present the database for histone-modifying enzymes (dbHiMo; http://hme.riceblast.snu.ac.kr/) aimed at facilitating functional and comparative analysis of histone-modifying enzymes (HMEs). HMEs were identified by applying a search pipeline built upon profile hidden Markov model (HMM) to proteomes. The database incorporates 11 576 HMEs identified from 603 proteomes including 483 fungal, 32 plants and 51 metazoan species. The dbHiMo provides users with web-based personalized data browsing and analysis tools, supporting comparative and evolutionary genomics. With comprehensive data entries and associated web-based tools, our database will be a valuable resource for future epigenetics/epigenomics studies. Database URL: http://hme.riceblast.snu.ac.kr/ PMID:26055100
GenomeHubs: simple containerized setup of a custom Ensembl database and web server for any species
Kumar, Sujai; Stevens, Lewis; Blaxter, Mark
2017-01-01
As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to International Nucleotide Sequence Database Collaboration. Database URL: http://GenomeHubs.org PMID:28605774
CGDSNPdb: a database resource for error-checked and imputed mouse SNPs.
Hutchins, Lucie N; Ding, Yueming; Szatkiewicz, Jin P; Von Smith, Randy; Yang, Hyuna; de Villena, Fernando Pardo-Manuel; Churchill, Gary A; Graber, Joel H
2010-07-06
The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the 'imputed genotype resource' in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600,000 SNPs and over 900,000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login. Database URL: http://cgd.jax.org/cgdsnpdb/
NONATObase: a database for Polychaeta (Annelida) from the Southwestern Atlantic Ocean.
Pagliosa, Paulo R; Doria, João G; Misturini, Dairana; Otegui, Mariana B P; Oortman, Mariana S; Weis, Wilson A; Faroni-Perez, Larisse; Alves, Alexandre P; Camargo, Maurício G; Amaral, A Cecília Z; Marques, Antonio C; Lana, Paulo C
2014-01-01
Networks can greatly advance data sharing attitudes by providing organized and useful data sets on marine biodiversity in a friendly and shared scientific environment. NONATObase, the interactive database on polychaetes presented herein, will provide new macroecological and taxonomic insights of the Southwestern Atlantic region. The database was developed by the NONATO network, a team of South American researchers, who integrated available information on polychaetes from between 5°N and 80°S in the Atlantic Ocean and near the Antarctic. The guiding principle of the database is to keep free and open access to data based on partnerships. Its architecture consists of a relational database integrated in the MySQL and PHP framework. Its web application allows access to the data from three different directions: species (qualitative data), abundance (quantitative data) and data set (reference data). The database has built-in functionality, such as the filter of data on user-defined taxonomic levels, characteristics of site, sample, sampler, and mesh size used. Considering that there are still many taxonomic issues related to poorly known regional fauna, a scientific committee was created to work out consistent solutions to current misidentifications and equivocal taxonomy status of some species. Expertise from this committee will be incorporated by NONATObase continually. The use of quantitative data was possible by standardization of a sample unit. All data, maps of distribution and references from a data set or a specified query can be visualized and exported to a commonly used data format in statistical analysis or reference manager software. The NONATO network has initialized with NONATObase, a valuable resource for marine ecologists and taxonomists. The database is expected to grow in functionality as it comes in useful, particularly regarding the challenges of dealing with molecular genetic data and tools to assess the effects of global environment change. 
Database URL: http://nonatobase.ufsc.br/.
NONATObase: a database for Polychaeta (Annelida) from the Southwestern Atlantic Ocean
Pagliosa, Paulo R.; Doria, João G.; Misturini, Dairana; Otegui, Mariana B. P.; Oortman, Mariana S.; Weis, Wilson A.; Faroni-Perez, Larisse; Alves, Alexandre P.; Camargo, Maurício G.; Amaral, A. Cecília Z.; Marques, Antonio C.; Lana, Paulo C.
2014-01-01
Networks can greatly advance data sharing attitudes by providing organized and useful data sets on marine biodiversity in a friendly and shared scientific environment. NONATObase, the interactive database on polychaetes presented herein, will provide new macroecological and taxonomic insights of the Southwestern Atlantic region. The database was developed by the NONATO network, a team of South American researchers, who integrated available information on polychaetes from between 5°N and 80°S in the Atlantic Ocean and near the Antarctic. The guiding principle of the database is to keep free and open access to data based on partnerships. Its architecture consists of a relational database integrated in the MySQL and PHP framework. Its web application allows access to the data from three different directions: species (qualitative data), abundance (quantitative data) and data set (reference data). The database has built-in functionality, such as the filter of data on user-defined taxonomic levels, characteristics of site, sample, sampler, and mesh size used. Considering that there are still many taxonomic issues related to poorly known regional fauna, a scientific committee was created to work out consistent solutions to current misidentifications and equivocal taxonomy status of some species. Expertise from this committee will be incorporated by NONATObase continually. The use of quantitative data was possible by standardization of a sample unit. All data, maps of distribution and references from a data set or a specified query can be visualized and exported to a commonly used data format in statistical analysis or reference manager software. The NONATO network has initialized with NONATObase, a valuable resource for marine ecologists and taxonomists. The database is expected to grow in functionality as it comes in useful, particularly regarding the challenges of dealing with molecular genetic data and tools to assess the effects of global environment change. 
Database URL: http://nonatobase.ufsc.br/ PMID:24573879
Web-based Electronic Sharing and RE-allocation of Assets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leverett, Dave; Miller, Robert A.; Berlin, Gary J.
2002-09-09
The Electronic Asset Sharing Program is a web-based application that provides the capability for complex-wide sharing and reallocation of assets that are excess, underutilized, or unutilized. Through a web-based front-end and a supporting hash database with a search engine, users can search for assets that they need, search for assets needed by others, enter assets they need, and enter assets they have available for reallocation. In addition, entire listings of available assets and needed assets can be viewed. The application is written in Java; the hash database and search engine are in Object-oriented Java Database Management (OJDBM). The application will be hosted on an SRS-managed server outside the Firewall and access will be controlled via a protected realm. An example of the application can be viewed at the following (temporary) URL: http://idgdev.srs.gov/servlet/srs.weshare.WeShare
Gene regulation knowledge commons: community action takes care of DNA binding transcription factors
Tripathi, Sushil; Vercruysse, Steven; Chawla, Konika; Christie, Karen R.; Blake, Judith A.; Huntley, Rachael P.; Orchard, Sandra; Hermjakob, Henning; Thommesen, Liv; Lægreid, Astrid; Kuiper, Martin
2016-01-01
A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue how cooperation between the scientific community and professional curators can increase the capacity of capturing precise knowledge from literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large amounts of DNA binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such community curation can enhance the scientific epistemic process. Database URL: http://www.tfcheckpoint.org PMID:27270715
RAIN: RNA–protein Association and Interaction Networks
Junge, Alexander; Refsgaard, Jan C.; Garde, Christian; Pan, Xiaoyong; Santos, Alberto; Alkan, Ferhat; Anthon, Christian; von Mering, Christian; Workman, Christopher T.; Jensen, Lars Juhl; Gorodkin, Jan
2017-01-01
Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA–RNA and ncRNA–protein interactions and its integration with the STRING database of protein–protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL: http://rth.dk/resources/rain PMID:28077569
UCbase 2.0: ultraconserved sequences database (2014 update)
Lomonaco, Vincenzo; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, Laura; Emmett, Warren; Bicciato, Silvio; Taccioli, Cristian
2014-01-01
UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to UCSC genome browser. Database URL: http://ucbase.unimore.it PMID:24951797
The life and death of URLs in five biomedical informatics journals.
Carnevale, Randy J; Aronsky, Dominik
2007-04-01
To determine the decay rate of Uniform Resource Locators (URLs) in the reference section of biomedical informatics journals. URL references were collected from printed journal articles of the first and middle issues of 1999-2004 and electronically available in-press articles in January 2005. We limited this set to five biomedical informatics journals: Artificial Intelligence in Medicine, International Journal of Medical Informatics, Journal of the American Medical Informatics Association: JAMIA, Methods of Information in Medicine, and Journal of Biomedical Informatics. During a 1-month period, URL access attempts were performed eight times a day at regular intervals. Of the 19,108 references extracted from 606 printed and 86 in-press articles, 1112 (5.8%) references contained a URL. Of the 1049 unique URLs, 726 (69.2%) were alive, 230 (21.9%) were dead, and 93 (8.9%) were comatose. URLs from in-press articles included 212 URLs, of which 169 (79.7%) were alive, 21 (9.9%) were dead, and 22 (10.4%) were comatose. The average annual decay, or link rot, rate was 5.4%. The URL decay rate in biomedical informatics journals is high. A commonly accepted strategy for the permanent archival of digital information referenced in scholarly publications is urgently needed.
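As a quick arithmetic check on the percentages reported in the abstract above, the following minimal Python sketch recomputes each share from the raw counts. The counts are taken from the abstract; the variable and function names are illustrative assumptions and do not come from the study, and the sketch does not attempt to reproduce the 5.4% annual decay rate, whose derivation the abstract does not give.

```python
# Counts of URLs by status, copied from the abstract (names are illustrative).
unique_urls = {"alive": 726, "dead": 230, "comatose": 93}
in_press_urls = {"alive": 169, "dead": 21, "comatose": 22}


def status_shares(counts):
    """Return each status as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


unique_shares = status_shares(unique_urls)
in_press_shares = status_shares(in_press_urls)

# Share of all extracted references that contained a URL (1112 of 19,108).
url_reference_share = round(100 * 1112 / 19108, 1)

print(unique_shares)
print(in_press_shares)
print(url_reference_share)
```

The recomputed shares agree with the figures quoted in the abstract, and the status counts add up to the stated totals of 1049 unique URLs and 212 in-press URLs.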
Combining computational models, semantic annotations and simulation experiments in a graph database
Henkel, Ron; Wolkenhauer, Olaf; Waltemath, Dagmar
2015-01-01
Model repositories such as the BioModels Database, the CellML Model Repository or JWS Online are frequently accessed to retrieve computational models of biological systems. However, their storage concepts support only restricted types of queries and not all data inside the repositories can be retrieved. In this article we present a storage concept that meets this challenge. It grounds on a graph database, reflects the models’ structure, incorporates semantic annotations and simulation descriptions and ultimately connects different types of model-related data. The connections between heterogeneous model-related data and bio-ontologies enable efficient search via biological facts and grant access to new model features. The introduced concept notably improves the access of computational models and associated simulations in a model repository. This has positive effects on tasks such as model search, retrieval, ranking, matching and filtering. Furthermore, our work for the first time enables CellML- and Systems Biology Markup Language-encoded models to be effectively maintained in one database. We show how these models can be linked via annotations and queried. Database URL: https://sems.uni-rostock.de/projects/masymos/ PMID:25754863
Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard
2014-01-01
Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge.
It might be a useful resource especially for the bone tumor research community, as specific information about genes or microRNAs is quickly and easily accessible. Hence, this platform can support the ongoing OS research and biomarker discovery. Database URL: http://osteosarcoma-db.uni-muenster.de. © The Author(s) 2014. Published by Oxford University Press.
Poos, Kathrin; Smida, Jan; Nathrath, Michaela; Maugg, Doris; Baumhoer, Daniel; Neumann, Anna; Korsching, Eberhard
2014-01-01
Osteosarcoma (OS) is the most common primary bone cancer exhibiting high genomic instability. This genomic instability affects multiple genes and microRNAs to a varying extent depending on patient and tumor subtype. Massive research is ongoing to identify genes including their gene products and microRNAs that correlate with disease progression and might be used as biomarkers for OS. However, the genomic complexity hampers the identification of reliable biomarkers. Up to now, clinico-pathological factors are the key determinants to guide prognosis and therapeutic treatments. Each day, new studies about OS are published and complicate the acquisition of information to support biomarker discovery and therapeutic improvements. Thus, it is necessary to provide a structured and annotated view on the current OS knowledge that is quickly and easily accessible to researchers of the field. Therefore, we developed a publicly available database and Web interface that serves as a resource for OS-associated genes and microRNAs. Genes and microRNAs were collected using an automated dictionary-based gene recognition procedure followed by manual review and annotation by experts of the field. In total, 911 genes and 81 microRNAs related to 1331 PubMed abstracts were collected (last update: 29 October 2013). Users can evaluate genes and microRNAs according to their potential prognostic and therapeutic impact, the experimental procedures, the sample types, the biological contexts and microRNA target gene interactions. Additionally, a pathway enrichment analysis of the collected genes highlights different aspects of OS progression. OS requires pathways commonly deregulated in cancer but also features OS-specific alterations like deregulated osteoclast differentiation. To our knowledge, this is the first effort of an OS database containing manually reviewed and annotated up-to-date OS knowledge. 
It might be a useful resource especially for the bone tumor research community, as specific information about genes or microRNAs is quickly and easily accessible. Hence, this platform can support the ongoing OS research and biomarker discovery. Database URL: http://osteosarcoma-db.uni-muenster.de PMID:24865352
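The abstract above mentions an automated dictionary-based gene recognition step followed by manual review. The authors' actual procedure is not described here, so the following is only a minimal sketch of the general idea: a dictionary maps each canonical symbol to known names, and an abstract is scanned for whole-token matches. The two-entry `GENE_DICTIONARY` and the function name `find_genes` are illustrative assumptions, not part of the published database.

```python
import re

# Toy dictionary (assumption for illustration): canonical symbol -> known names.
GENE_DICTIONARY = {
    "TP53": ["TP53", "p53"],
    "RB1": ["RB1", "retinoblastoma 1"],
}


def find_genes(abstract, dictionary=GENE_DICTIONARY):
    """Return the sorted canonical symbols whose names occur in the abstract."""
    hits = set()
    for symbol, names in dictionary.items():
        for name in names:
            # Require non-alphanumeric boundaries so "RB1" does not match "CRB1A".
            pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
            if re.search(pattern, abstract, flags=re.IGNORECASE):
                hits.add(symbol)
                break  # one matching name is enough for this symbol
    return sorted(hits)
```

Candidates found this way would still need the manual review the abstract describes, since a plain dictionary lookup cannot tell a gene mention from an unrelated abbreviation.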
ATtRACT-a database of RNA-binding proteins and associated motifs.
Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique
2016-01-01
RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improve our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from CISBP-RNA, SpliceAid-F, RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in Protein Data Bank database through computational analyses. ATtRACT provides also efficient algorithms to search a specific motif and scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and compare them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
OntoMate: a text-mining tool aiding curation at the Rat Genome Database
Liu, Weisong; Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Nigam, Rajni; Smith, Jennifer R.; De Pons, Jeff; Dwinell, Melinda R.; Shimoyama, Mary
2015-01-01
The Rat Genome Database (RGD) is the premier repository of rat genomic, genetic and physiologic data. Converting data from free text in the scientific literature to a structured format is one of the main tasks of all model organism databases. RGD spends considerable effort manually curating gene, Quantitative Trait Locus (QTL) and strain information. The rapidly growing volume of biomedical literature and the active research in the biological natural language processing (bioNLP) community have given RGD the impetus to adopt text-mining tools to improve curation efficiency. Recently, RGD has initiated a project to use OntoMate, an ontology-driven, concept-based literature search engine developed at RGD, as a replacement for the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) search engine in the gene curation workflow. OntoMate tags abstracts with gene names, gene mutations, organism name and most of the 16 ontologies/vocabularies used at RGD. All terms/entities tagged to an abstract are listed with the abstract in the search results. All listed terms are linked both to data entry boxes and a term browser in the curation tool. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search. Using the system for literature search and import has streamlined the process compared to using PubMed. The system was built with a scalable and open architecture, including features specifically designed to accelerate the RGD gene curation process. With the use of bioNLP tools, RGD has added more automation to its curation workflow. Database URL: http://rgd.mcw.edu PMID:25619558
NPInter v3.0: an upgraded database of noncoding RNA-associated interactions
Hao, Yajing; Wu, Wei; Li, Hui; Yuan, Jiao; Luo, Jianjun; Zhao, Yi; Chen, Runsheng
2016-01-01
Despite the fact that a large quantity of noncoding RNAs (ncRNAs) have been identified, their functions remain unclear. To enable researchers to have a better understanding of ncRNAs’ functions, we updated the NPInter database to version 3.0, which contains experimentally verified interactions between ncRNAs (excluding tRNAs and rRNAs), especially long noncoding RNAs (lncRNAs) and other biomolecules (proteins, mRNAs, miRNAs and genomic DNAs). In NPInter v3.0, interactions pertaining to ncRNAs are not only manually curated from scientific literature but also curated from high-throughput technologies. In addition, we also curated lncRNA–miRNA interactions from in silico predictions supported by AGO CLIP-seq data. When compared with NPInter v2.0, the interactions are more informative (with additional information on tissues or cell lines, binding sites, conservation, co-expression values and other features) and more organized (with divisions on data sets by data sources, tissues or cell lines, experiments and other criteria). NPInter v3.0 expands the data set to 491,416 interactions in 188 tissues (or cell lines) from 68 kinds of experimental technologies. NPInter v3.0 also improves the user interface and adds new web services, including a local UCSC Genome Browser to visualize binding sites. Additionally, NPInter v3.0 defined a high-confidence set of interactions and predicted the functions of lncRNAs in human and mouse based on the interactions curated in the database. NPInter v3.0 is available at http://www.bioinfo.org/NPInter/. Database URL: http://www.bioinfo.org/NPInter/ PMID:27087310
AIM: a comprehensive Arabidopsis interactome module database and related interologs in plants.
Wang, Yi; Thilmony, Roger; Zhao, Yunjun; Chen, Guoping; Gu, Yong Q
2014-01-01
Systems biology analysis of protein modules is important for understanding the functional relationships between proteins in the interactome. Here, we present a comprehensive database named AIM for Arabidopsis (Arabidopsis thaliana) interactome modules. The database contains almost 250,000 modules that were generated using multiple analysis methods and integration of microarray expression data. All the modules in AIM are well annotated using multiple gene function knowledge databases. AIM provides a user-friendly interface for different types of searches and offers a powerful graphical viewer for displaying module networks linked to the enrichment annotation terms. Both interactive Venn diagram and power graph viewer are integrated into the database for easy comparison of modules. In addition, predicted interologs from other plant species (homologous proteins from different species that share a conserved interaction module) are available for each Arabidopsis module. AIM is a powerful systems biology platform for obtaining valuable insights into the function of proteins in Arabidopsis and other plants using the modules of the Arabidopsis interactome. Database URL: http://probes.pw.usda.gov/AIM Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
The Listeria monocytogenes strain 10403S BioCyc database
Orsi, Renato H.; Bergholz, Teresa M.; Wiedmann, Martin; Boor, Kathryn J.
2015-01-01
Listeria monocytogenes is a food-borne pathogen of humans and other animals. The striking ability to survive several stresses usually used for food preservation makes L. monocytogenes one of the biggest concerns to the food industry, while the high mortality of listeriosis in specific groups of humans makes it a great concern for public health. Previous studies have shown that a regulatory network involving alternative sigma (σ) factors and transcription factors is pivotal to stress survival. However, few studies have evaluated the metabolic networks controlled by these regulatory mechanisms. The L. monocytogenes BioCyc database uses the strain 10403S as a model. Computer-generated initial annotation for all genes also allowed for identification, annotation and display of predicted reactions and pathways carried out by a single cell. Further ongoing manual curation based on published data as well as database mining for selected genes allowed the more refined annotation of functions, which, in turn, allowed for annotation of new pathways and fine-tuning of previously defined pathways to more L. monocytogenes-specific pathways. Using RNA-Seq data, several transcription start sites and promoter regions were mapped to the 10403S genome and annotated within the database. Additionally, the identification of promoter regions and a comprehensive review of available literature allowed the annotation of several regulatory interactions involving σ factors and transcription factors. The L. monocytogenes 10403S BioCyc database is a new resource for researchers studying Listeria and related organisms. 
It allows users to (i) have a comprehensive view of all reactions and pathways predicted to take place within the cell in the cellular overview, as well as to (ii) upload their own data, such as differential expression data, to visualize the data in the scope of predicted pathways and regulatory networks and to carry on enrichment analyses using several different annotations available within the database. Database URL: http://biocyc.org/organism-summary?object=10403S_RAST PMID:25819074
Addition of a breeding database in the Genome Database for Rosaceae
Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie
2013-01-01
Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. 
Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox PMID:24247530
GANSEKI: JAMSTEC Deep Seafloor Rock Sample Database Emerging to the New Phase
NASA Astrophysics Data System (ADS)
Tomiyama, T.; Ichiyama, Y.; Horikawa, H.; Sato, Y.; Soma, S.; Hanafusa, Y.
2013-12-01
Japan Agency for Marine-Earth Science and Technology (JAMSTEC) collects a large number of physical samples as well as various geophysical data using its research vessels and submersibles. These samples and data, which are obtained by spending large amounts of human and physical resources, are a precious asset of the world scientific community. For the better use of these samples and data, it is important that they are utilized not only for the initial purpose of each cruise but also for other general scientific and educational purposes of second-hand users. Based on the JAMSTEC data and sample handling policies [1], JAMSTEC has systematically stored samples and data obtained during research cruises, and provided them to domestic and foreign activities in research, education, and public relations. Being highly valued for second-hand usability, deep seafloor rock samples are one of the most important types of samples obtained by JAMSTEC, as oceanic biological samples and sediment core samples are. Rock samples can be utilized for natural history sciences and various other purposes; some of these purposes are connected to socially important issues such as earthquake mechanisms and mineral resource developments. Researchers and educators can access JAMSTEC rock samples and associated data through 'GANSEKI [2]', the JAMSTEC Deep Seafloor Rock Sample Database. GANSEKI was established on the Internet in 2006 and its contents and functions have been continuously enriched and upgraded since then. GANSEKI currently provides 19 thousand sample metadata records, 9 thousand collection inventory records and 18 thousand geochemical data records. Most of these samples were recovered from the North-western Pacific Ocean, although samples from other areas are also included. The major update of GANSEKI carried out in May 2013 involved a replacement of the database core system and a redesign of the user interface. 
In the new GANSEKI, users can select samples easily and precisely using multi-index search, numerical constraints on geochemical data and thumbnail browsing of sample and thin-section photos. The 'MyList' function allows users to organize, compare and download the data of selected samples. To develop a close network among online databases, the new GANSEKI allows multiple URL entries for individual samples. The curatorial staff are now working on maintaining references to other JAMSTEC databases such as 'DARWIN [3]' and 'J-EDI [4]'.
FARME DB: a functional antibiotic resistance element database
Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.
2017-01-01
Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publically available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Piriyapongsa, Jittima; Bootchai, Chaiwat; Ngamphiw, Chumpol; Tongsima, Sissades
2014-01-01
microRNA (miRNA)–promoter interaction resource (microPIR) is a public database containing over 15 million predicted miRNA target sites located within human promoter sequences. These predicted targets are presented along with their related genomic and experimental data, making the microPIR database the most comprehensive repository of miRNA promoter target sites. Here, we describe major updates of the microPIR database including new target predictions in the mouse genome and revised human target predictions. The updated database (microPIR2) now provides ∼80 million human and 40 million mouse predicted target sites. In addition to being a reference database, microPIR2 is a tool for comparative analysis of target sites on the promoters of human–mouse orthologous genes. In particular, this new feature was designed to identify potential miRNA–promoter interactions conserved between species that could be stronger candidates for further experimental validation. We also incorporated additional supporting information to microPIR2 such as nuclear and cytoplasmic localization of miRNAs and miRNA–disease association. Extra search features were also implemented to enable various investigations of targets of interest. Database URL: http://www4a.biotec.or.th/micropir2 PMID:25425035
Davis, Allan Peter; Johnson, Robin J.; Lennon-Hopkins, Kelley; Sciaky, Daniela; Rosenstein, Michael C.; Wiegers, Thomas C.; Mattingly, Carolyn J.
2012-01-01
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and manually curate a triad of chemical–gene, chemical–disease and gene–disease interactions. Typically, articles for CTD are selected using a chemical-centric approach by querying PubMed to retrieve a corpus containing the chemical of interest. Although this technique ensures adequate coverage of knowledge about the chemical (i.e. data completeness), it does not necessarily reflect the most current state of all toxicological research in the community at large (i.e. data currency). Keeping databases current with the most recent scientific results, as well as providing a rich historical background from legacy articles, is a challenging process. To address this issue of data currency, CTD designed and tested a journal-centric approach of curation to complement our chemical-centric method. We first identified priority journals based on defined criteria. Next, over 7 weeks, three biocurators reviewed 2425 articles from three consecutive years (2009–2011) of three targeted journals. From this corpus, 1252 articles contained relevant data for CTD and 52 752 interactions were manually curated. Here, we describe our journal selection process, two methods of document delivery for the biocurators and the analysis of the resulting curation metrics, including data currency, and both intra-journal and inter-journal comparisons of research topics. Based on our results, we expect that curation by select journals can (i) be easily incorporated into the curation pipeline to complement our chemical-centric approach; (ii) build content more evenly for chemicals, genes and diseases in CTD (rather than biasing data by chemicals-of-interest); (iii) reflect developing areas in environmental health and (iv) improve overall data currency for chemicals, genes and diseases. 
Database URL: http://ctdbase.org/ PMID:23221299
Reliable and Persistent Identification of Linked Data Elements
NASA Astrophysics Data System (ADS)
Wood, David
Linked Data techniques rely upon common terminology in a manner similar to a relational database's reliance on a schema. Linked Data terminology anchors metadata descriptions and facilitates navigation of information. Common vocabularies ease the human, social tasks of understanding datasets sufficiently to construct queries and help to relate otherwise disparate datasets. Vocabulary terms must, when using the Resource Description Framework, be grounded in URIs. A current best practice on the World Wide Web is to serve vocabulary terms as Uniform Resource Locators (URLs) and present both human-readable and machine-readable representations to the public. Linked Data terminology published to the World Wide Web may be used by others without reference or notification to the publishing party. That presents a problem: Vocabulary publishers take on an implicit responsibility to maintain and publish their terms via the URLs originally assigned, regardless of the inconvenience such a responsibility may cause. Over the course of years, people change jobs, publishing organizations change Internet domain names, computers change IP addresses, systems administrators publish old material in new ways. Clearly, a mechanism is required to manage Web-based vocabularies over a long term. This chapter places Linked Data vocabularies in context with the wider concepts of metadata in general and specifically metadata on the Web. Persistent identifier mechanisms are reviewed, with a particular emphasis on Persistent URLs, or PURLs. PURLs and PURL services are discussed in the context of Linked Data. Finally, historic weaknesses of PURLs are resolved by the introduction of a federation of PURL services to address needs specific to Linked Data.
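The indirection that makes a persistent URL persistent can be illustrated with a small sketch. It assumes the simplest possible model: a registry that maps a stable path to whatever the current target URL is and answers lookups with an HTTP-style redirect. The class name `PurlRegistry`, the choice of status 302 and the `example.org` addresses are assumptions for illustration; real PURL services and the federation described in the chapter are considerably more involved.

```python
class PurlRegistry:
    """In-memory sketch of a PURL service: stable path -> current target URL."""

    def __init__(self):
        self._targets = {}

    def register(self, purl_path, target_url):
        # Registering an existing path again simply repoints it.
        self._targets[purl_path] = target_url

    def resolve(self, purl_path):
        """Return (status, location) in the manner of an HTTP redirect."""
        target = self._targets.get(purl_path)
        if target is None:
            return 404, None
        return 302, target


registry = PurlRegistry()
registry.register("/vocab/term", "http://old.example.org/term")
# The publisher changes domain names; the published PURL stays the same.
registry.register("/vocab/term", "http://new.example.org/term")
```

Consumers keep citing `/vocab/term`, and only the registry entry changes when the publisher moves the underlying resource.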
Building an OpenURL Resolver in Your Own Workshop
ERIC Educational Resources Information Center
Dahl, Mark
2004-01-01
OpenURL resolver is the next big thing for libraries. An OpenURL resolver is simply a piece of software that sucks in attached data and serves up a Web page that tells one where he or she can get the book or article represented by it. In this article, the author describes how he designed an OpenURL resolver for his library, the Lewis & Clark…
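As a rough illustration of what such a resolver does, the sketch below parses the citation data attached to an incoming URL and returns a fragment of a Web page pointing to a copy. It is not the author's design: the `HOLDINGS` table, the `example.edu` addresses and the function name `resolve` are hypothetical, and the query keys (`issn`, `isbn`, `title`, `atitle`) are assumed to follow common OpenURL usage.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Hypothetical holdings table: ISSN or ISBN -> where this library provides it.
HOLDINGS = {
    "1234-5678": "https://library.example.edu/ejournals/1234-5678",
}


def resolve(openurl):
    """Turn the citation data on an OpenURL into an HTML fragment."""
    query = parse_qs(urlsplit(openurl).query)
    # parse_qs returns lists of values; keep the first value for each key.
    citation = {key: values[0] for key, values in query.items()}
    identifier = citation.get("issn") or citation.get("isbn")
    target = HOLDINGS.get(identifier)
    label = escape(citation.get("atitle") or citation.get("title") or "this item")
    if target is None:
        return f"<p>No local copy found for {label}.</p>"
    return f'<p><a href="{escape(target, quote=True)}">Get {label}</a></p>'
```

A production resolver would consult a real knowledge base of subscriptions and offer fallbacks such as interlibrary loan; the lookup table here stands in for all of that.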
NASA Astrophysics Data System (ADS)
Meertens, C.; Wier, S.; Ahern, T.; Casey, R.; Weertman, B.; Laughbon, C.
2008-12-01
UNAVCO and the IRIS DMC are data service partners for seismic visualization, particularly for hypocentral data and tomography. UNAVCO provides the GEON Integrated Data Viewer (IDV), an extension of the Unidata IDV, a free, interactive, research-level, software display and analysis tool for data in 3D (latitude, longitude, depth) and 4D (with time), located on or inside the Earth. The GEON IDV is designed to meet the challenge of investigating complex, multi-variate, time-varying, three-dimensional geoscience data in the context of new remote and shared data sources. The GEON IDV supports data access from data sources using HTTP and FTP servers, OPeNDAP servers, THREDDS catalogs, RSS feeds, and WMS (web map) servers. The IRIS DMC (Data Management Center) has developed web services providing data for earthquake hypocentral data and seismic tomography model grids. These services can be called by the GEON IDV to access data at IRIS without copying files. The IRIS Earthquake Browser (IEB) is a web-based query tool for hypocentral data. The IEB combines the DMC's large database of more than 1,900,000 earthquakes with the Google Maps web interface. With the IEB you can quickly find earthquakes in any region of the globe and then import this information into the GEON Integrated Data Viewer where the hypocenters may be visualized. You can select earthquakes by location region, time, depth, and magnitude. The IEB gives the IDV a URL to the selected data. The IDV then shows the data as maps or 3D displays, with interactive control of vertical scale, area, map projection, with symbol size and color control by magnitude or depth. The IDV can show progressive time animation of, for example, aftershocks filling a source region. The IRIS Tomoserver converts seismic tomography model output grids to NetCDF for use in the IDV. The Tomoserver accepts a tomographic model file as input from a user and provides an equivalent NetCDF file as output. 
The service supports NA04, S3D, A1D and CUB input file formats, contributed by their respective creators. The NetCDF file is saved to a location that can be referenced with a URL on an IRIS server. The URL for the NetCDF file is provided to the user. The user can download the data from IRIS, or copy the URL into IDV directly for interpretation, and the IDV will access the data at IRIS. The Tomoserver conversion software was developed by Instrumental Software Technologies, Inc. Use cases with the GEON IDV and IRIS DMC data services will be shown.
Tomar, Navneet; Mishra, Akhilesh; Mrinal, Nirotpal; Jayaram, B.
2016-01-01
Transcription factors (TFs) bind at multiple sites in the genome and regulate expression of many genes. Regulating TF binding in a gene specific manner remains a formidable challenge in drug discovery because the same binding motif may be present at multiple locations in the genome. Here, we present Onco-Regulon (http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm), an integrated database of regulatory motifs of cancer genes clubbed with Unique Sequence-Predictor (USP) a software suite that identifies unique sequences for each of these regulatory DNA motifs at the specified position in the genome. USP works by extending a given DNA motif, in 5′→3′, 3′ →5′ or both directions by adding one nucleotide at each step, and calculates the frequency of each extended motif in the genome by Frequency Counter programme. This step is iterated till the frequency of the extended motif becomes unity in the genome. Thus, for each given motif, we get three possible unique sequences. Closest Sequence Finder program predicts off-target drug binding in the genome. Inclusion of DNA-Protein structural information further makes Onco-Regulon a highly informative repository for gene specific drug development. We believe that Onco-Regulon will help researchers to design drugs which will bind to an exclusive site in the genome with no off-target effects, theoretically. Database URL: http://www.scfbio-iitd.res.in/software/onco/NavSite/index.htm PMID:27515825
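The extension loop described for USP can be sketched as follows. This is an illustration of the idea as stated in the abstract, not the authors' implementation: the helper `count_occurrences` stands in for the Frequency Counter programme, the genome is a plain string on one strand, and in the "both" mode one nucleotide is added on each side per step, which is an assumption.

```python
def count_occurrences(genome, motif):
    """Count overlapping occurrences of motif in genome."""
    count = 0
    start = genome.find(motif)
    while start != -1:
        count += 1
        start = genome.find(motif, start + 1)
    return count


def extend_to_unique(genome, start, end, direction="both"):
    """Grow genome[start:end] step by step until it occurs exactly once.

    direction is "right" (5'->3'), "left" (3'->5') or "both".
    Returns (start, end, sequence), or None if the genome ends first.
    """
    while count_occurrences(genome, genome[start:end]) > 1:
        grew = False
        if direction in ("left", "both") and start > 0:
            start -= 1
            grew = True
        if direction in ("right", "both") and end < len(genome):
            end += 1
            grew = True
        if not grew:
            return None  # cannot extend further, so no unique sequence exists
    return start, end, genome[start:end]


# Toy genome in which the motif "ACG" occurs three times (positions 0, 4 and 8).
toy_genome = "ACGTACGAACGT"
unique_sequences = {
    d: extend_to_unique(toy_genome, 4, 7, d) for d in ("right", "left", "both")
}
```

Running the three directions for one motif position yields the three candidate unique sequences the abstract refers to. Recounting by scanning the whole string at every step is only practical for toy inputs; a genome-scale version would need an index.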
Recent Research in Science Teaching and Learning
ERIC Educational Resources Information Center
Allen, Deborah
2012-01-01
This article features recent research in science teaching and learning. It presents three current articles of interest in life sciences education, as well as more general and noteworthy publications in education research. URLs are provided for the abstracts or full text of articles. For articles listed as "Abstract available," full text may be…
Phytochemica: a platform to explore phytochemicals of medicinal plants
Pathania, Shivalika; Ramakrishnan, Sai Mukund; Bagler, Ganesh
2015-01-01
Plant-derived molecules (PDMs) are known to be a rich source of diverse scaffolds that could serve as the basis for rational drug design. Structured compilation of phytochemicals from traditional medicinal plants can facilitate prospection for novel PDMs and their analogs as therapeutic agents. Atropa belladonna, Catharanthus roseus, Heliotropium indicum, Picrorhiza kurroa and Podophyllum hexandrum are important Himalayan medicinal plants, reported to have immense therapeutic properties against various diseases. We present Phytochemica, a structured compilation of 963 PDMs from these plants, inclusive of their plant part source, chemical classification, IUPAC names, SMILES notations, physicochemical properties and 3-dimensional structures with associated references. Phytochemica is an exhaustive resource of natural molecules facilitating prospection for therapeutic molecules from medicinally important plants. It also offers refined search option to explore the neighbourhood of chemical space against ZINC database to identify analogs of natural molecules at user-defined cut-off. Availability of phytochemical structured dataset may enable their direct use in in silico drug discovery which will hasten the process of lead identification from natural products under proposed hypothesis, and may overcome urgent need for phytomedicines. Compilation and accessibility of indigenous phytochemicals and their derivatives can be a source of considerable advantage to research institutes as well as industries. Database URL: home.iitj.ac.in/~bagler/webservers/Phytochemica PMID:26255307
Rangel, Luiz Thibério; Novaes, Jeniffer; Durham, Alan M.; Madeira, Alda Maria B. N.; Gruber, Arthur
2013-01-01
Parasites of the genus Eimeria infect a wide range of vertebrate hosts, including chickens. We have recently reported a comparative analysis of the transcriptomes of Eimeria acervulina, Eimeria maxima and Eimeria tenella, integrating ORESTES data produced by our group and publicly available Expressed Sequence Tags (ESTs). All cDNA reads have been assembled, and the reconstructed transcripts have been submitted to a comprehensive functional annotation pipeline. Additional studies included orthology assignment across apicomplexan parasites and clustering analyses of gene expression profiles among different developmental stages of the parasites. To make all this body of information publicly available, we constructed the Eimeria Transcript Database (EimeriaTDB), a web repository that provides access to sequence data, annotation and comparative analyses. Here, we describe the web interface, available sequence data sets and query tools implemented on the site. The main goal of this work is to offer a public repository of sequence and functional annotation data of reconstructed transcripts of parasites of the genus Eimeria. We believe that EimeriaTDB will represent a valuable and complementary resource for the Eimeria scientific community and for those researchers interested in comparative genomics of apicomplexan parasites. Database URL: http://www.coccidia.icb.usp.br/eimeriatdb/ PMID:23411718
... MedlinePlus Connect in Use URL of this page: https://medlineplus.gov/connect/users.html MedlinePlus Connect in ... will change.) Old URLs New URLs Web Application https://apps.nlm.nih.gov/medlineplus/services/mpconnect.cfm? ...
MGDB: a comprehensive database of genes involved in melanoma.
Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng
2015-01-01
The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts; to date the release contains 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of the gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. © The Author(s) 2015. Published by Oxford University Press.
JBioWH: an open-source Java framework for bioinformatics data integration
Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor
2013-01-01
The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh PMID:23846595
Xu, Huilei; Baroukh, Caroline; Dannenfelser, Ruth; Chen, Edward Y; Tan, Christopher M; Kou, Yan; Kim, Yujin E; Lemischka, Ihor R; Ma'ayan, Avi
2013-01-01
High content studies that profile mouse and human embryonic stem cells (m/hESCs) using various genome-wide technologies such as transcriptomics and proteomics are constantly being published. However, efforts to integrate such data to obtain a global view of the molecular circuitry in m/hESCs are lagging behind. Here, we present an m/hESC-centered database called Embryonic Stem Cell Atlas from Pluripotency Evidence integrating data from many recent diverse high-throughput studies including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web-interface also includes tools to predict the effects of combinatorial KDs by additive effects controlled by sliders, or through simulation software implemented in MATLAB. Overall, the Embryonic Stem Cell Atlas from Pluripotency Evidence database is a comprehensive resource for the stem cell systems biology community. Database URL: http://www.maayanlab.net/ESCAPE
MedlinePlus Connect: Email List
... MedlinePlus Connect → Email List URL of this page: https://medlineplus.gov/connect/emaillist.html MedlinePlus Connect: Email ... will change.) Old URLs New URLs Web Application https://apps.nlm.nih.gov/medlineplus/services/mpconnect.cfm? ...
DEXTER: Disease-Expression Relation Extraction from Text.
Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K
2018-01-01
Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER) to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. 
Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNA in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER.
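The F-scores quoted in the DEXTER abstract combine precision and recall into a single number. The abstract does not say which variant was used, so the sketch below assumes the balanced F1 score (beta = 1); the function names and the counts are hypothetical and are not taken from the DEXTER evaluation.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical counts for illustration only (not DEXTER results).
p, r = precision_recall(tp=90, fp=10, fn=20)
print(round(100 * f_score(p, r), 2))  # 85.71
```

With beta = 1 the score equals 2TP / (2TP + FP + FN), which is why the example evaluates to 180/210.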
eMelanoBase: an online locus-specific variant database for familial melanoma.
Fung, David C Y; Holland, Elizabeth A; Becker, Therese M; Hayward, Nicholas K; Bressac-de Paillerets, Brigitte; Mann, Graham J
2003-01-01
A proportion of melanoma-prone individuals in both familial and non-familial contexts has been shown to carry inactivating mutations in either CDKN2A or, rarely, CDK4. CDKN2A is a complex locus that encodes two unrelated proteins from alternately spliced transcripts that are read in different frames. The alpha transcript (exons 1alpha, 2, and 3) produces the p16INK4A cyclin-dependent kinase inhibitor, while the beta transcript (exons 1beta and 2) is translated as p14ARF, a stabilizing factor of p53 levels through binding to MDM2. Mutations in exon 2 can impair both polypeptides, and insertions and deletions in exons 1alpha, 1beta, and 2 can theoretically generate p16INK4A-p14ARF fusion proteins. No online database currently takes into account all the consequences of these genotypes, a situation compounded by some problematic previous annotations of CDKN2A-related sequences and descriptions of their mutations. As an initiative of the international Melanoma Genetics Consortium, we have therefore established a database of germline variants observed in all loci implicated in familial melanoma susceptibility. Such a comprehensive, publicly accessible database is an essential foundation for research on melanoma susceptibility and its clinical application. Our database serves two types of data as defined by HUGO. The core dataset includes the nucleotide variants on the genomic and transcript levels, amino acid variants, and citation. The ancillary dataset includes keyword description of events at the transcription and translation levels and epidemiological data. The application that handles users' queries was designed in the model-view-controller architecture and was implemented in Java. The object-relational database schema was deduced using functional dependency analysis. We hereby present our first functional prototype of eMelanoBase. The service is accessible via the URL www.wmi.usyd.edu.au:8080/melanoma.html. Copyright 2002 Wiley-Liss, Inc.
Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl
2016-01-01
The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual sample annotation is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. © The Author(s) 2016. Published by Oxford University Press.
DBATE: database of alternative transcripts expression.
Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2013-01-01
The use of high-throughput RNA sequencing technology (RNA-seq) allows whole transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from human healthy tissues and in disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The possibility to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values that can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.
UCbase 2.0: ultraconserved sequences database (2014 update).
Lomonaco, Vincenzo; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, Laura; Emmett, Warren; Bicciato, Silvio; Taccioli, Cristian
2014-01-01
UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to UCSC genome browser. Database URL: http://ucbase.unimore.it. © The Author(s) 2014. Published by Oxford University Press.
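The definition of UCRs given in the UCbase abstract (stretches longer than 200 bases with 100% identity among human, mouse and rat) can be illustrated with a small scan over an alignment. This is only a sketch under stated assumptions: it presumes the three sequences are already aligned column by column with "-" for gaps, and the function name `conserved_runs` and the toy sequences are hypothetical. It is not the procedure used to build UCbase.

```python
from typing import List, Tuple


def conserved_runs(aligned: List[str], min_len: int = 201) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges in which every aligned
    sequence carries the same non-gap base for at least min_len columns.
    The default of 201 encodes "more than 200 bases"."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    # Iterate one position past the end so a run touching the last column is closed.
    for i in range(length + 1):
        same = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment (hypothetical sequences); the threshold is lowered to 5 for readability.
human = "ACGTACGTAA"
mouse = "ACGTACGTCA"
rat = "TCGTACGTAA"
print(conserved_runs([human, mouse, rat], min_len=5))  # [(1, 8)]
```

Real genome comparisons work on whole-genome alignments rather than short strings, so the scan above only conveys the identity-and-length criterion.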
Egorova, K.S.; Kondakova, A.N.; Toukach, Ph.V.
2015-01-01
Carbohydrates are biological blocks participating in diverse and crucial processes both at cellular and organism levels. They protect individual cells, establish intracellular interactions, take part in the immune reaction and participate in many other processes. Glycosylation is considered one of the most important modifications of proteins and other biologically active molecules. Still, the data on the enzymatic machinery involved in carbohydrate synthesis and processing are scattered, and progress in its study is hindered by the vast bulk of accumulated genetic information not supported by any experimental evidence for the functions of the proteins encoded by these genes. In this article, we present novel instruments for statistical analysis of glycomes in taxa. These tools may be helpful for investigating carbohydrate-related enzymatic activities in various groups of organisms and for comparison of their carbohydrate content. The instruments are developed on the Carbohydrate Structure Database (CSDB) platform and are available freely on the CSDB web-site at http://csdb.glycoscience.ru. Database URL: http://csdb.glycoscience.ru PMID:26337239
Traditional Chinese Medical Journals currently published in mainland China.
Fan, Wei-Yu; Tong, Yuan-Yuan; Pan, Yan-Li; Shang, Wen-Ling; Shen, Jia-Yi; Li, Wei; Li, Li-Jun
2008-06-01
Traditional Chinese Medical (TCM) journals have been playing an important role in scholarly communication in China. However, the information in those periodicals was not enough for international readers. This study aims to provide an overview of TCM journals in China. TCM journals currently published in mainland China were identified from Chinese databases and journal subscription catalogs. Data on publication start year, publishing region, language, core-journal status, indexing in major international databases and availability of an accessible URL were investigated, and subjects of journals were categorized. One hundred and forty-nine (149) TCM journals are currently published in mainland China; 88.59% of them are academic journals. The subjects of those journals are various, ranging from general TCM, integrative medicine and herbal medicines to veterinary TCM. The publishing areas are distributed in 27 regions, with Beijing having the most TCM journals published. One hundred and forty-two (142) of those periodicals are in Chinese, while 4 are also in English, and 3 in other languages. Only 8 TCM journals were recognized as core journals, and 5 were identified as both core journals and journals with high-impact articles by all evaluation systems in China. A few of the TCM journals from mainland China are indexed in PubMed/MEDLINE (10), EMBASE (5), Biological Abstracts (2), or AMED (1). Online full-text Chinese databases CJFD, COJ, and CSTPD cover most of the TCM journals published in the country. One hundred (100) TCM journals have accessible URLs, but only 3 are open access with free full texts. Publication of TCM journals in China has been active in academic communication in the past 20 years. However, only a few of them received recognized high evaluation. English information from them is not sufficient. Open access is not widely accepted. The accessibility of those journals to international readers needs to be improved.
CicerTransDB 1.0: a resource for expression and functional study of chickpea transcription factors.
Gayali, Saurabh; Acharya, Shankar; Lande, Nilesh Vikram; Pandey, Aarti; Chakraborty, Subhra; Chakraborty, Niranjan
2016-07-29
Transcription factor (TF) databases are a major resource for systematic studies of TFs in specific species as well as related family members. Even though there are several publicly available multi-species databases, the information on the amount and diversity of TFs within individual species is fragmented, especially for newly sequenced genomes of non-model species of agricultural significance. We constructed CicerTransDB (Cicer Transcription Factor Database), the first database of its kind, which would provide a centralized putatively complete list of TFs in a food legume, chickpea. CicerTransDB, available at www.cicertransdb.esy.es, is based on chickpea (Cicer arietinum L.) annotation v 1.0. The database is an outcome of genome-wide domain study and manual classification of TF families. This database not only provides information of the gene, but also gene ontology, domain and motif architecture. CicerTransDB v 1.0 comprises information of 1124 genes of chickpea and enables the user to not only search, browse and download sequences but also retrieve sequence features. CicerTransDB also provides several single click interfaces, transconnecting to various other databases to ease further analysis. Several webAPI(s) integrated in the database allow end-users direct access of data. A critical comparison of CicerTransDB with PlantTFDB (Plant Transcription Factor Database) revealed 68 novel TFs in the chickpea genome, hitherto unexplored. Database URL: http://www.cicertransdb.esy.es.
Recent Research in Science Teaching and Learning
ERIC Educational Resources Information Center
Allen, Deborah
2013-01-01
This feature is designed to point "CBE-Life Sciences Education" readers to current articles of interest in life sciences education, as well as more general and noteworthy publications in education research. URLs are provided for the abstracts or full text of articles. This themed issue focuses on recent studies of concepts and…
Recent Research in Science Teaching and Learning
ERIC Educational Resources Information Center
Allen, Deborah
2014-01-01
This feature is designed to point "CBE - Life Sciences Education" readers to current articles of interest in life sciences education as well as more general and noteworthy publications in education research. URLs are provided for the abstracts or full text of articles. For articles listed as "Abstract available," full text may…
Recent Research in Science Teaching and Learning
ERIC Educational Resources Information Center
Allen, Deborah
2013-01-01
This article is designed to point "CBE-Life Sciences Education" readers to current articles of interest in life sciences education as well as more general and noteworthy publications in education research. URLs are provided for the abstracts or full text of articles. For articles listed as "Abstract available," full text may be…
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.
Lu, Zhiyong; Hirschman, Lynette
2012-01-01
Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
Swetha, Rayapadi G; Kala Sekar, Dinesh Kumar; Ramaiah, Sudha; Anbarasu, Anand; Sekar, Kanagaraj
2014-12-01
Haemophilus influenzae (H. influenzae) is the causative agent of pneumonia, bacteraemia and meningitis. The organism is responsible for a large number of deaths in both developed and developing countries. Even though the first bacterial genome to be sequenced was that of H. influenzae, there is no exclusive database dedicated to H. influenzae. This prompted us to develop the Haemophilus influenzae Genome Database (HIGDB). All data of HIGDB are stored and managed in a MySQL database. The HIGDB is hosted on a Solaris server and developed using PERL modules. Ajax and JavaScript are used for the interface development. The HIGDB contains detailed information on 42,741 proteins, 18,077 genes including 10 whole genome sequences and also 284 three dimensional structures of proteins of H. influenzae. In addition, the database provides "Motif search" and "GBrowse". The HIGDB is freely accessible through the URL: http://bioserver1.physics.iisc.ernet.in/HIGDB/. The HIGDB will be a single point of access for bacteriological, clinical, genomic and proteomic information of H. influenzae. The database can also be used to identify DNA motifs within H. influenzae genomes and to compare gene or protein sequences of a particular strain with other strains of H. influenzae. Copyright © 2014 Elsevier Ltd. All rights reserved.
2012-01-01
Background: Drug safety issues are now recognized as the factors generating the most reasons for drug withdrawals at various levels of development and at the post-approval stage. Among them cardiotoxicity remains the main reason, despite the substantial effort put into in vitro and in vivo testing, with the main focus put on hERG channel inhibition as the hypothesized surrogate of drug proarrhythmic potency. The large interest in the IKr current has resulted in the development of predictive tools and informative databases describing a drug's susceptibility to interactions with the hERG channel, although there are no similar, publicly available sets of data describing other ionic currents driven by the human cardiomyocyte ionic channels, which are recognized as an overlooked drug safety target. Discussion: The aim of this database development and publication was to provide a scientifically useful, easily usable and clearly verifiable set of information describing not only IKr (hERG), but also other human cardiomyocyte specific ionic channels inhibition data (IKs, INa, ICa). Summary: The broad range of data (chemical space and in vitro settings) and the easy to use user interface make tox-database.net a useful tool for interested scientists. Database URL: http://tox-database.net. PMID:22947121
SNPversity: a web-based tool for visualizing diversity
Schott, David A; Vinnakota, Abhinav G; Portwood, John L; Andorf, Carson M
2018-01-01
Many stand-alone desktop software suites exist to visualize single nucleotide polymorphism (SNP) diversity, but web-based software that can be easily implemented and used for biological databases is absent. SNPversity was created to answer this need by building an open-source visualization tool that can be implemented on a Unix-like machine and served through a web browser that can be accessible worldwide. SNPversity consists of an HDF5 database back-end for SNPs, a data exchange layer powered by TASSEL libraries that represent data in JSON format, and an interface layer using PHP to visualize SNP information. SNPversity displays data in real-time through a web browser in grids that are color-coded according to a given SNP’s allelic status and mutational state. SNPversity is currently available at MaizeGDB, the maize community’s database, and will be soon available at GrainGenes, the clade-oriented database for Triticeae and Avena species, including wheat, barley, rye, and oat. The code and documentation are uploaded onto github, and they are freely available to the public. We expect that the tool will be highly useful for other biological databases with a similar need to display SNP diversity through their web interfaces. Database URL: https://www.maizegdb.org/snpversity PMID:29688387
Researcher Teacher Program: Achievements and Shortcomings
ERIC Educational Resources Information Center
Nami, Shamsi; Matin, Nematallah
2017-01-01
Faculty member of Organization for Educational Research and Planning (OERP), Iran. Correspondence: Shamsi Nami, Faculty member of Organization for Educational Research and Planning (OERP), Iran. E-mail: shamsinami@gmail.com. Received: July 24, 2016. Accepted: October 10, 2016. Online Published: February 27, 2017. doi:10.5539/ies.v10n3p99 URL:…
MalaCards: an integrated compendium for diseases and their annotation
Rappaport, Noa; Nativ, Noam; Stelzer, Gil; Twik, Michal; Guan-Golan, Yaron; Iny Stein, Tsippi; Bahir, Iris; Belinky, Frida; Morrey, C. Paul; Safran, Marilyn; Lancet, Doron
2013-01-01
Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. 
Database URL: http://www.malacards.org/ PMID:23584832
A RESTful application programming interface for the PubMLST molecular typing and genome databases
Bray, James E.; Maiden, Martin C. J.
2017-01-01
Molecular typing is used to differentiate microorganisms at the subspecies or strain level for epidemiological investigations, infection control, public health and environmental sampling. DNA sequence-based typing methods require authoritative databases that link sequence variants to nomenclature in order to facilitate communication and comparison of identified types in national or global settings. The PubMLST website (https://pubmlst.org/) fulfils this role for over a hundred microorganisms for which it hosts curated molecular sequence typing data, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene typing approaches. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) and whole genome MLST (wgMLST) which catalogue the allelic diversity found in hundreds to thousands of genes. These approaches provide a common nomenclature for high-resolution strain characterization and comparison. Molecular typing information is linked to isolate provenance, phenotype, and increasingly genome assemblies, providing a resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage. A Representational State Transfer (REST) Application Programming Interface (API) has been developed for the PubMLST website to make these large quantities of structured molecular typing and whole genome sequence data available for programmatic access by any third party application. The API is an integral component of the Bacterial Isolate Genome Sequence Database (BIGSdb) platform that is used to host PubMLST resources, and exposes all public data within the site. In addition to data browsing, searching and download, the API supports authentication and submission of new data to curator queues. Database URL: http://rest.pubmlst.org/ PMID:29220452
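Programmatic access to a REST API of this kind usually starts with a GET request to the root URL and then follows the links returned in the JSON body. The sketch below shows that pattern with the Python standard library only. The helper names `fetch_json` and `resource_links` are hypothetical, the shape of the real PubMLST response is not assumed, and the demonstration runs offline against a made-up payload rather than the live service.

```python
import io
import json
from typing import Any, Callable, List
from urllib.request import urlopen

BASE_URL = "http://rest.pubmlst.org/"  # Database URL quoted in the abstract


def fetch_json(url: str, opener: Callable[..., Any] = urlopen) -> Any:
    """GET a URL and decode the response body as JSON.
    The opener is injectable so the function can be exercised without a network."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def resource_links(payload: Any) -> List[str]:
    """Recursively collect every string value that looks like an http(s) link."""
    links: List[str] = []
    if isinstance(payload, dict):
        for value in payload.values():
            links.extend(resource_links(value))
    elif isinstance(payload, list):
        for item in payload:
            links.extend(resource_links(item))
    elif isinstance(payload, str) and payload.startswith(("http://", "https://")):
        links.append(payload)
    return links


# Offline demonstration with a hypothetical payload (not a real PubMLST response).
fake_body = json.dumps({"name": "demo", "href": "https://example.org/db/demo"}).encode("utf-8")
demo = fetch_json("https://example.org/", opener=lambda url: io.BytesIO(fake_body))
print(resource_links(demo))  # ['https://example.org/db/demo']

# Live use (needs network access): resource_links(fetch_json(BASE_URL))
```

Keeping the network call behind an injectable opener is a design choice for testability; authentication and data submission, which the abstract mentions, are outside the scope of this sketch.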
Publications of the Western Geologic Mapping Team 1997-1998
Stone, Paul; Powell, C.L.
1999-01-01
The Western Geologic Mapping Team (WGMT) of the U.S. Geological Survey, Geologic Division (USGS, GD), conducts geologic mapping and related topical earth-science studies in the western United States. This work is focused on areas where modern geologic maps and associated earth-science data are needed to address key societal and environmental issues such as ground-water quality, potential geologic hazards, and land-use decisions. Areas of primary emphasis currently include southern California, the San Francisco Bay region, the Pacific Northwest, the Las Vegas urban corridor, and selected National Park lands. The team has its headquarters in Menlo Park, California, and maintains smaller field offices at several other locations in the western United States. The results of research conducted by the WGMT are released to the public as a variety of databases, maps, text reports, and abstracts, both through the internal publication system of the USGS and in diverse external publications such as scientific journals and books. This report lists publications of the WGMT released in calendar years 1997 and 1998. Most of the publications listed were authored or coauthored by WGMT staff. However, the list also includes some publications authored by formal non-USGS cooperators with the WGMT, as well as some authored by USGS staff outside the WGMT in cooperation with WGMT projects. Several of the publications listed are available on the World Wide Web; for these, URL addresses are provided. Most of these Web publications are USGS open-file reports that contain large digital databases of geologic map and related information. For these, the bibliographic citation refers specifically to an explanatory pamphlet containing information about the content and accessibility of the database, not to the actual map or related information comprising the database itself.
Evaluation of Used Fuel Disposition in Clay-Bearing Rock
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jové Colón, Carlos F.; Weck, Philippe F.; Sassani, David H.
2014-08-01
Radioactive waste disposal in shale/argillite rock formations has been widely considered given its desirable isolation properties (low permeability), geochemically reduced conditions, anomalous groundwater pressures, and widespread geologic occurrence. Clay/shale rock formations are characterized by their high content of clay minerals such as smectites and illites where diffusive transport and chemisorption phenomena predominate. These, in addition to low permeability, are key attributes of shale to impede radionuclide mobility. Shale host-media has been comprehensively studied in international nuclear waste repository programs as part of underground research laboratories (URLs) programs in Switzerland, France, Belgium, and Japan. These investigations, in some cases a decade or more long, have produced a large but fundamental body of information spanning from site characterization data (geological, hydrogeological, geochemical, geomechanical) to controlled experiments on the engineered barrier system (EBS) (barrier clay and seals materials). Evaluation of nuclear waste disposal in shale formations in the USA was conducted in the late 70’s and mid 80’s. Most of these studies evaluated the potential for shale to host a nuclear waste repository but not at the programmatic level of URLs in international repository programs. This report covers various R&D work and capabilities relevant to disposal of heat-generating nuclear waste in shale/argillite media. Integration and cross-fertilization of these capabilities will be utilized in the development and implementation of the shale/argillite reference case planned for FY15. Disposal R&D activities under the UFDC in the past few years have produced state-of-the-art modeling capabilities for coupled Thermal-Hydrological-Mechanical-Chemical (THMC), used fuel degradation (source term), and thermodynamic modeling and database development to evaluate generic disposal concepts.
The THMC models have been developed for shale repository leveraging in large part on the information garnered in URLs and laboratory data to test and demonstrate model prediction capability and to accurately represent behavior of the EBS and the natural (barrier) system (NS). In addition, experimental work to improve our understanding of clay barrier interactions and TM couplings at high temperatures are key to evaluate thermal effects as a result of relatively high heat loads from waste and the extent of sacrificial zones in the EBS. To assess the latter, experiments and modeling approaches have provided important information on the stability and fate of barrier materials under high heat loads. This information is central to the assessment of thermal limits and the implementation of the reference case when constraining EBS properties and the repository layout (e.g., waste package and drift spacing). This report is comprised of various parts, each one describing various R&D activities applicable to shale/argillite media. For example, progress made on modeling and experimental approaches to analyze physical and chemical interactions affecting clay in the EBS, NS, and used nuclear fuel (source term) in support of R&D objectives. It also describes the development of a reference case for shale/argillite media. The accomplishments of these activities are summarized as follows: Development of a reference case for shale/argillite; Investigation of Reactive Transport and Coupled THM Processes in EBS: FY14; Update on Experimental Activities on Buffer/Backfill Interactions at elevated Pressure and Temperature; and Thermodynamic Database Development: Evaluation Strategy, Modeling Tools, First-Principles Modeling of Clay, and Sorption Database Assessment; ANL Mixed Potential Model For Used Fuel Degradation: Application to Argillite and Crystalline Rock Environments.
TriatoKey: a web and mobile tool for biodiversity identification of Brazilian triatomine species
Márcia de Oliveira, Luciana; Nogueira de Brito, Raissa; Anderson Souza Guimarães, Paul; Vitor Mastrângelo Amaro dos Santos, Rômulo; Gonçalves Diotaiuti, Liléia; de Cássia Moreira de Souza, Rita
2017-01-01
Abstract Triatomines are blood-sucking insects that transmit the causative agent of Chagas disease, Trypanosoma cruzi. Despite being recognized as a difficult task, the correct taxonomic identification of triatomine species is crucial for vector control in Latin America, where the disease is endemic. In this context, we have developed a web and mobile tool based on a PostgreSQL database to help healthcare technicians overcome difficulties in identifying triatomine vectors when technical expertise is missing. The web and mobile versions make use of real triatomine species pictures and the dichotomous key method to support the identification of potential vectors that occur in Brazil. The tool provides a user example-driven interface with simple language. TriatoKey can also be useful for educational purposes. Database URL: http://triatokey.cpqrr.fiocruz.br PMID:28605769
CerebralWeb: a Cytoscape.js plug-in to visualize networks stratified by subcellular localization.
Frias, Silvia; Bryan, Kenneth; Brinkman, Fiona S L; Lynn, David J
2015-01-01
CerebralWeb is a light-weight JavaScript plug-in that extends Cytoscape.js to enable fast and interactive visualization of molecular interaction networks stratified based on subcellular localization or other user-supplied annotation. The application is designed to be easily integrated into any website and is configurable to support customized network visualization. CerebralWeb also supports the automatic retrieval of Cerebral-compatible localizations for human, mouse and bovine genes via a web service and enables the automated parsing of Cytoscape compatible XGMML network files. CerebralWeb currently supports embedded network visualization on the InnateDB (www.innatedb.com) and Allergy and Asthma Portal (allergen.innatedb.com) database and analysis resources. Database tool URL: http://www.innatedb.com/CerebralWeb © The Author(s) 2015. Published by Oxford University Press.
Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal
2013-01-01
We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org. PMID:24089456
Huntley, Melanie A; Larson, Jessica L; Chaivorapol, Christina; Becker, Gabriel; Lawrence, Michael; Hackney, Jason A; Kaminker, Joshua S
2013-12-15
It is common for computational analyses to generate large amounts of complex data that are difficult to process and share with collaborators. Standard methods are needed to transform such data into a more useful and intuitive format. We present ReportingTools, a Bioconductor package, that automatically recognizes and transforms the output of many common Bioconductor packages into rich, interactive, HTML-based reports. Reports are not generic, but have been individually designed to reflect content specific to the result type detected. Tabular output included in reports is sortable, filterable and searchable and contains context-relevant hyperlinks to external databases. Additionally, in-line graphics have been developed for specific analysis types and are embedded by default within table rows, providing a useful visual summary of underlying raw data. ReportingTools is highly flexible and reports can be easily customized for specific applications using the well-defined API. The ReportingTools package is implemented in R and available from Bioconductor (version ≥ 2.11) at the URL: http://bioconductor.org/packages/release/bioc/html/ReportingTools.html. Installation instructions and usage documentation can also be found at the above URL.
Scholz-Starke, Björn; Burkhardt, Ulrich; Lesch, Stephan; Rick, Sebastian; Russell, David; Roß-Nickoll, Martina; Ottermanns, Richard
2017-01-01
Abstract The Edaphostat web application allows interactive and dynamic analyses of soil organism data stored in the Edaphobase data warehouse. It is part of the Edaphobase web application and can be accessed by any modern browser. The tool combines data from different sources (publications, field studies and museum collections) and allows species preferences along various environmental gradients (i.e. C/N ratio and pH) and classification systems (habitat type and soil type) to be analyzed. Database URL: Edaphostat is part of the Edaphobase Web Application available at https://portal.edaphobase.org PMID:29220469
Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge
Scerri, Antony; Kuriakose, John; Deshmane, Amit Ajit; Stanger, Mark; Moore, Rebekah; Naik, Raj; de Waard, Anita
2017-01-01
Abstract We developed a two-stream, Apache Solr-based information retrieval system in response to the bioCADDIE 2016 Dataset Retrieval Challenge. One stream was based on the principle of word embeddings, the other was rooted in ontology based indexing. Despite encountering several issues in the data, the evaluation procedure and the technologies used, the system performed quite well. We provide some pointers towards future work: in particular, we suggest that more work in query expansion could benefit future biomedical search engines. Database URL: https://data.mendeley.com/datasets/zd9dxpyybg/1 PMID:29220454
Davis, Allan Peter; Wiegers, Thomas C.; Roberts, Phoebe M.; King, Benjamin L.; Lay, Jean M.; Lennon-Hopkins, Kelley; Sciaky, Daniela; Johnson, Robin; Keating, Heather; Greene, Nigel; Hernandez, Robert; McConnell, Kevin J.; Enayetallah, Ahmed E.; Mattingly, Carolyn J.
2013-01-01
Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical–disease, 58 572 chemical–gene, 5 345 gene–disease and 38 083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. As well, we show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/ PMID:24288140
toxoMine: an integrated omics data warehouse for Toxoplasma gondii systems biology research
Rhee, David B.; Croken, Matthew McKnight; Shieh, Kevin R.; Sullivan, Julie; Micklem, Gos; Kim, Kami; Golden, Aaron
2015-01-01
Toxoplasma gondii (T. gondii) is an obligate intracellular parasite that must monitor for changes in the host environment and respond accordingly; however, it is still not fully known which genetic or epigenetic factors are involved in regulating virulence traits of T. gondii. There are on-going efforts to elucidate the mechanisms regulating the stage transition process via the application of high-throughput epigenomics, genomics and proteomics techniques. Given the range of experimental conditions and the typical yield from such high-throughput techniques, a new challenge arises: how to effectively collect, organize and disseminate the generated data for subsequent data analysis. Here, we describe toxoMine, which provides a powerful interface to support sophisticated integrative exploration of high-throughput experimental data and metadata, providing researchers with a more tractable means toward understanding how genetic and/or epigenetic factors play a coordinated role in determining pathogenicity of T. gondii. As a data warehouse, toxoMine allows integration of high-throughput data sets with public T. gondii data. toxoMine is also able to execute complex queries involving multiple data sets with straightforward user interaction. Furthermore, toxoMine allows users to define their own parameters during the search process that gives users near-limitless search and query capabilities. The interoperability feature also allows users to query and examine data available in other InterMine systems, which would effectively augment the search scope beyond what is available to toxoMine. toxoMine complements the major community database ToxoDB by providing a data warehouse that enables more extensive integrative studies for T. gondii. Given all these factors, we believe it will become an indispensable resource to the greater infectious disease research community. Database URL: http://toxomine.org PMID:26130662
Text mining and expert curation to develop a database on psychiatric diseases and their genes
Gutiérrez-Sacristán, Alba; Bravo, Àlex; Portero-Tresserra, Marta; Valverde, Olga; Armario, Antonio; Blanco-Gandía, M.C.; Farré, Adriana; Fernández-Ibarrondo, Lierni; Fonseca, Francina; Giraldo, Jesús; Leis, Angela; Mané, Anna; Mayer, M.A.; Montagud-Romero, Sandra; Nadal, Roser; Ortiz, Jordi; Pavon, Francisco Javier; Perez, Ezequiel Jesús; Rodríguez-Arias, Marta; Serrano, Antonia; Torrens, Marta; Warnault, Vincent; Sanz, Ferran
2017-01-01
Abstract Psychiatric disorders constitute one of the main causes of disability worldwide. During the past years, considerable research has been conducted on the genetic architecture of such diseases, although little understanding of their etiology has been achieved. The difficulty to access up-to-date, relevant genotype-phenotype information has hampered the application of this wealth of knowledge to translational research and clinical practice in order to improve diagnosis and treatment of psychiatric patients. PsyGeNET (http://www.psygenet.org/) has been developed with the aim of supporting research on the genetic architecture of psychiatric diseases, by providing integrated and structured accessibility to their genotype–phenotype association data, together with analysis and visualization tools. In this article, we describe the protocol developed for the sustainable update of this knowledge resource. It includes the recruitment of a team of domain experts in order to perform the curation of the data extracted by text mining. Annotation guidelines and a web-based annotation tool were developed to support the curators’ tasks. A curation workflow was designed including a pilot phase and two rounds of curation and analysis phases. Negative evidence from the literature on gene–disease associations (GDAs) was taken into account in the curation process. We report the results of the application of this workflow to the curation of GDAs for PsyGeNET, including the analysis of the inter-annotator agreement and suggest this model as a suitable approach for the sustainable development and update of knowledge resources. Database URL: http://www.psygenet.org PsyGeNET corpus: http://www.psygenet.org/ds/PsyGeNET/results/psygenetCorpus.tar PMID:29220439
Kusber, W.-H.; Tschöpe, O.; Güntsch, A.; Berendsohn, W. G.
2017-01-01
Abstract Biological research collections holding billions of specimens world-wide provide the most important baseline information for systematic biodiversity research. Increasingly, specimen data records become available in virtual herbaria and data portals. The traditional (physical) annotation procedure fails here, so that an important pathway of research documentation and data quality control is broken. In order to create an online annotation system, we analysed, modeled and adapted traditional specimen annotation workflows. The AnnoSys system accesses collection data from either conventional web resources or the Biological Collection Access Service (BioCASe) and accepts XML-based data standards like ABCD or DarwinCore. It comprises a searchable annotation data repository, a user interface, and a subscription based message system. We describe the main components of AnnoSys and its current and planned interoperability with biodiversity data portals and networks. Details are given on the underlying architectural model, which implements the W3C OpenAnnotation model and allows the adaptation of AnnoSys to different problem domains. Advantages and disadvantages of different digital annotation and feedback approaches are discussed. For the biodiversity domain, AnnoSys proposes best practice procedures for digital annotations of complex records. Database URL: https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys PMID:28365735
Research resources: curating the new eagle-i discovery system
Vasilevsky, Nicole; Johnson, Tenille; Corday, Karen; Torniai, Carlo; Brush, Matthew; Segerdell, Erik; Wilson, Melanie; Shaffer, Chris; Robinson, David; Haendel, Melissa
2012-01-01
Development of biocuration processes and guidelines for new data types or projects is a challenging task. Each project finds its way toward defining annotation standards and ensuring data consistency with varying degrees of planning and different tools to support and/or report on consistency. Further, this process may be data type specific even within the context of a single project. This article describes our experiences with eagle-i, a 2-year pilot project to develop a federated network of data repositories in which unpublished, unshared or otherwise ‘invisible’ scientific resources could be inventoried and made accessible to the scientific community. During the course of eagle-i development, the main challenges we experienced related to the difficulty of collecting and curating data while the system and the data model were simultaneously built, and a deficiency and diversity of data management strategies in the laboratories from which the source data was obtained. We discuss our approach to biocuration and the importance of improving information management strategies to the research process, specifically with regard to the inventorying and usage of research resources. Finally, we highlight the commonalities and differences between eagle-i and similar efforts with the hope that our lessons learned will assist other biocuration endeavors. Database URL: www.eagle-i.net PMID:22434835
Recent Research in Science Teaching and Learning
ERIC Educational Resources Information Center
Dolan, Erin
2010-01-01
This feature is designed to point readers of this journal to current articles of interest in life sciences education as well as more general and noteworthy publications in education research. URLs are provided for the abstracts or full text of articles. For articles listed as "Abstract available," full text may be accessible at the indicated URL…
OpenFluDB, a database for human and animal influenza virus
Liechti, Robin; Gleizes, Anne; Kuznetsov, Dmitry; Bougueleret, Lydie; Le Mercier, Philippe; Bairoch, Amos; Xenarios, Ioannis
2010-01-01
Although influenza research has been under way for more than 100 years, influenza is still one of the most prominent diseases, causing half a million human deaths every year. With the recent observation of new highly pathogenic H5N1 and H7N7 strains, and the appearance of the influenza pandemic caused by the H1N1 swine-like lineage, a collaborative effort to share observations on the evolution of this virus in both animals and humans has been established. The OpenFlu database (OpenFluDB) is a part of this collaborative effort. It contains genomic and protein sequences, as well as epidemiological data from more than 27 000 isolates. The isolate annotations include virus type, host, geographical location and experimentally tested antiviral resistance. Putative enhanced pathogenicity as well as human adaptation propensity are computed from protein sequences. Each virus isolate can be associated with the laboratories that collected, sequenced and submitted it. Several analysis tools including multiple sequence alignment, phylogenetic analysis and sequence similarity maps enable rapid and efficient mining. The contents of OpenFluDB are supplied by direct user submission, as well as by a daily automatic procedure importing data from public repositories. Additionally, a simple mechanism facilitates the export of OpenFluDB records to GenBank. This resource has been successfully used to rapidly and widely distribute the sequences collected during the recent human swine flu outbreak and also as an exchange platform during the vaccine selection procedure. Database URL: http://openflu.vital-it.ch. PMID:20624713
SinEx DB: a database for single exon coding sequences in mammalian genomes.
Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S
2016-01-01
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan
Kinjo, Akira R.; Yamashita, Reiko; Nakamura, Haruki
2010-01-01
This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/ PMID:20798081
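The idea in the abstract above, that each XPath type gets a relational table named after the path itself, can be sketched in a few lines of Python. This is only a toy illustration using the standard library's xml.etree.ElementTree and an in-memory SQLite database: the sample XML, the single "value" column and the helper names iter_paths and load_xml are assumptions made for this sketch, and it does not reproduce the PDBMLplus schema or PDBj Mine's actual table definitions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up XML standing in for a PDBMLplus file (not the real format).
SAMPLE_XML = """
<entry>
  <struct><title>Example protein</title></struct>
  <citation><year>2010</year></citation>
  <citation><year>2009</year></citation>
</entry>
""".strip()


def iter_paths(element, prefix=""):
    """Yield (path, text) for every element, depth first.

    The path is a simple slash-joined chain of tag names, e.g.
    "/entry/citation/year"; attributes are ignored in this sketch.
    """
    path = f"{prefix}/{element.tag}"
    yield path, (element.text or "").strip()
    for child in element:
        yield from iter_paths(child, path)


def load_xml(xml_text, conn):
    """Store each element's text in a table whose name is its path."""
    root = ET.fromstring(xml_text)
    for path, text in iter_paths(root):
        # Quote the path so that slashes are legal in the table name.
        table = '"' + path.replace('"', '""') + '"'
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (value TEXT)")
        conn.execute(f"INSERT INTO {table} (value) VALUES (?)", (text,))
    conn.commit()


conn = sqlite3.connect(":memory:")
load_xml(SAMPLE_XML, conn)

# Because tables are named after paths, SQL can be written uniformly.
titles = conn.execute('SELECT value FROM "/entry/struct/title"').fetchall()
years = conn.execute(
    'SELECT value FROM "/entry/citation/year" ORDER BY rowid'
).fetchall()
print(titles, years)
```

Naming tables after paths means a user who knows the XML layout can predict the table to query, which is one plausible reading of the "uniform manner" of SQL construction mentioned in the abstract.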
PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan.
Kinjo, Akira R; Yamashita, Reiko; Nakamura, Haruki
2010-08-25
This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations), which are then served for the user of PDBj via the worldwide web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in the way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into the corresponding relational table that is named as the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform simple keyword search, and 'Advanced Search' which can specify various conditions on the entries. More experienced users can query the database using SQL statements which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/
novPTMenzy: a database for enzymes involved in novel post-translational modifications
Khater, Shradha; Mohanty, Debasisa
2015-01-01
With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. Unlike well-known PTMs like phosphorylation, glycosylation, SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user friendly search interfaces of novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes. Database URL: http://www.nii.ac.in/novptmenzy.html PMID:25931459
Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin
2011-01-01
The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database that provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets and contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.
Meyer, Michael J; Geske, Philip; Yu, Haiyuan
2016-05-15
Biological sequence databases are integral to efforts to characterize and understand biological molecules and share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency: molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, there are many resources available, but analyzing associated loci and variants can be cumbersome if data are not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). Contact: haiyuan.yu@cornell.edu. Supplementary data are available at Bioinformatics online.
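The graph-traversal idea mentioned in the BISQUE abstract can be illustrated with a minimal sketch. The identifier types, the edge table and the function name `find_conversion_path` below are hypothetical and are not taken from the BISQUE source code; the sketch only shows how a breadth-first search over a graph of identifier types can find a conversion route between any two connected conventions.

```python
from collections import deque

# Hypothetical graph: nodes are identifier types, edges are direct mappings
# assumed to be available in both directions (an assumption of this sketch).
EDGES = {
    "ensembl_gene": ["ensembl_transcript"],
    "ensembl_transcript": ["ensembl_gene", "ensembl_protein", "refseq_mrna"],
    "ensembl_protein": ["ensembl_transcript", "uniprot"],
    "refseq_mrna": ["ensembl_transcript", "refseq_protein"],
    "refseq_protein": ["refseq_mrna", "uniprot"],
    "uniprot": ["ensembl_protein", "refseq_protein"],
}


def find_conversion_path(source, target, edges=EDGES):
    """Return the shortest chain of identifier types from source to target.

    Returns None when either type is unknown or no route exists.
    """
    if source not in edges or target not in edges:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in edges[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    print(find_conversion_path("ensembl_gene", "uniprot"))
```

Each hop in the returned path would correspond to one lookup in a mapping table; a real converter would also have to carry positions (loci, variants) across each hop, which this sketch does not attempt.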
Inter-disciplinary Interactions in Underground Laboratories
NASA Astrophysics Data System (ADS)
Wang, J. S.; Bettini, A.
2010-12-01
Many underground facilities, ranging from simple cavities to fully equipped laboratories, have been established worldwide (1) to evaluate the impacts of emplacing nuclear wastes in underground research laboratories (URLs) and (2) to measure rare physics events in deep underground laboratories (DULs). In this presentation, we compare similarities and differences between URLs and DULs in the focus of site characterization, in the quantification of quietness, and in the improvement of signal-to-noise ratios. The nuclear waste URLs are located primarily in geological media with potential for slow flow/transport and long isolation. The URL media include plastic salt, hard rock, soft clay, volcanic tuff, basalt and shale, at depths over ~500 m where waste repositories are envisioned to be excavated. The majority of URLs are dedicated facilities excavated after extensive site characterization. The focus is on fracture distributions, heterogeneity, scaling, coupled processes, and other fundamental issues of earth sciences. For the physics DULs, the depth/overburden thickness is the main parameter that determines the damping of cosmic rays and, consequently, should typically be larger than 800 m. Radioactivity from rocks, neutron flux, and radon gas, depending on local rock and ventilation conditions (largely independent of depth), are also characterized at different sites to quantify the background level for physics experiments. DULs have been constructed by excavating dedicated experimental halls and service cavities near a road tunnel (horizontal access) or in a mine (vertical access). Cavities at shallower depths are suitable for experiments on neutrinos from artificial sources, power reactors or accelerators. Rock stability (depth dependent), safe access, and utility supply are among the main concerns for DULs.
While the focuses and missions of URLs and DULs are very different, common experience and lessons learned may be useful for the ongoing development of new facilities needed for the next generation of underground assessments and experiments. There is growing interest in developing multi-disciplinary programs in DULs, and some URLs have rooms set aside for physics experiments. Examples of DULs and URLs with interactions between earth sciences and physics include Gran Sasso in Italy, Kamioka in Japan, Canfranc in Spain, LSBB in France, WIPP in New Mexico, DUSEL in South Dakota, and the Jing Ping deep tunnel underground laboratory proposal in China. Instruments of common interest include interferometers, laser strain meters, seismic networks, tiltmeters, gravimeters, magnetometers, and other sensors to detect signals over different frequencies, as well as water chemical analyses, including radon concentrations. Radon emissions are of concern for physics experiments and are studied as possible precursors of earthquakes. Measuring geoneutrino flux and energy spectrum in different locations is of interest to both physics and earth sciences. The contributions of U and Th in the crust and the mantle to the energy production in the Earth can be studied. One final note is that our ongoing reviews aim to contribute to technological innovations anticipated through inter-disciplinary interactions.
Deciding to Change OpenURL Link Resolvers
ERIC Educational Resources Information Center
Johnson, Megan; Leonard, Andrea; Wiswell, John
2015-01-01
This article will be of interest to librarians, particularly those in consortia that are evaluating OpenURL link resolvers. This case study contrasts WebBridge (an Innovative Interface product) and LinkSource (EBSCO's product). This study assisted us in the decision-making process of choosing an OpenURL link resolver that was sustainable to…
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene at the URL: http://www.itba.mi.cnr.it/webgene and TRADAT (TRAncription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
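The nucleotide-level accuracy figures quoted in the GeneBuilder abstract can be made concrete with a small sketch. It assumes the convention commonly used in gene-prediction benchmarks, in which sensitivity is TP/(TP+FN), "specificity" is TP/(TP+FP), and the correlation coefficient is computed from the full confusion table; the abstract itself does not spell out these formulas, and the function name and 0/1 encoding below are illustrative only.

```python
import math


def nucleotide_accuracy(true_coding, predicted_coding):
    """Compare two equal-length 0/1 sequences that mark coding nucleotides.

    Returns (sensitivity, specificity, correlation), where specificity
    follows the gene-prediction convention TP / (TP + FP).
    """
    if len(true_coding) != len(predicted_coding):
        raise ValueError("sequences must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(true_coding, predicted_coding):
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    denominator = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    correlation = (tp * tn - fn * fp) / denominator if denominator else 0.0
    return sensitivity, specificity, correlation


if __name__ == "__main__":
    annotated = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0, 0, 0]
    print(nucleotide_accuracy(annotated, predicted))
```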
Quality assurance for the query and distribution systems of the RCSB Protein Data Bank
Bluhm, Wolfgang F.; Beran, Bojan; Bi, Chunxiao; Dimitropoulos, Dimitris; Prlić, Andreas; Quinn, Gregory B.; Rose, Peter W.; Shah, Chaitali; Young, Jasmine; Yukich, Benjamin; Berman, Helen M.; Bourne, Philip E.
2011-01-01
The RCSB Protein Data Bank (RCSB PDB, www.pdb.org) is a key online resource for structural biology and related scientific disciplines. The website is used on average by 165 000 unique visitors per month, and more than 2000 other websites link to it. The amount and complexity of PDB data as well as the expectations on its usage are growing rapidly. Therefore, ensuring the reliability and robustness of the RCSB PDB query and distribution systems is crucially important and increasingly challenging. This article describes quality assurance for the RCSB PDB website at several distinct levels, including: (i) hardware redundancy and failover, (ii) testing protocols for weekly database updates, (iii) testing and release procedures for major software updates and (iv) miscellaneous monitoring and troubleshooting tools and practices. As such it provides suggestions for how other websites might be operated. Database URL: www.pdb.org PMID:21382834
Diamond Eye: a distributed architecture for image data mining
NASA Astrophysics Data System (ADS)
Burl, Michael C.; Fowlkes, Charless; Roden, Joe; Stechert, Andre; Mukhtar, Saleem
1999-02-01
Diamond Eye is a distributed software architecture, which enables users (scientists) to analyze large image collections by interacting with one or more custom data mining servers via a Java applet interface. Each server is coupled with an object-oriented database and a computational engine, such as a network of high-performance workstations. The database provides persistent storage and supports querying of the 'mined' information. The computational engine provides parallel execution of expensive image processing, object recognition, and query-by-content operations. Key benefits of the Diamond Eye architecture are: (1) the design promotes trial evaluation of advanced data mining and machine learning techniques by potential new users (all that is required is to point a web browser to the appropriate URL), (2) software infrastructure that is common across a range of science mining applications is factored out and reused, and (3) the system facilitates closer collaborations between algorithm developers and domain experts.
Front-Row Seat at the IPY: The Field Notes Electronic Newsletter
NASA Astrophysics Data System (ADS)
Rithner, P. K.; Zager, S. D.; Garcia-Lavigne, D. N.
2007-12-01
As employees of Polar Field Services/VPR, the arctic logistics provider to the US National Science Foundation, we bear witness to the exploration, documentation, and celebration of the International Polar Year (IPY). Our front-row vantage point (logisticians working with field scientists) offers us a rare opportunity to report on developments at the frontiers of polar research and to describe how scientists work in the Arctic. Our reporting mechanism is field notes, a weekly (summer) to monthly (winter) electronic digest of information about the IPY research we support. Each issue showcases a short "cover" piece highlighting science projects or profiling arctic program participants. In addition, field notes offers news updates, short interviews, and blog-style dispatches contributed by researchers and support personnel. Wherever possible, we include URLs so readers may find more information via the Web: we link to an online database of projects we maintain for the NSF, to university Web sites, project blogs, and so on. We aim to inform the interested layperson about the myriad activities in the IPY. We like to show that arctic science is interesting, relevant--and a great adventure. We've found field notes to be an excellent outreach venue. By no means a slick media outlet, field notes is published "on the side" by a small but dedicated group of employees who are endlessly fascinated by, and who enjoy an engaging perspective on, contemporary arctic research. Newsletter
Gaby, John Christian; Buckley, Daniel H.
2014-01-01
We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm PMID:24501396
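The percent sequence dissimilarity values discussed in the nifH abstract can be illustrated with a simple calculation on two aligned sequences. The abstract does not state which distance measure the authors used; the sketch below assumes an uncorrected proportion of mismatches over gap-free alignment columns, and the function name and example sequences are made up for illustration.

```python
def percent_dissimilarity(seq_a, seq_b, gap="-"):
    """Percent of mismatched positions between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption of
    this sketch, not a statement about the method used in the paper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    # One mismatch in ten compared positions.
    print(percent_dissimilarity("ACGTACGTAC", "ACGTTCGTAC"))
```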
SInCRe—structural interactome computational resource for Mycobacterium tuberculosis
Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy
2015-01-01
We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) was developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for development of this database is to provide an integrated platform to allow easy access to and interpretation of data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that potentially bind to pathogen proteins and result in weakening or elimination of host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660
Website Feedback | Center for Cancer Research
Thank you for providing feedback about the CCR website. There are 5 comment areas available via this webform, but it may be submitted as often as needed. Whenever possible please be specific - give the url of the page and details about your concern.
48 CFR 1802.101 - Definitions.
Code of Federal Regulations, 2014 CFR
2014-10-01
... Acquisition Internet Service (NAIS) means the Internet service (URL: http://procurement.nasa.gov) NASA uses to... Administrator or Deputy Administrator of NASA. Contracting activity in NASA includes the NASA Headquarters installation, the NASA Shared Services Center, and the following field installations: Ames Research Center...
48 CFR 1802.101 - Definitions.
Code of Federal Regulations, 2011 CFR
2011-10-01
... Acquisition Internet Service (NAIS) means the Internet service (URL: http://procurement.nasa.gov) NASA uses to... Administrator or Deputy Administrator of NASA. Contracting activity in NASA includes the NASA Headquarters installation, the NASA Shared Services Center, and the following field installations: Ames Research Center...
48 CFR 1802.101 - Definitions.
Code of Federal Regulations, 2010 CFR
2010-10-01
... Acquisition Internet Service (NAIS) means the Internet service (URL: http://procurement.nasa.gov) NASA uses to... Administrator or Deputy Administrator of NASA. Contracting activity in NASA includes the NASA Headquarters installation, the NASA Shared Services Center, and the following field installations: Ames Research Center...
48 CFR 1802.101 - Definitions.
Code of Federal Regulations, 2013 CFR
2013-10-01
... Acquisition Internet Service (NAIS) means the Internet service (URL: http://procurement.nasa.gov) NASA uses to... Administrator or Deputy Administrator of NASA. Contracting activity in NASA includes the NASA Headquarters installation, the NASA Shared Services Center, and the following field installations: Ames Research Center...
48 CFR 1802.101 - Definitions.
Code of Federal Regulations, 2012 CFR
2012-10-01
... Acquisition Internet Service (NAIS) means the Internet service (URL: http://procurement.nasa.gov) NASA uses to... Administrator or Deputy Administrator of NASA. Contracting activity in NASA includes the NASA Headquarters installation, the NASA Shared Services Center, and the following field installations: Ames Research Center...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-19
... Inventory.'' The following should be changed: The notice provided an incorrect URL address: http://www.hhs.gov/grants/servicecontractsfy11.html . The correct URL address is as follows: http://www.hhs.gov... following URL address: http://www.hhs.gov/grants/servicecontractsfy11.html . Change the fiscal year to FY...
Lee, Taein; Cheng, Chun-Huai; Ficklin, Stephen; Yu, Jing; Humann, Jodi; Main, Dorrie
2017-01-01
Tripal is an open-source database platform primarily used for development of genomic, genetic and breeding databases. We report here on the release of the Chado Loader, Chado Data Display and Chado Search modules to extend the functionality of the core Tripal modules. These new extension modules provide additional tools for (1) data loading, (2) customized visualization and (3) advanced search functions for supported data types such as organism, marker, QTL/Mendelian Trait Loci, germplasm, map, project, phenotype, genotype and their respective metadata. The Chado Loader module provides data collection templates in Excel with defined metadata and data loaders with front end forms. The Chado Data Display module contains tools to visualize each data type and the metadata which can be used as is or customized as desired. The Chado Search module provides search and download functionality for the supported data types. Also included are the tools to visualize map and species summary. The use of materialized views in the Chado Search module enables better performance as well as flexibility of data modeling in Chado, allowing existing Tripal databases with different metadata types to utilize the module. These Tripal Extension modules are implemented in the Genome Database for Rosaceae (rosaceae.org), CottonGen (cottongen.org), Citrus Genome Database (citrusgenomedb.org), Genome Database for Vaccinium (vaccinium.org) and the Cool Season Food Legume Database (coolseasonfoodlegume.org). Database URL: https://www.citrusgenomedb.org/, https://www.coolseasonfoodlegume.org/, https://www.cottongen.org/, https://www.rosaceae.org/, https://www.vaccinium.org/
PROTICdb: a web-based application to store, track, query, and compare plant proteome data.
Ferry-Dumazet, Hélène; Houel, Gwenn; Montalent, Pierre; Moreau, Luc; Langella, Olivier; Negroni, Luc; Vincent, Delphine; Lalanne, Céline; de Daruvar, Antoine; Plomion, Christophe; Zivy, Michel; Joets, Johann
2005-05-01
PROTICdb is a web-based application, mainly designed to store and analyze plant proteome data obtained by two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and mass spectrometry (MS). The purposes of PROTICdb are (i) to store, track, and query information related to proteomic experiments, i.e., from tissue sampling to protein identification and quantitative measurements, and (ii) to integrate information from the user's own expertise and other sources into a knowledge base, used to support data interpretation (e.g., for the determination of allelic variants or products of post-translational modifications). Data insertion into the relational database of PROTICdb is achieved either by uploading outputs of image analysis and MS identification software, or by filling web forms. 2-D PAGE annotated maps can be displayed, queried, and compared through a graphical interface. Links to external databases are also available. Quantitative data can be easily exported in a tabulated format for statistical analyses. PROTICdb is based on the Oracle or the PostgreSQL Database Management System and is freely available upon request at the following URL: http://moulon.inra.fr/bioinfo/PROTICdb.
Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.
2015-01-01
Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
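The similarity-based target hypotheses described in the BindingDB/KNIME abstract can be sketched in a few lines. The abstract only refers to "notions of chemical similarity"; the Tanimoto coefficient on set-style fingerprints used below is a common choice and is an assumption here, as are the function names, the 0.5 threshold and the toy fingerprints. This is not the KNIME workflow itself and does not call any BindingDB service.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto similarity between two fingerprints given as sets of feature ids."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_targets(query_fingerprint, known_ligands, threshold=0.5):
    """Rank candidate targets by the best similarity of any known ligand.

    known_ligands is a list of (target_name, fingerprint) pairs; targets whose
    best ligand similarity falls below the threshold are omitted.
    """
    best = {}
    for target, fingerprint in known_ligands:
        score = tanimoto(query_fingerprint, fingerprint)
        if score >= threshold and score > best.get(target, 0.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ligands = [("T1", {1, 2, 3, 4, 5}), ("T2", {1, 9}), ("T1", {1, 2})]
    print(rank_targets({1, 2, 3, 4}, ligands))
```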
Lee, Ji-Hyun; You, Sungyong; Hyeon, Do Young; Kang, Byeongsoo; Kim, Hyerim; Park, Kyoung Mii; Han, Byungwoo; Hwang, Daehee; Kim, Sunghoon
2015-01-01
Mammalian cells have cytoplasmic and mitochondrial aminoacyl-tRNA synthetases (ARSs) that catalyze aminoacylation of tRNAs during protein synthesis. Despite their housekeeping functions in protein synthesis, ARSs and ARS-interacting multifunctional proteins (AIMPs) have recently been shown to play important roles in disease pathogenesis through their interactions with disease-related molecules. However, there is a lack of data resources and analytical tools that can be used to examine disease associations of ARS/AIMPs. Here, we developed an Integrated Database for ARSs (IDA), a resource database including cancer genomic/proteomic and interaction data of ARS/AIMPs. IDA includes mRNA expression, somatic mutation, copy number variation and phosphorylation data of ARS/AIMPs and their interacting proteins in various cancers. IDA further includes an array of analytical tools for exploration of disease association of ARS/AIMPs, identification of disease-associated ARS/AIMP interactors and reconstruction of ARS-dependent disease-perturbed network models. Therefore, IDA provides both comprehensive data resources and analytical tools for understanding potential roles of ARS/AIMPs in cancers. Database URL: http://ida.biocon.re.kr/, http://ars.biocon.re.kr/ PMID:25824651
Ran, Xia; Cai, Wei-Jun; Huang, Xiu-Feng; Liu, Qi; Lu, Fan; Qu, Jia; Wu, Jinyu; Jin, Zi-Bing
2014-01-01
Inherited retinal degeneration (IRD), a leading cause of human blindness worldwide, is exceptionally heterogeneous, both clinically and genetically. During the past decades, tremendous efforts have been made to explore this heterogeneity, and with the significant advancement of sequencing technology a large number of mutations have been identified in different genes underlying IRD. In this study, we developed a comprehensive database, 'RetinoGenetics', which contains information about all known IRD-related genes and mutations. 'RetinoGenetics' currently contains 4270 mutations in 186 genes, with detailed information associated with 164 phenotypes from 934 publications and various types of functional annotations. Extensive annotations were then performed for each gene using various resources, including Gene Ontology, KEGG pathways, protein-protein interaction, mutational annotations and gene-disease network. Furthermore, with its search functions, convenient browsing and intuitive graphical displays, 'RetinoGenetics' could serve as a valuable resource for unveiling the genetic basis of IRD. Taken together, 'RetinoGenetics' is an integrative, informative and updatable resource for IRD-related genetic predispositions. Database URL: http://www.retinogenetics.org/.
The Virtual Xenbase: transitioning an online bioinformatics resource to a private cloud
Karimi, Kamran; Vize, Peter D.
2014-01-01
As a model organism database, Xenbase has been providing informatics and genomic data on Xenopus (Silurana) tropicalis and Xenopus laevis frogs for more than a decade. The Xenbase database contains curated, as well as community-contributed and automatically harvested literature, gene and genomic data. A GBrowse genome browser, a BLAST+ server and stock center support are available on the site. When this resource was first built, all software services and components in Xenbase ran on a single physical server, with inherent reliability, scalability and inter-dependence issues. Recent advances in networking and virtualization techniques allowed us to move Xenbase to a virtual environment, and more specifically to a private cloud. To do so we decoupled the different software services and components, such that each would run on a different virtual machine. In the process, we also upgraded many of the components. The resulting system is faster and more reliable. System maintenance is easier, as individual virtual machines can now be updated, backed up and changed independently. We are also experiencing more effective resource allocation and utilization. Database URL: www.xenbase.org PMID:25380782
The dye-sensitized solar cell database.
Venkatraman, Vishwesh; Raju, Rajesh; Oikonomopoulos, Solon P; Alsberg, Bjørn K
2018-04-03
Dye-sensitized solar cells (DSSCs) have garnered a lot of attention in recent years. The solar energy to power conversion efficiency of a DSSC is influenced by various components of the cell such as the dye, electrolyte, electrodes and additives among others leading to varying experimental configurations. A large number of metal-based and metal-free dye sensitizers have now been reported and tools using such data to indicate new directions for design and development are on the rise. DSSCDB, the first of its kind dye-sensitized solar cell database, aims to provide users with up-to-date information from publications on the molecular structures of the dyes, experimental details and reported measurements (efficiencies and spectral properties) and thereby facilitate a comprehensive and critical evaluation of the data. Currently, the DSSCDB contains over 4000 experimental observations spanning multiple dye classes such as triphenylamines, carbazoles, coumarins, phenothiazines, ruthenium and porphyrins. The DSSCDB offers a web-based, comprehensive source of property data for dye sensitized solar cells. Access to the database is available through the following URL: www.dyedb.com.
A framework for cross-observatory volcanological database management
NASA Astrophysics Data System (ADS)
Aliotta, Marco Antonio; Amore, Mauro; Cannavò, Flavio; Cassisi, Carmelo; D'Agostino, Marcello; Dolce, Mario; Mastrolia, Andrea; Mangiagli, Salvatore; Messina, Giuseppe; Montalto, Placido; Fabio Pisciotta, Antonino; Prestifilippo, Michele; Rossi, Massimo; Scarpato, Giovanni; Torrisi, Orazio
2017-04-01
In recent years, it has been clearly shown that the multiparametric approach is the winning strategy for investigating the complex dynamics of volcanic systems. This involves the use of different sensor networks, each one dedicated to the acquisition of particular data useful for research and monitoring. The increasing interest devoted to the study of volcanological phenomena has led to the establishment of different research organizations or observatories, sometimes for the same volcanoes, which acquire large amounts of data from sensor networks for multiparametric monitoring. At INGV we developed a framework, hereinafter called TSDSystem (Time Series Database System), which makes it possible to acquire data streams from several geophysical and geochemical permanent sensor networks (also represented by different data sources such as ASCII, ODBC, URL etc.), located in the main volcanic areas of Southern Italy, and relate them within a relational database management system. Furthermore, spatial data related to different datasets are managed using a GIS module for sharing and visualization purposes. The standardization provides the ability to perform operations, such as query and visualization, on many measures, synchronizing them using a common space and time scale. In order to share data between INGV observatories, and also with Civil Protection, whose activity is related to the same volcanic districts, we designed a "Master View" system that, starting from the implementation of a number of instances of the TSDSystem framework (one for each observatory), makes possible the joint interrogation of data, both temporal and spatial, on instances located in different observatories, through the use of web services technology (RESTful, SOAP). Similarly, it provides metadata for equipment using standard schemas (such as FDSN StationXML).
The "Master View" is also responsible for managing the data policy through a "who owns what" system, which allows you to associate viewing/download of spatial or time intervals to particular users or groups.
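The idea of synchronizing measurements from different sensor networks on a common time scale, mentioned in the TSDSystem abstract, can be sketched as follows. The abstract does not describe how TSDSystem performs this step; the fixed-width binning with per-bin averaging, the function names and the sample values below are assumptions made purely for illustration.

```python
from collections import defaultdict


def resample_mean(samples, step):
    """Average (timestamp_seconds, value) samples into bins of width step.

    Returns a dict mapping each bin start time to the mean value in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in samples:
        bin_start = (timestamp // step) * step
        sums[bin_start] += value
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


def align(series_by_name, step):
    """Put several time series on one time grid, keeping bins present in all."""
    resampled = {name: resample_mean(s, step) for name, s in series_by_name.items()}
    if not resampled:
        return {}
    common = set.intersection(*(set(r) for r in resampled.values()))
    return {b: {name: resampled[name][b] for name in resampled} for b in sorted(common)}


if __name__ == "__main__":
    series = {
        "seismic": [(0, 1.0), (30, 3.0), (60, 5.0)],
        "gas": [(10, 10.0), (70, 20.0), (130, 30.0)],
    }
    print(align(series, 60))
```

Keeping only the bins present in every series is one possible policy; a monitoring system might instead keep all bins and mark missing values explicitly.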
Subhra Das, Sankha; James, Mithun; Paul, Sandip
2017-01-01
The various pathophysiological processes occurring in living systems are known to be orchestrated by delicate interplays and cross-talks between different genes and their regulators. Among the various regulators of genes, there is a class of small non-coding RNA molecules known as microRNAs. Although the relative simplicity of miRNAs and their ability to modulate cellular processes make them attractive therapeutic candidates, their presence in large numbers makes it challenging for experimental researchers to interpret the intricacies of the molecular processes they regulate. Most of the existing bioinformatic tools fail to address these challenges. Here, we present a new web resource ‘miRnalyze’ that has been specifically designed to directly identify the putative regulation of cell signaling pathways by miRNAs. The tool integrates miRNA-target predictions with signaling cascade members by utilizing the TargetScanHuman 7.1 miRNA-target prediction tool and the KEGG pathway database, and thus provides researchers with in-depth insights into modulation of signal transduction pathways by miRNAs. miRnalyze is capable of identifying common miRNAs targeting more than one gene in the same signaling pathway, a feature that further increases the probability of modulating the pathway and downstream reactions when using miRNA modulators. Additionally, miRnalyze can sort miRNAs according to the seed-match types and TargetScan Context ++ score, thus providing a hierarchical list of most valuable miRNAs. Furthermore, in order to provide users with comprehensive information regarding miRNAs, genes and pathways, miRnalyze also links to expression data of miRNAs (miRmine) and genes (TiGER) and proteome abundance (PaxDb) data. To validate the capability of the tool, we have documented the correlation of miRnalyze’s prediction with experimental confirmation studies. Database URL: http://www.mirnalyze.in PMID:28365733
CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002.
Yang, Yaohua; Feng, Jie; Li, Tao; Ge, Feng; Zhao, Jindong
2015-01-01
Cyanobacteria are an important group of organisms that carry out oxygenic photosynthesis and play vital roles in both the carbon and nitrogen cycles of the Earth. The annotated genome of Synechococcus sp. PCC 7002, as an ideal model cyanobacterium, is available. A series of transcriptomic and proteomic studies of Synechococcus sp. PCC 7002 cells grown under different conditions have been reported. However, no database of such integrated omics studies has been constructed. Here we present CyanOmics, a database based on the results of Synechococcus sp. PCC 7002 omics studies. CyanOmics comprises one genomic dataset, 29 transcriptomic datasets and one proteomic dataset and should prove useful for systematic and comprehensive analysis of all those data. Powerful browsing and searching tools are integrated to help users directly access information of interest with enhanced visualization of the analytical results. Furthermore, Blast is included for sequence-based similarity searching and Cluster 3.0, as well as the R hclust function is provided for cluster analyses, to increase CyanOmics's usefulness. To the best of our knowledge, it is the first integrated omics analysis database for cyanobacteria. This database should further understanding of the transcriptional patterns, and proteomic profiling of Synechococcus sp. PCC 7002 and other cyanobacteria. Additionally, the entire database framework is applicable to any sequenced prokaryotic genome and could be applied to other integrated omics analysis projects. Database URL: http://lag.ihb.ac.cn/cyanomics. © The Author(s) 2015. Published by Oxford University Press.
CardioTF, a database of deconstructing transcriptional circuits in the heart system
Zhen, Yisong
2016-01-01
Background: Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. Methods: The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Results: Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. Discussion: The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Availability and Implementation: Database URL: http://www.cardiosignal.org/database/cardiotf.html. PMID:27635320
1961-06-30
Image L61-4369 is available as an electronic file from the photo lab. See URL. -- Photographed on 06/30/1961. -- Test of parawing in Full Scale Wind Tunnel. -- Published in James R. Hansen, Spaceflight Revolution: NASA Langley Research Center From Sputnik to Apollo, (Washington: NASA, 1995), pp. 380-387.
Uploaded datasets are detailed exposure information (chemical concentrations and water quality parameters) for exposures conducted in a flow through diluter system with larval Pimephales promelas to four different pyrethroid pesticides. The GEO submission URL links to the NCBI GEO database and contains gene expression data from whole larvae exposed to different concentrations of the pyrethroids across multiple experiments. This dataset is associated with the following publication: Biales, A., M. Kostich, A. Batt, M. See, R. Flick, D. Gordon, J. Lazorchak, and D. Bencic. Initial Development of a Multigene Omics-Based Exposure Biomarker for Pyrethroid Pesticides. CRITICAL REVIEWS IN ENVIRONMENTAL SCIENCE AND TECHNOLOGY. CRC Press LLC, Boca Raton, FL, USA, 179(0): 27-35, (2016).
Disappearing act: decay of uniform resource locators in health care management journals
Wagner, Cassie; Gebremichael, Meseret D.; Soltys, Michael J.
2009-01-01
Objectives: This study examines the problem of decay of uniform resource locators (URLs) in health care management journals and seeks to determine whether continued availability at a given URL relates to the date of publication, the type of resource, or the top-level URL domain. Methods: The authors determined the availability of web-based resources cited in articles published in five source journals from 2002 to 2004. The data were analyzed using correlation, chi-square, and descriptive statistics. Attempts were made to locate the unavailable resources. Results: After checking twice, 49.3% of the original 2,011 cited resources could not be located at the cited URL. The older the article, the more likely that URLs in the reference list of that article were inactive (r = −0.62, P<0.001, n = 1,968). There was no difference in availability across resource types (χ2 = 5.28, df = 2, P = 0.07, n = 1,786). Whether a URL was active varied by top-level domain (χ2 = 14.92, df = 4, P = 0.00, n = 1,786). Conclusions: URL decay is a serious problem in health care management journals. In addition to using website archiving tools like WebCite, publishers should require authors to both keep copies of Internet-based information they used and deposit copies of data with the publishers. PMID:19404503
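The "check each cited URL twice" procedure can be sketched as follows. The abstract does not describe how the authors automated or performed their checks, so the function names, the use of HEAD requests and the treatment of 2xx/3xx responses as "available" are all assumptions. The status lookup is passed in as a parameter so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10.0):
    """Return the HTTP status code for `url`, or 0 if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return 0  # DNS failure, timeout, malformed URL, ...

def decay_rate(urls, fetch=http_status, attempts=2):
    """Fraction of URLs that were unavailable on every one of `attempts` checks."""
    urls = list(urls)
    if not urls:
        return 0.0
    dead = 0
    for url in urls:
        # any() stops at the first successful check, so live URLs are fetched once.
        if not any(200 <= fetch(url) < 400 for _ in range(attempts)):
            dead += 1
    return dead / len(urls)

# Offline demonstration with a stand-in for the network lookup.
fake_statuses = {"http://example.org/a": 200, "http://example.org/b": 404}
demo_rate = decay_rate(fake_statuses, fetch=lambda u: fake_statuses[u])
```

Some servers reject HEAD requests, so a real survey would probably fall back to GET before declaring a resource unavailable.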
R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring.
Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès
2016-01-01
Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which is time consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as certain diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematic supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomical identifications, and is dedicated to identify barcodes from natural samples. The data come from two sources, a culture collection of freshwater algae maintained in INRA in which new strains are regularly deposited and barcoded and from the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database: 18S (18S ribosomal RNA) and rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (Blast, classical phylogenies) and up-to-date taxonomy (Catalogues and peer reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/). 
We present here the content of the library regarding the number of barcodes and diatom taxa. In addition to this information, morphological features (e.g. biovolumes, chloroplasts…), life-forms (mobility, colony-type) or ecological features (taxa preferenda to pollution) are indicated in R-Syst::diatom. Database URL: http://www.rsyst.inra.fr/. © The Author(s) 2016. Published by Oxford University Press. PMID:26989149
Li, Zhao; Li, Jin; Yu, Peng
2018-01-01
Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is currently freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It can eliminate mechanical operations that consume precious curation time and can help coordinate curation efforts among multiple curators. It improves the curation process by introducing various features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is based on a commonly used web development framework of Python/Django and is open-sourced under the GNU General Public License V3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to computational generation of biological insights using large-scale biological data. An example use case can be found at the demo website: http://geometacuration.yubiolab.org. Database URL: https://bitbucket.com/yubiolab/GEOMetaCuration PMID:29688376
PPInterFinder—a mining tool for extracting causal relations on human proteins from literature
Raja, Kalpana; Subramani, Suresh; Natarajan, Jeyakumar
2013-01-01
One of the most common and challenging problems in biomedical text mining is to mine protein–protein interactions (PPIs) from MEDLINE abstracts and full-text research articles, because PPIs play a major role in understanding the various biological processes and the impact of proteins in diseases. We implemented PPInterFinder, a web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses relation keyword co-occurrences with protein names to extract information on PPIs from MEDLINE abstracts and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence with a set of 11 specific patterns based on the syntactic nature of the PPI pair. We find that PPInterFinder is capable of predicting PPIs with an accuracy of 66.05% on the AIMED corpus and outperforms most of the existing systems. Database URL: http://www.biomining-bu.in/ppinterfinder/ PMID:23325628
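The first two phases (spotting a relation keyword, then pairing co-occurring protein names) can be approximated with a much simpler sketch than the tool itself. This example does not use Tregex or the 11 syntactic patterns mentioned above; the keyword set, function name and plain token matching are assumptions chosen only to show keyword co-occurrence.

```python
import re
from itertools import combinations

# Hypothetical, tiny stand-in for a relation keyword dictionary.
RELATION_KEYWORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}

def candidate_pairs(sentence, protein_names):
    """Return (protein, keyword, protein) triples for one sentence.

    A triple is proposed only when a relation keyword and at least two
    known protein names occur in the same sentence.
    """
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9\-]+", sentence)]
    keywords = [t for t in tokens if t in RELATION_KEYWORDS]
    if not keywords:
        return []
    found = [name for name in protein_names if name.lower() in tokens]
    # Use the first keyword in the sentence; a real system would resolve
    # which keyword links which pair from the parse tree.
    return [(a, keywords[0], b) for a, b in combinations(found, 2)]

triples = candidate_pairs("BRCA1 interacts with BARD1 in the nucleus.", ["BRCA1", "BARD1"])
```

Plain co-occurrence of this kind over-generates candidates, which is presumably why the described tool adds rule-based filtering and syntactic pattern matching as later phases.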
Leading Factors Determining Lateral Transfer Success
2013-03-01
leadership. Officers in the URL communities, and especially SWO JOs, are overworked and lack the resources to succeed. Their morale is low and they are...transfer process and potential community health including end strength in the RL and Staff Corps communities. This line of research is important
Extracting and connecting chemical structures from text sources using chemicalize.org.
Southan, Christopher; Stracz, Andras
2013-04-23
Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents) the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This on-line resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions related to chemical structures in documents and where these overlap with database records. These aspects are illustrated using a common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors. Full-text open URL sources facilitated the download of over 1400 structures from a DPPIV patent and the alignment of specific examples with IC50 data. Uploading the SMILES to PubChem revealed extensive linking to patents and papers, including prior submissions from chemicalize.org as submitting source. A DPPIV medicinal chemistry paper was completely extracted and structures were aligned to the activity results table, as well as linked to other documents via PubChem. In both cases, key structures with data were partitioned from common chemistry by dividing them into individual new PDFs for conversion. Over 500 structures were also extracted from a batch of PubMed abstracts related to DPPIV inhibition. The drug structures could be stepped through each text occurrence and included some converted MeSH-only IUPAC names not linked in PubChem. Performing set intersections proved effective for detecting compounds-in-common between documents and merged extractions. 
This work demonstrates the utility of chemicalize.org for the exploration of chemical structure connectivity between documents and databases, including structure searches in PubChem, InChIKey searches in Google and the chemicalize.org archive. It has the flexibility to extract text from any internal, external or Web source. It synergizes with other open tools and the application is undergoing continued development. It should thus facilitate progress in medicinal chemistry, chemical biology and other bioactive chemistry domains.
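The set-intersection step for finding compounds in common between documents can be sketched in a few lines. The identifiers below are placeholders standing in for structure keys such as InChIKeys, and the function name is hypothetical; nothing here reproduces the chemicalize.org interface.

```python
def compounds_in_common(*extractions):
    """Return the identifiers present in every extraction.

    Each extraction is an iterable of structure identifiers
    (for example InChIKeys) pulled from one document.
    """
    sets = [set(extraction) for extraction in extractions]
    return set.intersection(*sets) if sets else set()

# Placeholder keys, not real structures.
patent_keys = ["KEY-A", "KEY-B", "KEY-C"]
paper_keys = ["KEY-B", "KEY-C", "KEY-D"]
abstract_keys = ["KEY-C", "KEY-E"]

shared_two = compounds_in_common(patent_keys, paper_keys)
shared_all = compounds_in_common(patent_keys, paper_keys, abstract_keys)
```

Comparing on a canonical identifier rather than on names or drawn structures is what makes the intersection meaningful, since the same compound can appear under many names across documents.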
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu. © The Author(s) 2016. Published by Oxford University Press. PMID:26989153
The Eclipsing Binary On-Line Atlas (EBOLA)
NASA Astrophysics Data System (ADS)
Bradstreet, D. H.; Steelman, D. P.; Sanders, S. J.; Hargis, J. R.
2004-05-01
In conjunction with the upcoming release of Binary Maker 3.0, an extensive on-line database of eclipsing binaries is being made available. The purposes of the atlas are: (1) Allow quick and easy access to information on published eclipsing binaries. (2) Amass a consistent database of light and radial velocity curve solutions to aid in solving new systems. (3) Provide invaluable querying capabilities on all of the parameters of the systems so that informative research can be quickly accomplished on a multitude of published results. (4) Aid observers in establishing new observing programs based upon stars needing new light and/or radial velocity curves. (5) Encourage workers to submit their published results so that others may have easy access to their work. (6) Provide a vast but easily accessible storehouse of information on eclipsing binaries to accelerate the process of understanding analysis techniques and current work in the field. The database will eventually consist of all published eclipsing binaries with light curve solutions. The following information and data will be supplied whenever available for each binary: original light curves in all bandpasses, original radial velocity observations, light curve parameters, RA and Dec, V-magnitudes, spectral types, color indices, periods, binary type, 3D representation of the system near quadrature, plots of the original light curves and synthetic models, plots of the radial velocity observations with theoretical models, and Binary Maker 3.0 data files (parameter, light curve, radial velocity). The pertinent references for each star are also given with hyperlinks directly to the papers via the NASA Abstract website for downloading, if available. In addition the Atlas has extensive searching options so that workers can specifically search for binaries with specific characteristics. The website has more than 150 systems already uploaded. The URL for the site is http://ebola.eastern.edu/.
Rothwell, Joseph A.; Perez-Jimenez, Jara; Neveu, Vanessa; Medina-Remón, Alexander; M'Hiri, Nouha; García-Lobato, Paula; Manach, Claudine; Knox, Craig; Eisner, Roman; Wishart, David S.; Scalbert, Augustin
2013-01-01
Polyphenols are a major class of bioactive phytochemicals whose consumption may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, type II diabetes and cancers. Phenol-Explorer, launched in 2009, is the only freely available web-based database on the content of polyphenols in food and their in vivo metabolism and pharmacokinetics. Here we report the third release of the database (Phenol-Explorer 3.0), which adds data on the effects of food processing on polyphenol contents in foods. Data on >100 foods, covering 161 polyphenols or groups of polyphenols before and after processing, were collected from 129 peer-reviewed publications and entered into new tables linked to the existing relational design. The effect of processing on polyphenol content is expressed in the form of retention factor coefficients, or the proportion of a given polyphenol retained after processing, adjusted for change in water content. The result is the first database on the effects of food processing on polyphenol content and, following the model initially defined for Phenol-Explorer, all data may be traced back to original sources. The new update will allow polyphenol scientists to more accurately estimate polyphenol exposure from dietary surveys. Database URL: http://www.phenol-explorer.eu PMID:24103452
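A retention factor of the kind described above can be worked through numerically. The abstract only says the proportion retained is "adjusted for change in water content"; the formula below, which multiplies the processed-food concentration by a yield factor (processed weight divided by raw weight), is an assumed form of that adjustment and should be checked against the database documentation before use. All names and numbers are illustrative.

```python
def retention_factor(content_processed, content_raw, yield_factor):
    """Proportion of a polyphenol retained after processing.

    content_processed: concentration in the processed food (e.g. mg/100 g).
    content_raw: concentration in the raw food, in the same unit.
    yield_factor: processed weight / raw weight, which corrects for
        water lost or gained during processing (assumed form of the adjustment).
    """
    if content_raw <= 0:
        raise ValueError("content_raw must be positive")
    return content_processed * yield_factor / content_raw

# Illustrative numbers only: the concentration halves, and the food
# also loses 20% of its weight during processing.
rf = retention_factor(content_processed=50.0, content_raw=100.0, yield_factor=0.8)
```

Without the yield factor, a food that merely loses water would appear to gain polyphenols, because the same amount ends up concentrated in less mass.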
Henrique, Tiago; José Freitas da Silveira, Nelson; Henrique Cunha Volpato, Arthur; Mioto, Mayra Mataruco; Carolina Buzzo Stefanini, Ana; Bachir Fares, Adil; Gustavo da Silva Castro Andrade, João; Masson, Carolina; Verónica Mendoza López, Rossana; Daumas Nunes, Fabio; Paulo Kowalski, Luis; Severino, Patricia; Tajara, Eloiza Helena
2016-01-01
The total amount of scientific literature has grown rapidly in recent years. Specifically, there are several million citations in the field of cancer. This makes it difficult, if not impossible, to manually retrieve relevant information on the mechanisms that govern tumor behavior or the neoplastic process. Furthermore, cancer is a complex disease or, more accurately, a set of diseases. The heterogeneity that permeates many tumors is particularly evident in head and neck (HN) cancer, one of the most common types of cancer worldwide. In this study, we present HNdb, a free database that aims to provide a unified and comprehensive resource of information on genes and proteins involved in HN squamous cell carcinoma, covering data on genomics, transcriptomics, proteomics, literature citations and also cross-references of external databases. Different literature searches of MEDLINE abstracts were performed using specific Medical Subject Headings (MeSH terms) for oral, oropharyngeal, hypopharyngeal and laryngeal squamous cell carcinomas. A curated gene-to-publication assignment yielded a total of 1370 genes related to HN cancer. The diversity of results allowed identifying novel and mostly unexplored gene associations, revealing, for example, that processes linked to response to steroid hormone stimulus are significantly enriched in genes related to HN carcinomas. Thus, our database expands the possibilities for gene network investigation, providing potential hypotheses to be tested. Database URL: http://www.gencapo.famerp.br/hndb PMID:27013077
A New Interface for the Magnetics Information Consortium (MagIC) Paleo and Rock Magnetic Database
NASA Astrophysics Data System (ADS)
Jarboe, N.; Minnett, R.; Koppers, A. A. P.; Tauxe, L.; Constable, C.; Shaar, R.; Jonestrask, L.
2014-12-01
The Magnetics Information Consortium (MagIC) database (http://earthref.org/MagIC/) continues to improve the ease of uploading data, the creation of complex searches, data visualization, and data downloads for the paleomagnetic, geomagnetic, and rock magnetic communities. Data uploading has been simplified and no longer requires the use of the Excel SmartBook interface. Instead, properly formatted MagIC text files can be dragged and dropped onto an HTML 5 web interface. Data can be uploaded one table at a time to facilitate ease of uploading, and data error checking is done online on the whole dataset at once instead of incrementally in an Excel Console. Searching the database has improved with the addition of more sophisticated search parameters and with the ability to use them in complex combinations. Searches may also be saved as permanent URLs for easy reference or for use as a citation in a publication. Data visualization plots (ARAI, equal area, demagnetization, Zijderveld, etc.) are presented with the data when appropriate to aid the user in understanding the dataset. Data from the MagIC database may be downloaded from individual contributions or from online searches for offline use and analysis in the tab delimited MagIC text file format. With input from the paleomagnetic, geomagnetic, and rock magnetic communities, the MagIC database will continue to improve as a data warehouse and resource.
You, Leiming; Wu, Jiexin; Feng, Yuchao; Fu, Yonggui; Guo, Yanan; Long, Liyuan; Zhang, Hui; Luan, Yijie; Tian, Peng; Chen, Liangfu; Huang, Guangrui; Huang, Shengfeng; Li, Yuxin; Li, Jie; Chen, Chengyong; Zhang, Yaqing; Chen, Shangwu; Xu, Anlong
2015-01-01
An increasing number of genes have been shown to utilize alternative polyadenylation (APA) 3′-processing sites depending on the cell and tissue type and/or physiological and pathological conditions at the time of processing, and the construction of a genome-wide database of APA is urgently needed for a better understanding of poly(A) site selection and APA-directed gene expression regulation for a given biology. Here we present a web-accessible database, named APASdb (http://mosas.sysu.edu.cn/utr), which can visualize the precise map and usage quantification of different APA isoforms for all genes. The datasets are deeply profiled by the sequencing alternative polyadenylation sites (SAPAS) method capable of high-throughput sequencing of the 3′-ends of polyadenylated transcripts. Thus, APASdb details all the heterogeneous cleavage sites downstream of poly(A) signals, and maintains near complete coverage for APA sites, much better than the previous databases using conventional methods. Furthermore, APASdb provides the quantification of a given APA variant among transcripts with different APA sites by computing their corresponding normalized-reads, making our database more useful. In addition, APASdb supports URL-based retrieval, browsing and display of exon-intron structure, poly(A) signals, poly(A) sites location and usage reads, and 3′-untranslated regions (3′-UTRs). Currently, APASdb involves APA in various biological processes and diseases in human, mouse and zebrafish. PMID:25378337
Integrated sequence and immunology filovirus database at Los Alamos
Yoon, Hyejin; Foley, Brian; Feng, Shihai; Macke, Jennifer; Dimitrijevic, Mira; Abfalterer, Werner; Szinger, James; Fischer, Will; Kuiken, Carla; Korber, Bette
2016-01-01
The Ebola outbreak of 2013–15 infected more than 28 000 people and claimed more lives than all previous filovirus outbreaks combined. Governmental agencies, clinical teams, and the world scientific community pulled together in a multifaceted response ranging from prevention and disease control, to evaluating vaccines and therapeutics in human trials. As this epidemic is finally coming to a close, refocusing on long-term prevention strategies becomes paramount. Given the very real threat of future filovirus outbreaks, and the inherent uncertainty of the next outbreak virus and geographic location, it is prudent to consider the extent and implications of known natural diversity in advancing vaccines and therapeutic approaches. To facilitate such consideration, we have updated and enhanced the content of the filovirus portion of the Los Alamos Hemorrhagic Fever Viruses Database. We have integrated and performed baseline analysis of all family Filoviridae sequences deposited into GenBank, with associated immune response data, and metadata, and we have added new computational tools with web-interfaces to assist users with analysis. Here, we (i) describe the main features of the updated database, (ii) provide integrated views and some basic analyses summarizing evolutionary patterns as they relate to geo-temporal data captured in the database and (iii) highlight the most conserved regions in the proteome that may be useful for a T cell vaccine strategy. Database URL: www.hfv.lanl.gov PMID:27103629
An Ebola virus-centered knowledge base
Kamdar, Maulik R.; Dumontier, Michel
2015-01-01
Ebola virus (EBOV), of the family Filoviridae viruses, is a NIAID category A, lethal human pathogen. It is responsible for causing Ebola virus disease (EVD) that is a severe hemorrhagic fever and has a cumulative death rate of 41% in the ongoing epidemic in West Africa. There is an ever-increasing need to consolidate and make available all the knowledge that we possess on EBOV, even if it is conflicting or incomplete. This would enable biomedical researchers to understand the molecular mechanisms underlying this disease and help develop tools for efficient diagnosis and effective treatment. In this article, we present our approach for the development of an Ebola virus-centered Knowledge Base (Ebola-KB) using Linked Data and Semantic Web Technologies. We retrieve and aggregate knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 Endpoint. Ebola-KB can also be explored using an interactive Dashboard visualizing the different perspectives of this integrated knowledge. We showcase how different competency questions, asked by domain users researching the druggability of EBOV, can be formulated as SPARQL Queries or answered using the Ebola-KB Dashboard. Database URL: http://ebola.semanticscience.org. PMID:26055098
BioSurfDB: knowledge and algorithms to support biosurfactants and biodegradation studies
Oliveira, Jorge S.; Araújo, Wydemberg; Lopes Sales, Ana Isabela; de Brito Guerra, Alaine; da Silva Araújo, Sinara Carla; de Vasconcelos, Ana Tereza Ribeiro; Agnez-Lima, Lucymara F.; Freitas, Ana Teresa
2015-01-01
Crude oil extraction, transportation and use provoke the contamination of countless ecosystems. Therefore, bioremediation through surfactants mobilization or biodegradation is an important subject, both economically and environmentally. Bioremediation research had a great boost with the recent advances in Metagenomics, as it enabled the sequencing of uncultured microorganisms providing new insights on surfactant-producing and/or oil-degrading bacteria. Many research studies are making available genomic data from unknown organisms obtained from metagenomics analysis of oil-contaminated environmental samples. These new datasets are presently demanding the development of new tools and data repositories tailored for the biological analysis in a context of bioremediation data analysis. This work presents BioSurfDB, www.biosurfdb.org, a curated relational information system integrating data from: (i) metagenomes; (ii) organisms; (iii) biodegradation relevant genes; proteins and their metabolic pathways; (iv) bioremediation experiments results, with specific pollutants treatment efficiencies by surfactant producing organisms; and (v) a biosurfactant-curated list, grouped by producing organism, surfactant name, class and reference. The main goal of this repository is to gather information on the characterization of biological compounds and mechanisms involved in biosurfactant production and/or biodegradation and make it available in a curated way and associated with a number of computational tools to support studies of genomic and metagenomic data. Database URL: www.biosurfdb.org PMID:25833955
Damming the genomic data flood using a comprehensive analysis and storage data structure
Bouffard, Marc; Phillips, Michael S.; Brown, Andrew M.K.; Marsh, Sharon; Tardif, Jean-Claude; van Rooij, Tibor
2010-01-01
Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information. Database URL: http://castor.pharmacogenomics.ca PMID:21159730
The systematic annotation of the three main GPCR families in Reactome.
Jassal, Bijay; Jupe, Steven; Caudy, Michael; Birney, Ewan; Stein, Lincoln; Hermjakob, Henning; D'Eustachio, Peter
2010-07-29
Reactome is an open-source, freely available database of human biological pathways and processes. A major goal of our work is to provide an integrated view of cellular signalling processes that spans from ligand-receptor interactions to molecular readouts at the level of metabolic and transcriptional events. To this end, we have built the first catalogue of all human G protein-coupled receptors (GPCRs) known to bind endogenous or natural ligands. The UniProt database has records for 797 proteins classified as GPCRs and sorted into families A/1, B/2 and C/3 on the basis of amino acid sequence. To these records we have added details from the IUPHAR database and our own manual curation of relevant literature to create reactions in which 563 GPCRs bind ligands and also interact with specific G-proteins to initiate signalling cascades. We believe the remaining 234 GPCRs are true orphans. The Reactome GPCR pathway can be viewed as a detailed interactive diagram and can be exported in many forms. It provides a template for the orthology-based inference of GPCR reactions for diverse model organism species, and can be overlaid with protein-protein interaction and gene expression datasets to facilitate overrepresentation studies and other forms of pathway analysis. Database URL: http://www.reactome.org.
MGIS: managing banana (Musa spp.) genetic resources information and high-throughput genotyping data
Guignon, V.; Sempere, G.; Sardos, J.; Hueber, Y.; Duvergey, H.; Andrieu, A.; Chase, R.; Jenny, C.; Hazekamp, T.; Irish, B.; Jelali, K.; Adeka, J.; Ayala-Silva, T.; Chao, C.P.; Daniells, J.; Dowiya, B.; Effa effa, B.; Gueco, L.; Herradura, L.; Ibobondji, L.; Kempenaers, E.; Kilangi, J.; Muhangi, S.; Ngo Xuan, P.; Paofa, J.; Pavis, C.; Thiemele, D.; Tossou, C.; Sandoval, J.; Sutanto, A.; Vangu Paka, G.; Yi, G.; Van den houwe, I.; Roux, N.
2017-01-01
Unraveling the genetic diversity held in genebanks on a large scale is underway, due to advances in Next-generation sequence (NGS) based technologies that produce high-density genetic markers for a large number of samples at low cost. Genebank users should be in a position to identify and select germplasm from the global genepool based on a combination of passport, genotypic and phenotypic data. To facilitate this, a new generation of information systems is being designed to efficiently handle data and link it with other external resources such as genome or breeding databases. The Musa Germplasm Information System (MGIS), the database for global ex situ-held banana genetic resources, has been developed to address those needs in a user-friendly way. In developing MGIS, we selected a generic database schema (Chado), the robust content management system Drupal for the user interface, and Tripal, a set of Drupal modules which links the Chado schema to Drupal. MGIS allows germplasm collection examination, accession browsing, advanced search functions, and germplasm orders. Additionally, we developed unique graphical interfaces to compare accessions and to explore them based on their taxonomic information. Accession-based data has been enriched with publications, genotyping studies and associated genotyping datasets reporting on germplasm use. Finally, an interoperability layer has been implemented to facilitate the link with complementary databases like the Banana Genome Hub and the MusaBase breeding database. Database URL: https://www.crop-diversity.org/mgis/ PMID:29220435
BioCreative V CDR task corpus: a resource for chemical disease relation extraction.
Li, Jiao; Sun, Yueping; Johnson, Robin J; Sciaky, Daniela; Wei, Chih-Hsuan; Leaman, Robert; Davis, Allan Peter; Mattingly, Carolyn J; Wiegers, Thomas C; Lu, Zhiyong
2016-01-01
Community-run, formal evaluations and manually annotated text corpora are critically important for advancing biomedical text-mining research. Recently in BioCreative V, a new challenge was organized for the tasks of disease named entity recognition (DNER) and chemical-induced disease (CID) relation extraction. Given the nature of both tasks, a test collection is required to contain both disease/chemical annotations and relation annotations in the same set of articles. Despite previous efforts in biomedical corpus construction, none was found to be sufficient for the task. Thus, we developed our own corpus called BC5CDR during the challenge by inviting a team of Medical Subject Headings (MeSH) indexers for disease/chemical entity annotation and Comparative Toxicogenomics Database (CTD) curators for CID relation annotation. To ensure high annotation quality and productivity, detailed annotation guidelines and automatic annotation tools were provided. The resulting BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases and 3116 chemical-disease interactions. Each entity annotation includes both the mention text spans and normalized concept identifiers, using MeSH as the controlled vocabulary. To ensure accuracy, the entities were first captured independently by two annotators followed by a consensus annotation: The average inter-annotator agreement (IAA) scores were 87.49% and 96.05% for the disease and chemicals, respectively, in the test set according to the Jaccard similarity coefficient. Our corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.
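The Jaccard-based agreement score mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each annotator's output is a set of (start offset, end offset, concept identifier) tuples; the function name, the tuple layout and the placeholder identifiers are hypothetical and are not taken from the BC5CDR corpus or its tooling.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two annotation sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets agree fully
    return len(a & b) / len(a | b)


# Hypothetical annotations: (start offset, end offset, placeholder concept id).
annotator_1 = {(0, 9, "ID-1"), (25, 37, "ID-2"), (50, 58, "ID-3")}
annotator_2 = {(0, 9, "ID-1"), (25, 37, "ID-2"), (60, 70, "ID-4")}

# Two shared annotations out of four distinct ones.
iaa = jaccard(annotator_1, annotator_2)
print(f"inter-annotator agreement: {iaa:.2%}")
```

Treating an annotation as matching only when span and identifier are both identical is one possible choice; the abstract does not say how partial overlaps were scored.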
CycADS: an annotation database system to ease the development and update of BioCyc databases
Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano
2011-01-01
In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Linked to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms.
Database URL: http://www.cycadsys.org PMID:21474551
International Collaboration Activities on Engineered Barrier Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jove-Colon, Carlos F.
The Used Fuel Disposition Campaign (UFDC) within the DOE Fuel Cycle Technologies (FCT) program has been engaging in international collaborations between repository R&D programs for high-level waste (HLW) disposal to leverage on gathered knowledge and laboratory/field data of near- and far-field processes from experiments at underground research laboratories (URL). Heater test experiments at URLs provide a unique opportunity to mimetically study the thermal effects of heat-generating nuclear waste in subsurface repository environments. Various configurations of these experiments have been carried out at various URLs according to the disposal design concepts of the hosting country repository program. The FEBEX (Full-scale Engineered Barrier Experiment in Crystalline Host Rock) project is a large-scale heater test experiment originated by the Spanish radioactive waste management agency (Empresa Nacional de Residuos Radiactivos S.A. – ENRESA) at the Grimsel Test Site (GTS) URL in Switzerland. The project was subsequently managed by CIEMAT. FEBEX-DP is a concerted effort of various international partners working on the evaluation of sensor data and characterization of samples obtained during the course of this field test and subsequent dismantling. The main purpose of these field-scale experiments is to evaluate feasibility for creation of an engineered barrier system (EBS) with a horizontal configuration according to the Spanish concept of deep geological disposal of high-level radioactive waste in crystalline rock. Another key aspect of this project is to improve the knowledge of coupled processes such as thermal-hydro-mechanical (THM) and thermal-hydro-chemical (THC) operating in the near-field environment. The focus of these is on model development and validation of predictions through model implementation in computational tools to simulate coupled THM and THC processes.
Cataloging the biomedical world of pain through semi-automated curation of molecular interactions
Jamieson, Daniel G.; Roberts, Phoebe M.; Robertson, David L.; Sidders, Ben; Nenadic, Goran
2013-01-01
The vast collection of biomedical literature and its continued expansion has presented a number of challenges to researchers who require structured findings to stay abreast of and analyze molecular mechanisms relevant to their domain of interest. By structuring literature content into topic-specific machine-readable databases, the aggregate data from multiple articles can be used to infer trends that can be compared and contrasted with similar findings from topic-independent resources. Our study presents a generalized procedure for semi-automatically creating a custom topic-specific molecular interaction database through the use of text mining to assist manual curation. We apply the procedure to capture molecular events that underlie ‘pain’, a complex phenomenon with a large societal burden and unmet medical need. We describe how existing text mining solutions are used to build a pain-specific corpus, extract molecular events from it, add context to the extracted events and assess their relevance. The pain-specific corpus contains 765 692 documents from Medline and PubMed Central, from which we extracted 356 499 unique normalized molecular events, with 261 438 single protein events and 93 271 molecular interactions supplied by BioContext. Event chains are annotated with negation, speculation, anatomy, Gene Ontology terms, mutations, pain and disease relevance, which collectively provide detailed insight into how that event chain is associated with pain. The extracted relations are visualized in a wiki platform (wiki-pain.org) that enables efficient manual curation and exploration of the molecular mechanisms that underlie pain. Curation of 1500 grouped event chains ranked by pain relevance revealed 613 accurately extracted unique molecular interactions that in the future can be used to study the underlying mechanisms involved in pain. 
Our approach demonstrates that combining existing text mining tools with domain-specific terms and wiki-based visualization can facilitate rapid curation of molecular interactions to create a custom database. Database URL: ••• PMID:23707966
China in Space: Implications for U.S. Military Strategy
2007-01-01
driver. The space program provides a mechanism for research and scientific exploration that will undoubtedly advance China’s education and high...ndu.edu/ login?url=http://proquest.umi.com/pqdweb?did= 1144517361&Fmt=3&clientId=3921&RQT=309&V Name=PQD>. 4 Michael Westlake, “Space program engen
Incorporating the Internet into Traditional Library Instruction.
ERIC Educational Resources Information Center
Fonseca, Tony; King, Monica
2000-01-01
Presents a template for teaching traditional library research and one for incorporating the Web. Highlights include the differences between directories and search engines; devising search strategies; creating search terms; how to choose search engines; evaluating online resources; helpful Web sites; and how to read URLs to evaluate a Web site's…
Follicle Online: an integrated database of follicle assembly, development and ovulation.
Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua
2015-01-01
Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database 'Follicle Online' that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos Taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php © The Author(s) 2015. Published by Oxford University Press.
Follicle Online: an integrated database of follicle assembly, development and ovulation
Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua
2015-01-01
Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and to cure the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we are reporting a database ‘Follicle Online’ that provides the experimentally validated gene/protein map of the folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos Taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (till 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457
Report on International Collaboration Involving the FE Heater and HG-A Tests at Mont Terri
DOE Office of Scientific and Technical Information (OSTI.GOV)
Houseworth, Jim; Rutqvist, Jonny; Asahina, Daisuke
Nuclear waste programs outside of the US have focused on different host rock types for geological disposal of high-level radioactive waste. Several countries, including France, Switzerland, Belgium, and Japan are exploring the possibility of waste disposal in shale and other clay-rich rock that fall within the general classification of argillaceous rock. This rock type is also of interest for the US program because the US has extensive sedimentary basins containing large deposits of argillaceous rock. LBNL, as part of the DOE-NE Used Fuel Disposition Campaign, is collaborating on some of the underground research laboratory (URL) activities at the Mont Terri URL near Saint-Ursanne, Switzerland. The Mont Terri project, which began in 1995, has developed a URL at a depth of about 300 m in a stiff clay formation called the Opalinus Clay. Our current collaboration efforts include two test modeling activities for the FE heater test and the HG-A leak-off test. This report documents results concerning our current modeling of these field tests. The overall objectives of these activities include an improved understanding of and advanced relevant modeling capabilities for EDZ evolution in clay repositories and the associated coupled processes, and to develop a technical basis for the maximum allowable temperature for a clay repository.
iSyTE 2.0: a database for expression-based gene discovery in the eye
Kakrana, Atul; Yang, Andrian; Anand, Deepti; Djordjevic, Djordje; Ramachandruni, Deepti; Singh, Abhyudai; Huang, Hongzhan
2018-01-01
Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches. PMID:29036527
Wei, Wei; Ji, Zhanglong; He, Yupeng; Zhang, Kai; Ha, Yuanchi; Li, Qi; Ohno-Machado, Lucila
2018-01-01
The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline PMID:29688374
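As background for the evaluation metric named in the abstract above, the following is a minimal sketch of plain Normalized Discounted Cumulative Gain (NDCG). It is not the inferred variant used in the challenge, which additionally estimates gains for unjudged documents; the function names and the example relevance grades are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the ranking divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of retrieved datasets, in ranked order
# (2 = relevant, 1 = partially relevant, 0 = not relevant).
ranked_gains = [1, 2, 0, 2]
score = ndcg(ranked_gains)
print(f"NDCG = {score:.3f}")
```

A ranking that already lists the most relevant datasets first scores 1.0; placing relevant datasets lower reduces the score because of the logarithmic discount.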
Ienasescu, Hans; Li, Kang; Andersson, Robin; Vitezic, Morana; Rennie, Sarah; Chen, Yun; Vitting-Seerup, Kristoffer; Lagoni, Emil; Boyd, Mette; Bornholdt, Jette; de Hoon, Michiel J. L.; Kawaji, Hideya; Lassmann, Timo; Hayashizaki, Yoshihide; Forrest, Alistair R. R.; Carninci, Piero; Sandelin, Albin
2016-01-01
Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select the subsets of such data that are most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. which satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS. Database URL: http://slidebase.binf.ku.dk PMID:28025337
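The slider-based selection described in this abstract can be pictured with a minimal sketch. It assumes each slider is a (low, high) range on a per-tissue expression value and that a feature is kept only if it satisfies every slider. The function name, the data layout and the example values are hypothetical and are not taken from SlideBase itself.

```python
def select_features(expression, thresholds):
    """Return the features whose expression lies inside every slider range.

    expression: {feature_name: {tissue: value}}   (hypothetical layout)
    thresholds: {tissue: (low, high)}             (one entry per slider)
    A tissue missing from a feature's record is treated as expression 0.0.
    """
    selected = []
    for feature, by_tissue in expression.items():
        if all(low <= by_tissue.get(tissue, 0.0) <= high
               for tissue, (low, high) in thresholds.items()):
            selected.append(feature)
    return selected


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only.
    expression = {
        "geneA": {"liver": 80.0, "brain": 5.0},
        "geneB": {"liver": 20.0, "brain": 60.0},
    }
    sliders = {"liver": (50.0, 100.0), "brain": (0.0, 10.0)}
    print(select_features(expression, sliders))
```

Re-running the selection whenever a range changes mirrors the real-time update behaviour the abstract mentions.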
miRToolsGallery: a tag-based and rankable microRNA bioinformatics resources database portal
Chen, Liang; Heikkinen, Liisa; Wang, ChangLiang; Yang, Yang; Knott, K Emily
2018-01-01
Abstract Hundreds of bioinformatics tools have been developed for MicroRNA (miRNA) investigations including those used for identification, target prediction, structure and expression profile analysis. However, finding the correct tool for a specific application requires the tedious and laborious process of locating, downloading, testing and validating the appropriate tool from a group of nearly a thousand. In order to facilitate this process, we developed a novel database portal named miRToolsGallery. We constructed the portal by manually curating > 950 miRNA analysis tools and resources. In the portal, a query to locate the appropriate tool is expedited by being searchable, filterable and rankable. The ranking feature is vital to quickly identify and prioritize the more useful from the obscure tools. Tools are ranked via different criteria including the PageRank algorithm, date of publication, number of citations, average of votes and number of publications. miRToolsGallery provides links and data for the comprehensive collection of currently available miRNA tools with a ranking function which can be adjusted using different criteria according to specific requirements. Database URL: http://www.mirtoolsgallery.org PMID:29688355
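A minimal sketch of tag-based filtering combined with an adjustable ranking criterion, in the spirit of the portal described above. The record fields, tool names and numbers are hypothetical, and the PageRank-based criterion mentioned in the abstract is not implemented here.

```python
from operator import itemgetter

# Hypothetical tool records; field names and values are illustrative only.
TOOLS = [
    {"name": "toolA", "tags": {"target prediction"}, "citations": 120, "year": 2012},
    {"name": "toolB", "tags": {"target prediction", "expression"}, "citations": 45, "year": 2016},
    {"name": "toolC", "tags": {"identification"}, "citations": 300, "year": 2009},
]


def rank_tools(tools, criterion, tag=None, descending=True):
    """Optionally filter tools by tag, then sort them by one numeric criterion."""
    candidates = [t for t in tools if tag is None or tag in t["tags"]]
    return sorted(candidates, key=itemgetter(criterion), reverse=descending)


if __name__ == "__main__":
    for tool in rank_tools(TOOLS, "citations", tag="target prediction"):
        print(tool["name"], tool["citations"])
```

Switching `criterion` from `"citations"` to `"year"` reorders the same filtered set, which is the kind of adjustable ranking the abstract describes.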
The Virtual Xenbase: transitioning an online bioinformatics resource to a private cloud.
Karimi, Kamran; Vize, Peter D
2014-01-01
As a model organism database, Xenbase has been providing informatics and genomic data on Xenopus (Silurana) tropicalis and Xenopus laevis frogs for more than a decade. The Xenbase database contains curated, as well as community-contributed and automatically harvested literature, gene and genomic data. A GBrowse genome browser, a BLAST+ server and stock center support are available on the site. When this resource was first built, all software services and components in Xenbase ran on a single physical server, with inherent reliability, scalability and inter-dependence issues. Recent advances in networking and virtualization techniques allowed us to move Xenbase to a virtual environment, and more specifically to a private cloud. To do so we decoupled the different software services and components, such that each would run on a different virtual machine. In the process, we also upgraded many of the components. The resulting system is faster and more reliable. System maintenance is easier, as individual virtual machines can now be updated, backed up and changed independently. We are also experiencing more effective resource allocation and utilization. Database URL: www.xenbase.org. © The Author(s) 2014. Published by Oxford University Press.
The Global Genome Biodiversity Network (GGBN) Data Standard specification
Droege, G.; Barker, K.; Seberg, O.; Coddington, J.; Benson, E.; Berendsohn, W. G.; Bunk, B.; Butler, C.; Cawsey, E. M.; Deck, J.; Döring, M.; Flemons, P.; Gemeinholzer, B.; Güntsch, A.; Hollowell, T.; Kelbert, P.; Kostadinov, I.; Kottmann, R.; Lawlor, R. T.; Lyal, C.; Mackenzie-Dodds, J.; Meyer, C.; Mulcahy, D.; Nussbeck, S. Y.; O'Tuama, É.; Orrell, T.; Petersen, G.; Robertson, T.; Söhngen, C.; Whitacre, J.; Wieczorek, J.; Yilmaz, P.; Zetzsche, H.; Zhang, Y.; Zhou, X.
2016-01-01
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today’s ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard PMID:27694206
Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai
2018-01-01
Abstract PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441
Updated regulation curation model at the Saccharomyces Genome Database
Engel, Stacia R; Skrzypek, Marek S; Hellerstedt, Sage T; Wong, Edith D; Nash, Robert S; Weng, Shuai; Binkley, Gail; Sheppard, Travis K; Karra, Kalpana; Cherry, J Michael
2018-01-01
Abstract The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine. Database URL: http://www.yeastgenome.org PMID:29688362
Fusion Genes Predict Prostate Cancer Recurrence
2017-10-01
URL for any Internet site(s) that disseminates the results of the research activities. A short description of each site should be provided. It is not...University of Wisconsin System Madison, WI 53715 REPORT DATE: October 2017 TYPE OF REPORT: Annual PREPARED FOR: U.S. Army Medical Research and...policy or decision unless so designated by other documentation. REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704-0188 Public reporting burden
Kılıç, Sefa; Sagitova, Dinara M; Wolfish, Shoshannah; Bely, Benoit; Courtot, Mélanie; Ciufo, Stacy; Tatusova, Tatiana; O'Donovan, Claire; Chibucos, Marcus C; Martin, Maria J; Erill, Ivan
2016-01-01
Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges to their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a community-oriented database documenting experimentally validated transcription factor (TF)-binding sites in the Bacteria domain. In its quest to become a community resource for the annotation of transcriptional regulatory elements in bacterial genomes, CollecTF aims to move away from the conventional data-repository paradigm of domain-specific databases. Through the adoption of well-established ontologies, identifiers and collaborations, CollecTF has also progressively become a portal for the annotation and submission of information on transcriptional regulatory elements to major biological sequence resources (RefSeq, UniProtKB and the Gene Ontology Consortium). This fundamental change in database conception capitalizes on the domain-specific knowledge of contributing communities to provide high-quality annotations, while leveraging the availability of stable information hubs to promote long-term access and provide high visibility to the data. As a submission portal, CollecTF generates TF-binding site information through direct annotation of RefSeq genome records, definition of TF-based regulatory networks in UniProtKB entries and submission of functional annotations to the Gene Ontology. As a database, CollecTF provides enhanced search and browsing, targeted data exports, binding motif analysis tools and integration with motif discovery and search platforms. This innovative approach will allow CollecTF to focus its limited resources on the generation of high-quality information and the provision of specialized access to the data. Database URL: http://www.collectf.org/. © The Author(s) 2016. Published by Oxford University Press.
Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja
2014-01-01
Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators has led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform HIVE (High-performance Integrated Virtual Environment) for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identifications of novel or common nsSNVs that can be prioritized in validation studies. 
Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu.
1983-12-01
THESIS: ATTITUDES OF MALE UNRESTRICTED LINE (URL) OFFICERS TOWARDS INTEGRATION OF WOMEN INTO THEIR DESIGNATORS AND TOWARDS WOMEN... Male Unrestricted Line (URL) Officers Towards Integration of Women into their Designators and Towards Women in... Master’s Thesis, December 1983... Integration of Women... ABSTRACT: Using Rand Survey data, this thesis examines the attitudes of male Unre
Constructing Uniform Resource Locators (URLs) for Searching the Marine Realms Information Bank
Linck, Guthrie A.; Allwardt, Alan O.; Lightsom, Frances L.
2009-01-01
The Marine Realms Information Bank (MRIB) is a digital library that provides access to free online scientific information about the oceans and coastal regions. To search its collection, MRIB uses a Common Gateway Interface (CGI) program, which allows automated search requests using Uniform Resource Locators (URLs). This document provides an overview of how to construct URLs to execute MRIB queries. The parameters listed allow detailed control of which records are retrieved, how they are returned, and how their display is formatted.
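The general idea of driving a CGI search program through a URL can be sketched as follows. The base URL and the parameter names (`keyword`, `max_records`) are hypothetical placeholders chosen for illustration; the actual MRIB parameters are defined in the document described above and are not reproduced here.

```python
from urllib.parse import urlencode


def build_search_url(base_url, **params):
    """Append URL-encoded query parameters to a CGI endpoint.

    Parameters whose value is None are omitted from the query string.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{query}" if query else base_url


if __name__ == "__main__":
    # Hypothetical endpoint and parameter names, not the real MRIB interface.
    url = build_search_url(
        "https://example.org/cgi-bin/search",
        keyword="coral reefs",
        max_records=10,
    )
    print(url)
```

Letting `urlencode` handle escaping avoids hand-built query strings that break on spaces or reserved characters.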
CRCDA—Comprehensive resources for cancer NGS data analysis
Thangam, Manonanthini; Gopal, Ramesh Kumar
2015-01-01
Next generation sequencing (NGS) innovations have set a compelling landmark in life science and changed the direction of research in clinical oncology through their capacity to help diagnose and treat cancer. The aim of our portal, Comprehensive Resources for Cancer NGS Data Analysis (CRCDA), is to provide a collection of different NGS tools and pipelines under diverse classes, together with cancer pathways and databases and, furthermore, literature information from PubMed. The literature data were restricted to the 18 most common cancer types, such as breast cancer, colon cancer and other cancers that occur in the worldwide population. For convenience, NGS-cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis were listed to provide an out-of-the-box solution for NGS data analysis, which may help researchers to overcome challenges in selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by using (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge is in finding and using the right tool for the right application. The objective of the work is to collect the tools in each category that are available at various places and to arrange the tools and other data in a simple and user-friendly manner, so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available in cancer for NGS data analysis. Given these factors, we believe that this website will be a useful resource to the NGS research community working on cancer. 
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html. PMID:26450948
Gunsolus, Ian L; Jaffe, Allan S; Sexter, Anne; Schulz, Karen; Ler, Ranka; Lindgren, Brittany; Saenger, Amy K; Love, Sara A; Apple, Fred S
2017-12-01
Our purpose was to determine a) overall and sex-specific 99th percentile upper reference limits (URL) and b) influences of statistical methods and comorbidities on the URLs. Heparin plasma samples from 838 normal subjects (423 men, 415 women) were obtained from the AACC (Universal Sample Bank). The cobas e602 measured cTnT (Roche Gen 5 assay); limit of detection (LoD), 3ng/L. Hemoglobin A1c (URL 6.5%), NT-proBNP (URL 125ng/L) and eGFR (60mL/min/1.73m²) were measured, along with identification of statin use, to better define normality. 99th percentile URLs were determined by the non-parametric (NP), Harrell-Davis Estimator (HDE) and Robust (R) methods. 355 men and 339 women remained after exclusions. Overall, <50% of subjects had measurable concentrations ≥ LoD: 45.6% no exclusion, 43.5% after exclusion; compared to men: 68.1% no exclusion, 65.1% post exclusion; women: 22.7% no exclusion, 20.9% post exclusion. The statistical method used influenced URLs as follows: pre/post exclusion overall, NP 16/16ng/L, HDE 17/17ng/L, R not available; men NP 18/16ng/L, HDE 21/19ng/L, R 16/11ng/L; women NP 13/10ng/L, HDE 14/14ng/L, R not available. We demonstrated that a) the Gen 5 cTnT assay does not meet the IFCC guideline for high-sensitivity assays, b) surrogate biomarkers significantly lower the URLs and c) statistical methods used impact URLs. Our data suggest lower sex-specific cTnT 99th percentiles than reported in the FDA approved package insert. We emphasize the importance of detailing the criteria used to include and exclude subjects for defining a healthy population and the statistical method used to calculate 99th percentiles and identify outliers. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
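To make the non-parametric calculation concrete, the sketch below estimates a percentile from ranked values using the rank position p(n + 1) with linear interpolation, and applies it per group to obtain sex-specific limits. This is one common rank-based convention and is an assumption: the abstract does not state which non-parametric variant was used, and the Harrell-Davis and Robust methods are not implemented here. The example values are invented.

```python
import math


def nonparametric_percentile(values, p=0.99):
    """Rank-based percentile: interpolate at 1-based rank p * (n + 1).

    Ranks outside the observed range are clamped to the smallest or
    largest observation.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one value is required")
    pos = p * (n + 1)
    lower_rank = math.floor(pos)
    frac = pos - lower_rank
    if lower_rank < 1:
        return xs[0]
    if lower_rank >= n:
        return xs[-1]
    return xs[lower_rank - 1] + frac * (xs[lower_rank] - xs[lower_rank - 1])


def percentile_by_group(records, p=0.99):
    """Compute the percentile separately for each group label (e.g. sex)."""
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append(value)
    return {g: nonparametric_percentile(v, p) for g, v in groups.items()}


if __name__ == "__main__":
    # Invented concentrations (ng/L), for illustration only.
    records = [("men", 3.0), ("men", 7.5), ("men", 12.0),
               ("women", 3.0), ("women", 4.2), ("women", 9.1)]
    print(percentile_by_group(records))
```

With small groups the 99th percentile collapses to the largest observation, which is one reason large reference populations are needed for stable limits.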
A Forensic Examination of Online Search Facility URL Record Structures.
Horsman, Graeme
2018-05-29
The use of search engines and associated search functions to locate content online is now common practice. As a result, a forensic examination of a suspect's online search activity can be a critical aspect in establishing whether an offense has been committed in many investigations. This article offers an analysis of online search URL structures to help law enforcement and associated digital forensics practitioners interpret acts of online searching during an investigation. Google, Bing, Yahoo!, and DuckDuckGo searching functions are examined, and key URL attribute structures and metadata have been documented. In addition, an overview of social media searching covering Twitter, Facebook, Instagram, and YouTube is offered. Results show the ability to extract embedded metadata from search engine URLs which can establish online searching behaviors and the timing of searches. © 2018 American Academy of Forensic Sciences.
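As a minimal sketch of the kind of extraction this abstract refers to, the code below splits a search URL into its host and query parameters with the Python standard library. The example URL is constructed for illustration, the name of the parameter holding the search terms varies between services (so it is left configurable), and the specific attributes documented in the article are not reproduced here.

```python
from urllib.parse import urlsplit, parse_qs


def extract_search_terms(url, term_param="q"):
    """Return the host, the decoded search terms and all query parameters.

    term_param names the query parameter assumed to hold the search terms.
    """
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    terms = params.get(term_param, [])
    return {
        "host": parts.netloc,
        "terms": terms[0] if terms else None,
        "params": params,
    }


if __name__ == "__main__":
    # Illustrative URL; real records would come from browser history artifacts.
    record = extract_search_terms(
        "https://www.google.com/search?q=forensic+url+analysis&hl=en"
    )
    print(record["host"], record["terms"])
```

Keeping the full parameter dictionary alongside the decoded terms preserves any additional embedded metadata for later interpretation.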
The Joy of Playing with Oceanographic Data
NASA Astrophysics Data System (ADS)
Smith, A. T.; Xing, Z.; Armstrong, E. M.; Thompson, C. K.; Huang, T.
2013-12-01
The web is no longer just an afterthought. It is no longer just a presentation layer filled with HTML, CSS, JavaScript, Frameworks, 3D, and more. It has become the medium of our communication. It is the database of all databases. It is the computing platform of all platforms. It has transformed the way we do science. Web services are the de facto method for communication between machines over the web. Representational State Transfer (REST) has standardized the way we architect services and their interfaces. In the Earth Science domain, we are familiar with tools and services such as Open-Source Project for Network Data Access Protocol (OPeNDAP), Thematic Realtime Environmental Distributed Data Services (THREDDS), and Live Access Server (LAS). We are also familiar with various data formats such as NetCDF3/4, HDF4/5, GRIB, TIFF, etc. One of the challenges for the Earth Science community is accessing information within these data. There are community-accepted readers that our users can download and install. However, the Application Programming Interface (API) between these readers is not standardized, which leads to non-portable applications. Webification (w10n) is an emerging technology, developed at the Jet Propulsion Laboratory, which exploits the hierarchical nature of a science data artifact to assign a URL to each element within the artifact (e.g. a granule file). By embracing standards such as JSON, XML, and HTML5 and predictable URLs, w10n provides a simple interface that enables tool-builders and researchers to develop portable tools/applications to interact with artifacts of various formats. The NASA Physical Oceanographic Distributed Active Archive Center (PO.DAAC) is the designated data center for observational products relevant to the physical state of the ocean. 
Over the past year PO.DAAC has been evaluating w10n technology by webifying its archive holdings to provide simplified access to oceanographic science artifacts and as a service to enable future tools and services development. In this talk, we will focus on a w10n-based system called Distributed Oceanographic Webification Service (DOWS) being developed at PO.DAAC to provide a newer and simpler method for working with observational data artifacts. As a continued effort at PO.DAAC to provide better tools and services to visualize our data, the talk will discuss the latest in web-based data visualization tools/frameworks (such as d3.js, Three.js, Leaflet.js, and more) and techniques for working with webified oceanographic science data in both a 2D and 3D web approach.
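The core idea of giving every element of a hierarchical artifact its own predictable URL can be sketched as below. This is only a conceptual illustration under stated assumptions: the artifact is modelled as a nested Python dictionary, the path scheme is simply the base URL plus slash-separated element names, and none of this reproduces the actual w10n URL syntax or the DOWS implementation. The granule contents and host name are hypothetical.

```python
def webify(node, base_url):
    """Map a URL to every element of a nested dictionary.

    The root gets base_url; each child gets its parent's URL plus '/<name>'.
    """
    urls = {base_url: node}
    if isinstance(node, dict):
        for name, child in node.items():
            urls.update(webify(child, f"{base_url}/{name}"))
    return urls


if __name__ == "__main__":
    # Hypothetical granule structure, for illustration only.
    granule = {
        "sst": {"units": "kelvin", "data": [271.3, 272.1]},
        "time": [0],
    }
    for url in webify(granule, "http://example.org/granule.nc"):
        print(url)
```

Because each URL is derived purely from the element's position in the hierarchy, a client can address a single attribute or array without a format-specific reader.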
Toseland, Christopher P; Clayton, Debra J; McSparron, Helen; Hemsley, Shelley L; Blythe, Martin J; Paine, Kelly; Doytchinova, Irini A; Guan, Pingping; Hattotuwagama, Channa K; Flower, Darren R
2005-01-01
AntiJen is a database system focused on the integration of kinetic, thermodynamic, functional, and cellular data within the context of immunology and vaccinology. Compared to its progenitor JenPep, the interface has been completely rewritten and redesigned and now offers a wider variety of search methods, including a nucleotide and a peptide BLAST search. In terms of data archived, AntiJen has a richer and more complete breadth, depth, and scope, and this has seen the database increase to over 31,000 entries. AntiJen provides the most complete and up-to-date dataset of its kind. While AntiJen v2.0 retains a focus on both T cell and B cell epitopes, its greatest novelty is the archiving of continuous quantitative data on a variety of immunological molecular interactions. This includes thermodynamic and kinetic measures of peptide binding to TAP and the Major Histocompatibility Complex (MHC), peptide-MHC complexes binding to T cell receptors, antibodies binding to protein antigens and general immunological protein-protein interactions. The database also contains quantitative specificity data from position-specific peptide libraries and biophysical data, in the form of diffusion co-efficients and cell surface copy numbers, on MHCs and other immunological molecules. The uses of AntiJen include the design of vaccines and diagnostics, such as tetramers, and other laboratory reagents, as well as helping parameterize the bioinformatic or mathematical in silico modeling of the immune system. The database is accessible from the URL: . PMID:16305757
APADB: a database for alternative polyadenylation and microRNA regulation events
Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn
2014-01-01
Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing lengths of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, which allows these prediction algorithms to be further improved. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703
The National Solar Observatory Digital Library - a resource for space weather studies
NASA Astrophysics Data System (ADS)
Hill, F.; Erdwurm, W.; Branston, D.; McGraw, R.
2000-09-01
We describe the National Solar Observatory Digital Library (NSODL), consisting of 200GB of on-line archived solar data, an RDBMS search engine, and an Internet HTML-form user interface. The NSODL is open to all users and provides simple access to solar physics data of basic importance for space weather research and forecasting, heliospheric research, and education. The NSODL can be accessed at the URL www.nso.noao.edu/diglib.
Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry
2016-01-01
Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principal building block in macromolecular assemblies and pathways, the interactions underlie most cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction’s mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, have been collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types but occurring in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N- termini, and peptides are identified. 
Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237
Fish Karyome version 2.1: a chromosome database of fishes and other aquatic organisms
Nagpure, Naresh Sahebrao; Pathak, Ajey Kumar; Pati, Rameshwar; Rashid, Iliyas; Sharma, Jyoti; Singh, Shri Prakash; Singh, Mahender; Sarkar, Uttam Kumar; Kushwaha, Basdeo; Kumar, Ravindra; Murali, S.
2016-01-01
Voluminous information is available on karyological studies of fishes; however, limited efforts have been made to compile and curate the available karyological data in a digital form. The ‘Fish Karyome’ database was a preliminary attempt to compile and digitize the available karyological information on finfishes belonging to the Indian subcontinent. But the database had limitations since it covered data only on Indian finfishes with limited search options. In view of user feedback and its utility in fish cytogenetic studies, the Fish Karyome database was upgraded by applying Linux, Apache, MySQL and PHP (Hypertext Preprocessor) (LAMP) technologies. In the present version, the scope of the system was increased by compiling and curating the available chromosomal information from across the globe on fishes and other aquatic organisms, such as echinoderms, molluscs and arthropods, especially those of aquaculture importance. Thus, Fish Karyome version 2.1 presently covers 866 chromosomal records for 726 species supported with 253 published articles and the information is being updated regularly. The database provides information on chromosome number and morphology, sex chromosomes, chromosome banding, molecular cytogenetic markers, etc. supported by fish and karyotype images through interactive tools. It also enables the users to browse and view chromosomal information based on habitat, family, conservation status and chromosome number. The system also displays chromosome number in model organisms, protocols for chromosome preparation and allied techniques and a glossary of cytogenetic terms. A data submission facility has also been provided through a data submission panel. The database can serve as a unique and useful resource for cytogenetic characterization, sex determination, chromosomal mapping, cytotaxonomy, karyo-evolution and systematics of fishes. Database URL: http://mail.nbfgr.res.in/Fish_Karyome PMID:26980518
Nascimento, Leandro Costa; Salazar, Marcela Mendes; Lepikson-Neto, Jorge; Camargo, Eduardo Leal Oliveira; Parreiras, Lucas Salera; Carazzolle, Marcelo Falsarella
2017-01-01
Abstract Tree species of the genus Eucalyptus are the most valuable and widely planted hardwoods in the world. Given the economic importance of Eucalyptus trees, much effort has been made towards the generation of specimens with superior forestry properties that can deliver high-quality feedstocks, customized to the industry’s needs for both cellulosic (paper) and lignocellulosic biomass production. In line with these efforts, large sets of molecular data have been generated by several scientific groups, providing invaluable information that can be applied in the development of improved specimens. In order to fully explore the potential of available datasets, the development of a public database that provides integrated access to genomic and transcriptomic data from Eucalyptus is needed. EUCANEXT is a database that analyses and integrates publicly available Eucalyptus molecular data, such as the E. grandis genome assembly and predicted genes, ESTs from several species and digital gene expression from 26 RNA-Seq libraries. The database has been implemented on a Fedora Linux machine running MySQL and Apache, while Perl CGI was used for the web interfaces. EUCANEXT provides a user-friendly web interface for easy access and analysis of publicly available molecular data from Eucalyptus species. This integrated database allows for complex searches by gene name, keyword or sequence similarity and is publicly accessible at http://www.lge.ibi.unicamp.br/eucalyptusdb. Through EUCANEXT, users can perform complex analyses to identify genes related to traits of interest using RNA-Seq libraries and tools for differential expression analysis. Moreover, the entire bioinformatics pipeline described here, including the database schema and Perl scripts, is readily available and can be applied to any genomic and transcriptomic project, regardless of the organism. Database URL: http://www.lge.ibi.unicamp.br/eucalyptusdb PMID:29220468
Fish Karyome version 2.1: a chromosome database of fishes and other aquatic organisms.
Nagpure, Naresh Sahebrao; Pathak, Ajey Kumar; Pati, Rameshwar; Rashid, Iliyas; Sharma, Jyoti; Singh, Shri Prakash; Singh, Mahender; Sarkar, Uttam Kumar; Kushwaha, Basdeo; Kumar, Ravindra; Murali, S
2016-01-01
Voluminous information is available on karyological studies of fishes; however, limited efforts have been made to compile and curate the available karyological data in digital form. The 'Fish Karyome' database was a preliminary attempt to compile and digitize the available karyological information on finfishes of the Indian subcontinent. However, the database had limitations, since it covered only Indian finfishes and offered limited search options. In response to user feedback and in view of its utility in fish cytogenetic studies, the Fish Karyome database was upgraded using Linux, Apache, MySQL and PHP (PHP: Hypertext Preprocessor) (LAMP) technologies. In the present version, the scope of the system was increased by compiling and curating the available chromosomal information from across the globe on fishes and other aquatic organisms, such as echinoderms, molluscs and arthropods, especially those of aquaculture importance. Thus, Fish Karyome version 2.1 presently covers 866 chromosomal records for 726 species supported by 253 published articles, and the information is being updated regularly. The database provides information on chromosome number and morphology, sex chromosomes, chromosome banding, molecular cytogenetic markers, etc., supported by fish and karyotype images through interactive tools. It also enables users to browse and view chromosomal information based on habitat, family, conservation status and chromosome number. The system also displays chromosome numbers in model organisms, protocols for chromosome preparation and allied techniques, and a glossary of cytogenetic terms. A data submission facility has also been provided through a data submission panel. The database can serve as a unique and useful resource for cytogenetic characterization, sex determination, chromosomal mapping, cytotaxonomy, karyo-evolution and systematics of fishes. Database URL: http://mail.nbfgr.res.in/Fish_Karyome. © The Author(s) 2016. Published by Oxford University Press.
uPy: a ubiquitous CG Python API with biological-modeling applications.
Autin, Ludovic; Johnson, Graham; Hake, Johan; Olson, Arthur; Sanner, Michel
2012-01-01
The uPy Python extension module provides a uniform abstraction of the APIs of several 3D computer graphics programs (called hosts), including Blender, Maya, Cinema 4D, and DejaVu. A plug-in written with uPy can run in all uPy-supported hosts. Using uPy, researchers have created complex plug-ins for molecular and cellular modeling and visualization. uPy can simplify programming for many types of projects (not solely science applications) intended for multihost distribution. It's available at http://upy.scripps.edu. The first featured Web extra is a video that shows interactive analysis of a calcium dynamics simulation. YouTube URL: http://youtu.be/wvs-nWE6ypo. The second featured Web extra is a video that shows rotation of the HIV virus. YouTube URL: http://youtu.be/vEOybMaRoKc.
Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank
2013-02-01
Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Establishment of Kawasaki disease database based on metadata standard.
Park, Yu Rang; Kim, Jae-Jung; Yoon, Young Jo; Yoon, Young-Kwang; Koo, Ha Yeong; Hong, Young Mi; Jang, Gi Young; Shin, Soo-Yong; Lee, Jong-Keuk
2016-07-01
Kawasaki disease (KD) is a rare disease that occurs predominantly in infants and young children. To identify KD susceptibility genes and to develop a diagnostic test, a specific therapy, or prevention method, collecting KD patients' clinical and genomic data is one of the major issues. For this purpose, the Kawasaki Disease Database (KDD) was developed based on the efforts of the Korean Kawasaki Disease Genetics Consortium (KKDGC). KDD is a collection of 1292 clinical data and genomic samples of 1283 patients from 13 KKDGC-participating hospitals. Each sample contains the relevant clinical data, genomic DNA and plasma samples isolated from patients' blood, omics data and KD-associated genotype data. Clinical data were collected and saved using the common data elements based on the ISO/IEC 11179 metadata standard. Data from two genome-wide association studies totalling 482 samples and whole exome sequencing data from 12 samples were also collected. In addition, KDD includes the rare cases of KD (16 cases with family history, 46 cases with recurrence, 119 cases with intravenous immunoglobulin non-responsiveness, and 52 cases with coronary artery aneurysm). As the first public database for KD, KDD can significantly facilitate KD studies. All data in KDD are searchable and downloadable. KDD was implemented in PHP, MySQL and Apache, with all major browsers supported. Database URL: http://www.kawasakidisease.kr. © The Author(s) 2016. Published by Oxford University Press.
NASA Technical Reports Server (NTRS)
vonOfenheim, William H. C.; Heimerl, N. Lynn; Binkley, Robert L.; Curry, Marty A.; Slater, Richard T.; Nolan, Gerald J.; Griswold, T. Britt; Kovach, Robert D.; Corbin, Barney H.; Hewitt, Raymond W.
1998-01-01
This paper discusses the technical aspects of and the project background for the NASA Image exchange (NIX). NIX, which provides a single entry point to search selected image databases at the NASA Centers, is a meta-search engine (i.e., a search engine that communicates with other search engines). It uses these distributed digital image databases to access photographs, animations, and their associated descriptive information (meta-data). NIX is available for use at the following URL: http://nix.nasa.gov. NIX, which was sponsored by NASA's Scientific and Technical Information (STI) Program, currently serves images from seven NASA Centers. Plans are under way to link image databases from three additional NASA Centers. Images and their associated meta-data, which are accessible by NIX, reside at the originating Centers, and NIX utilizes a virtual central site that communicates with each of these sites. Incorporated into the virtual central site are several protocols to support searches from a diverse collection of database engines. The searches are performed in parallel to ensure optimization of response times. To augment the search capability, browse functionality with pre-defined categories has been built into NIX, thereby ensuring dissemination of 'best-of-breed' imagery. As a final recourse, NIX offers access to a help desk via an on-line form to help locate images and information either within the scope of NIX or from available external sources.
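The parallel fan-out described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not NIX's actual implementation: the in-memory catalogs, the Center names and the functions `search_center` and `meta_search` are all hypothetical stand-ins for the remote image databases and their protocols.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the per-Center image databases (not real NIX data).
CENTER_CATALOGS = {
    "Center A": ["Apollo capsule impact test", "Wind tunnel model"],
    "Center B": ["Apollo launch", "Shuttle landing"],
    "Center C": ["Mars rover testbed"],
}


def search_center(center, query):
    """Search one catalog and return (center, matching captions)."""
    needle = query.lower()
    hits = [caption for caption in CENTER_CATALOGS[center] if needle in caption.lower()]
    return center, hits


def meta_search(query):
    """Send the query to every catalog in parallel and merge the non-empty results."""
    with ThreadPoolExecutor(max_workers=len(CENTER_CATALOGS)) as pool:
        results = pool.map(lambda center: search_center(center, query), CENTER_CATALOGS)
        return {center: hits for center, hits in results if hits}


merged = meta_search("apollo")
```

With real remote databases each `search_center` call would be dominated by network latency, which is why issuing the requests concurrently, rather than one after another, shortens the overall response time.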
Chan, Wen-Ling; Yang, Wen-Kuang; Huang, Hsien-Da; Chang, Jan-Gowth
2013-01-01
RNA interference (RNAi) is a gene silencing process within living cells, which is controlled by the RNA-induced silencing complex in a sequence-specific manner. In flies and mice, pseudogene transcripts can be processed into short interfering RNAs (siRNAs) that regulate protein-coding genes through the RNAi pathway. Following these findings, we constructed an innovative and comprehensive database to elucidate the siRNA-mediated mechanism in human transcribed pseudogenes (TPGs). To investigate TPGs producing siRNAs that regulate protein-coding genes, we mapped the TPGs to small RNAs (sRNAs) that were supported by publicly available deep sequencing data from various sRNA libraries and constructed the TPG-derived siRNA-target interactions. In addition, we showed that TPGs can act as targets for miRNAs that actually regulate the parental gene. To enable the systematic compilation and updating of these results and additional information, we have developed a database, pseudoMap, capturing various types of information, including sequence data, TPG and cognate annotation, deep sequencing data, RNA-folding structure, gene expression profiles, miRNA annotation and target prediction. To our knowledge, pseudoMap is the first database to demonstrate two mechanisms of human TPGs: encoding siRNAs and decoying miRNAs that target the parental gene. pseudoMap is freely accessible at http://pseudomap.mbc.nctu.edu.tw/. Database URL: http://pseudomap.mbc.nctu.edu.tw/
GETPrime: a gene- or transcript-specific primer database for quantitative real-time PCR
Gubelmann, Carine; Gattiker, Alexandre; Massouras, Andreas; Hens, Korneel; David, Fabrice; Decouttere, Frederik; Rougemont, Jacques; Deplancke, Bart
2011-01-01
The vast majority of genes in humans and other organisms undergo alternative splicing, yet the biological function of splice variants is still very poorly understood in large part because of the lack of simple tools that can map the expression profiles and patterns of these variants with high sensitivity. High-throughput quantitative real-time polymerase chain reaction (qPCR) is an ideal technique to accurately quantify nucleic acid sequences including splice variants. However, currently available primer design programs do not distinguish between splice variants and also differ substantially in overall quality, functionality or throughput mode. Here, we present GETPrime, a primer database supported by a novel platform that uniquely combines and automates several features critical for optimal qPCR primer design. These include the consideration of all gene splice variants to enable either gene-specific (covering the majority of splice variants) or transcript-specific (covering one splice variant) expression profiling, primer specificity validation, automated best primer pair selection according to strict criteria and graphical visualization of the latter primer pairs within their genomic context. GETPrime primers have been extensively validated experimentally, demonstrating high transcript specificity in complex samples. Thus, the free-access, user-friendly GETPrime database allows fast primer retrieval and visualization for genes or groups of genes of most common model organisms, and is available at http://updepla1srv1.epfl.ch/getprime/. Database URL: http://deplanckelab.epfl.ch. PMID:21917859
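The distinction the abstract draws between gene-specific and transcript-specific primers can be illustrated with a small sketch. It is not GETPrime's algorithm: the function name, the transcript identifiers and the reading of "majority" as strictly more than half of a gene's splice variants are assumptions made for illustration only.

```python
def classify_primer_pair(amplified, all_transcripts):
    """Label a primer pair by how many of a gene's splice variants it amplifies.

    Assumption: "gene-specific" means strictly more than half of the variants.
    A pair that amplifies exactly one variant is reported as transcript-specific,
    including the edge case of a gene with a single transcript.
    """
    covered = set(amplified) & set(all_transcripts)
    if len(covered) == 1:
        return "transcript-specific"
    if len(covered) > len(all_transcripts) / 2:
        return "gene-specific"
    return "neither"


# Hypothetical gene with three splice variants T1, T2 and T3.
label_one = classify_primer_pair({"T1"}, ["T1", "T2", "T3"])
label_most = classify_primer_pair({"T1", "T2"}, ["T1", "T2", "T3"])
```

Here `label_one` is "transcript-specific" and `label_most` is "gene-specific"; a real design tool would additionally validate primer specificity against the rest of the transcriptome, which this sketch ignores.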
Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf
2014-01-01
CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234
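The kind of cross-sample filtering described above, finding candidate causative mutations within a family, can be sketched in a few lines. CanvasDB itself is driven from R; the Python function below is only an illustration of the set logic, and the function name, sample names and variant identifiers are hypothetical.

```python
def candidate_variants(calls, affected, unaffected):
    """Return variants carried by every affected sample and by no unaffected sample.

    calls maps a sample name to the set of variant identifiers called in it.
    """
    if not affected:
        raise ValueError("at least one affected sample is required")
    shared = set.intersection(*(set(calls[s]) for s in affected))
    in_unaffected = set().union(*(calls[s] for s in unaffected))
    return shared - in_unaffected


# Hypothetical variant calls for a small family.
CALLS = {
    "child": {"chr1:1000A>G", "chr2:500C>T", "chr7:42G>A"},
    "sibling": {"chr1:1000A>G", "chr7:42G>A"},
    "mother": {"chr7:42G>A", "chr9:77T>C"},
}

candidates = candidate_variants(CALLS, ["child", "sibling"], ["mother"])
```

In this toy family only `chr1:1000A>G` survives, because `chr7:42G>A` is also present in the unaffected mother. A database-backed system would run the equivalent query over indexed tables rather than in-memory sets.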
Tozzoli, Renato; D'Aurizio, Federica; Ferrari, Anna; Castello, Roberto; Metus, Paolo; Caruso, Beatrice; Perosa, Anna Rosa; Sirianni, Francesca; Stenner, Elisabetta; Steffan, Agostino; Villalta, Danilo
2016-01-15
The determination of the upper reference limit (URL) for thyroid peroxidase autoantibodies (TPOAbs) is a contentious issue, because of the difficulty in defining the reference population. The aim of this study was to establish the URL (eURL) for TPOAbs, according to the National Academy of Clinical Biochemistry (NACB) guidelines and to compare them with those obtained in a female counterpart, by the use of six commercial automated platforms. 120 healthy males and 120 healthy females with NACB-required characteristics (<30 years, TSH between 0.5 and 2.0 mIU/L, normal thyroid ultrasound, without personal/family history of thyroid and non-thyroid autoimmune diseases) were studied. Sera were analyzed for TPOAbs concentration using six immunoassay methods applied in automated analyzers: Advia Centaur XP (CEN), Siemens Healthcare Diagnostics; Maglumi 2000 Plus, Shenzhen New Industries Biomedical Engineering; Architect ci4100, Abbott; Cobas e411 (COB), Roche Diagnostics; Unicel DxI (UNI) and Lumipulse G1200, Fujirebio. Within each method, TPOAbs values had a high degree of dispersion and the eURLs were lower than those stated by the manufacturer. A statistically significant difference (p<0.05) between medians of males and females was observed only for COB and for UNI. However, the comparison of the male and female proportions positive for TPOAbs using the eURL of the counterpart showed the lack of clinical significance of the above differences (Chi-square test, p>0.05). Despite the analytical harmonization, the wide dispersion of the results and the differences of the eURLs between methods suggest the need for further studies focusing on TPO antigen preparations as the possible source of variability between different assays. In addition, the lack of a clinically significant difference between males and females, in terms of TPOAb eURLs, confirms the suitability of the NACB recommendations. Copyright © 2015 Elsevier B.V. All rights reserved.
Foerster, Hartmut; Bombarely, Aureliano; Battey, James N D; Sierro, Nicolas; Ivanov, Nikolai V; Mueller, Lukas A
2018-01-01
SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the PathoLogic component of the Pathway Tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and are free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N. tabacum has been carried out, which resulted in the elimination of 156 pathways from the 569 pathways predicted by Pathway Tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases have substantially improved the curation status of the species-specific N. tabacum PGDB.
The implementation of this strategy will significantly advance the curation status of all organism-specific databases in SolCyc, resulting in improved database accuracy, data analysis and visualization of biochemical networks in those species. Database URL: https://solgenomics.net/tools/solcyc/ PMID:29762652
CCProf: exploring conformational change profile of proteins
Chang, Che-Wei; Chou, Chai-Wei; Chang, Darby Tien-Hao
2016-01-01
In many biological processes, proteins have important interactions with various molecules such as proteins, ions or ligands. Many proteins undergo conformational changes upon these interactions, where regions with large conformational changes are critical to the interactions. This work presents the CCProf platform, which provides conformational changes of entire proteins, named conformational change profile (CCP) in this context. CCProf aims to be a platform where users can study potential causes of novel conformational changes. It provides 10 biological features, including conformational change, potential binding target site, secondary structure, conservation, disorder propensity, hydropathy propensity, sequence domain, structural domain, phosphorylation site and catalytic site. All this information is integrated into a well-aligned view, so that researchers can visually capture important relationships between different biological features. CCProf contains 986 187 protein structure pairs for 3123 proteins. In addition, CCProf provides a 3D view in which users can see the protein structures before and after conformational changes as well as binding targets that induce conformational changes. All information (e.g. CCP, binding targets and protein structures) shown in CCProf, including intermediate data, is available for download to expedite further analyses. Database URL: http://zoro.ee.ncku.edu.tw/ccprof/ PMID:27016699
NASA Technical Reports Server (NTRS)
Foster, Cyrus; Jaroux, Belgacem A.
2012-01-01
The Trajectory Browser is a web-based tool developed at the NASA Ames Research Center to be used for the preliminary assessment of trajectories to small-bodies and planets and for providing relevant launch date, time-of-flight and delta-V requirements. The site hosts a database of transfer trajectories from Earth to asteroids and planets for various types of missions such as rendezvous, sample return or flybys. A search engine allows the user to find trajectories meeting desired constraints on the launch window, mission duration and delta-V capability, while a trajectory viewer tool allows the visualization of the heliocentric trajectory and the detailed mission itinerary. The anticipated user base of this tool consists primarily of scientists and engineers designing interplanetary missions in the context of pre-phase A studies, particularly for performing accessibility surveys to large populations of small-bodies. The educational potential of the website is also recognized for academia and the public with regard to trajectory design, a field that has generally been poorly understood by the public. The website is currently hosted on NASA-internal URL http://trajbrowser.arc.nasa.gov/ with plans for a public release as soon as development is complete.
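A constraint search of the kind described above can be sketched as a simple filter over precomputed trajectories. This is an illustrative sketch only: the `Trajectory` fields, the function `find_trajectories`, the target names and all numbers are hypothetical and are not taken from the Trajectory Browser database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trajectory:
    target: str
    launch: date
    duration_days: int
    delta_v_km_s: float


def find_trajectories(trajectories, earliest, latest, max_duration_days, max_delta_v_km_s):
    """Keep trajectories inside the launch window, duration and delta-V limits.

    Results are sorted by delta-V so the cheapest options come first.
    """
    matches = [
        t for t in trajectories
        if earliest <= t.launch <= latest
        and t.duration_days <= max_duration_days
        and t.delta_v_km_s <= max_delta_v_km_s
    ]
    return sorted(matches, key=lambda t: t.delta_v_km_s)


# Hypothetical catalog entries (made-up values for illustration).
CATALOG = [
    Trajectory("Asteroid X", date(2025, 3, 1), 400, 5.8),
    Trajectory("Asteroid Y", date(2026, 7, 15), 300, 4.9),
    Trajectory("Asteroid Z", date(2025, 11, 20), 900, 4.2),
]

hits = find_trajectories(CATALOG, date(2025, 1, 1), date(2026, 12, 31), 450, 6.0)
```

In this example "Asteroid Z" has the lowest delta-V but is excluded by the 450-day duration limit, which shows how the three constraints interact.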
The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data
Wilks, Christopher; Cline, Melissa S.; Weiler, Erich; Diehkans, Mark; Craft, Brian; Martin, Christy; Murphy, Daniel; Pierce, Howdy; Black, John; Nelson, Donavan; Litzinger, Brian; Hatton, Thomas; Maltbie, Lori; Ainsworth, Michael; Allen, Patrick; Rosewood, Linda; Mitchell, Elizabeth; Smith, Bradley; Warner, Jim; Groboske, John; Telc, Haifang; Wilson, Daniel; Sanford, Brian; Schmidt, Hannes; Haussler, David; Maltbie, Daniel
2014-01-01
The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, enforce patient genome confidentiality in data storage and transmission and optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu PMID:25267794
Westbrook, John D; Feng, Zukang; Persikova, Irina; Sala, Raul; Sen, Sanchayita; Berrisford, John M; Swaminathan, G Jawahar; Oldfield, Thomas J; Gutmanas, Aleksandras; Igarashi, Reiko; Armstrong, David R; Baskaran, Kumaran; Chen, Li; Chen, Minyu; Clark, Alice R; Di Costanzo, Luigi; Dimitropoulos, Dimitris; Gao, Guanghua; Ghosh, Sutapa; Gore, Swanand; Guranovic, Vladimir; Hendrickx, Pieter M S; Hudson, Brian P; Ikegawa, Yasuyo; Kengaku, Yumiko; Lawson, Catherine L; Liang, Yuhe; Mak, Lora; Mukhopadhyay, Abhik; Narayanan, Buvaneswari; Nishiyama, Kayoko; Patwardhan, Ardan; Sahni, Gaurav; Sanz-García, Eduardo; Sato, Junko; Sekharan, Monica R; Shao, Chenghua; Smart, Oliver S; Tan, Lihua; van Ginkel, Glen; Yang, Huanwang; Zhuravleva, Marina A; Markley, John L; Nakamura, Haruki; Kurisu, Genji; Kleywegt, Gerard J; Velankar, Sameer; Berman, Helen M; Burley, Stephen K
2018-01-01
The Protein Data Bank (PDB) is the single global repository for experimentally determined 3D structures of biological macromolecules and their complexes with ligands. The worldwide PDB (wwPDB) is the international collaboration that manages the PDB archive according to the FAIR principles: Findability, Accessibility, Interoperability and Reusability. The wwPDB recently developed OneDep, a unified tool for deposition, validation and biocuration of structures of biological macromolecules. All data deposited to the PDB undergo critical review by wwPDB Biocurators. This article outlines the importance of biocuration for structural biology data deposited to the PDB and describes wwPDB biocuration processes and the role of expert Biocurators in sustaining a high-quality archive. Structural data submitted to the PDB are examined for self-consistency, standardized using controlled vocabularies, cross-referenced with other biological data resources and validated for scientific/technical accuracy. We illustrate how biocuration is integral to PDB data archiving, as it facilitates accurate, consistent and comprehensive representation of biological structure data, allowing efficient and effective usage by research scientists, educators, students and the curious public worldwide. Database URL: https://www.wwpdb.org/ PMID:29688351
A guide to best practices for Gene Ontology (GO) manual annotation
Balakrishnan, Rama; Harris, Midori A.; Huntley, Rachael; Van Auken, Kimberly; Cherry, J. Michael
2013-01-01
The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374 000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotation. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all. Database URL: http://www.geneontology.org PMID:23842463
Understanding PubMed user search behavior through log analysis.
Islamaj Dogan, Rezarta; Murray, G Craig; Névéol, Aurélie; Lu, Zhiyong
2009-01-01
This article reports on a detailed investigation of PubMed users' needs and behavior as a step toward improving biomedical information retrieval. PubMed provides a free service to researchers, with access to more than 19 million citations for biomedical articles from MEDLINE and life science journals. It is accessed by millions of users each day. Efficient search tools are crucial for biomedical researchers to keep abreast of the biomedical literature relating to their own research. This study provides insight into PubMed users' needs and their behavior. This investigation was conducted through the analysis of one month of log data, consisting of more than 23 million user sessions and more than 58 million user queries. Multiple aspects of users' interactions with PubMed are characterized in detail with evidence from these logs. Despite having many features in common with general Web searches, biomedical information searches have unique characteristics that are made evident in this study. PubMed users are more persistent in seeking information and they reformulate queries often. The three most frequent types of search are search by author name, search by gene/protein, and search by disease. Use of abbreviations in queries is very frequent. Factors such as result set size influence users' decisions. Analysis of characteristics such as these plays a critical role in identifying users' information needs and their search habits. In turn, such an analysis also provides useful insight for improving biomedical information retrieval. Database URL: http://www.ncbi.nlm.nih.gov/PubMed.
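As a rough illustration of the session-level analysis described above, the sketch below groups log records by session and counts the queries in each. The record layout (a session identifier paired with a query string) and the sample queries are assumptions for illustration; the study's actual log format is not given in the abstract.

```python
from collections import Counter


def queries_per_session(log_records):
    """Count how many queries each session issued, given (session_id, query) pairs."""
    return Counter(session_id for session_id, _query in log_records)


# Hypothetical log excerpt: session s1 reformulates its query twice.
LOG = [
    ("s1", "brca1"),
    ("s1", "brca1 mutation"),
    ("s2", "smith j"),
    ("s1", "brca1 mutation breast cancer"),
]

counts = queries_per_session(LOG)
mean_queries = sum(counts.values()) / len(counts)

# Back-of-the-envelope ratio from the reported totals (both are lower bounds,
# "more than" 58 million queries and 23 million sessions, so this is approximate).
reported_ratio = round(58 / 23, 1)
```

The reported totals work out to roughly 2.5 queries per session, which is consistent with the abstract's observation that users often reformulate their queries.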
Linking to NSCEP's Online Publications
Each online document has a permanent URL that can be linked to for future reference. To find the short URL for a document you need to be in the document and able to view the icon bar above the document.
29 CFR 1614.703 - Manner and format of data.
Code of Federal Regulations, 2011 CFR
2011-07-01
... vertical columns. The oldest fiscal year data shall be listed first, reading left to right, with the other... Resource Locator (URL) for the data it posts under this subpart. Thereafter, new or changed URLs shall be...
2012-09-07
Image L61-8036 is available as an electronic file from the photo lab. See URL. -- Photographed on 12/05/1961. -- Multiple exposure of an impact test of the Apollo command module. In this test the Apollo capsule was tested making a sand landing. -- Published in James R. Hansen, Spaceflight Revolution: NASA Langley Research Center From Sputnik to Apollo, (Washington: NASA, 1995), pp. 361-366.
MET network in PubMed: a text-mined network visualization and curation system.
Dai, Hong-Jie; Su, Chu-Hsien; Lai, Po-Ting; Huang, Ming-Siang; Jonnagaddala, Jitendra; Rose Jue, Toni; Rao, Shruti; Chou, Hui-Jou; Milacic, Marija; Singh, Onkar; Syed-Abdul, Shabbir; Hsu, Wen-Lian
2016-01-01
Metastasis is the dissemination of a cancer/tumor from one organ to another, and it is the most dangerous stage during cancer progression, causing more than 90% of cancer deaths. Improving the understanding of the complicated cellular mechanisms underlying metastasis requires investigations of the signaling pathways. To this end, we developed a METastasis (MET) network visualization and curation tool to help metastasis researchers retrieve network information of interest while browsing through the large volume of studies in PubMed. MET can recognize relations among genes, cancers, tissues and organs of metastasis mentioned in the literature through text-mining techniques, and then produce a visualization of all mined relations in a metastasis network. To facilitate the curation process, MET is developed as a browser extension that allows curators to review and edit concepts and relations related to metastasis directly in PubMed. PubMed users can also view the metastatic networks integrated from the large collection of research papers directly through MET. For the BioCreative 2015 interactive track (IAT), a curation task was proposed to curate metastatic networks among PubMed abstracts. Six curators participated in the proposed task and a post-IAT task, curating 963 unique metastatic relations from 174 PubMed abstracts using MET. Database URL: http://btm.tmu.edu.tw/metastasisway. © The Author(s) 2016. Published by Oxford University Press.
In-Cardiome: integrated knowledgebase for coronary artery disease enabling translational research
Sharma, Ankit; Deshpande, Vrushali; Ghatge, Madankumar
2017-01-01
Coronary artery disease (CAD) is a leading cause of death worldwide. Prevention, diagnosis and clinical interventions are dependent on the conventional risk factors like hypertension, diabetes and obesity. However, these conventional risk factors do not completely identify high risk individuals. One major hurdle in the improvement of diagnosis and treatment for CAD is the lack of integration of knowledge from different areas of research like molecular, clinical and drug development. In order to provide comprehensive information from hitherto dispersed data, we developed an integrative knowledgebase called “In-Cardiome or Integrated Cardiome” for all the stakeholders in healthcare such as scientists, clinicians and pharmaceutical companies. It is created by integrating 16 different data sources, 995 curated genes classified into 12 different functional categories associated with disease, 1204 completed clinical trials, 12 therapy or drug classifications with 62 approved drugs and drug target networks. This knowledgebase gives the most needed opportunity to understand the disease process and therapeutic impact along with gene expression data from both animal models and patients. The data is classified into three different search categories: functional groups, risk factors and therapy/drug based classes. One more unique aspect of In-Cardiome is the integration of clinical data from 10,217 subjects in our ongoing Indian Atherosclerosis Research Study (IARS) (6357 unaffected and 3860 CAD affected). IARS data showing demographics and associations of individual and combinations of risk factors in the Indian population, along with molecular information, will enable better translational and drug development research. Database URL www.tri-incardiome.org PMID:29220465
Paillet, Frederick L.
1988-01-01
Various conventional geophysical well logs were obtained in conjunction with acoustic tube-wave amplitude and experimental heat-pulse flowmeter measurements in two deep boreholes in granitic rocks on the Canadian shield in southeastern Manitoba. The objective of this study is the development of measurement techniques and data processing methods for characterization of rock volumes that might be suitable for hosting a nuclear waste repository. One borehole, WRA1, intersected several major fracture zones, and was suitable for testing quantitative permeability estimation methods. The other borehole, URL13, appeared to intersect almost no permeable fractures; it was suitable for testing methods for the characterization of rocks of very small permeability and uniform thermo-mechanical properties in a potential repository horizon. Epithermal neutron, acoustic transit time, and single-point resistance logs provided useful, qualitative indications of fractures in the extensively fractured borehole, WRA1. A single-point log indicates both weathering and the degree of opening of a fracture-borehole intersection. All logs indicate the large intervals of mechanically and geochemically uniform, unfractured granite below depths of 300 m in the relatively unfractured borehole, URL13. Some indications of minor fracturing were identified in that borehole, with one possible fracture at a depth of about 914 m, producing a major acoustic waveform anomaly. Comparison of acoustic tube-wave attenuation with models of tube-wave attenuation in infinite fractures of given aperture provides permeability estimates ranging from equivalent single-fracture apertures of less than 0.01 mm to apertures of > 0.5 mm. One possible fracture anomaly in borehole URL13 at a depth of about 914 m corresponds with a thin mafic dike on the core where unusually large acoustic contrast may have produced the observed waveform anomaly. No indications of naturally occurring flow existed in borehole URL13; however, flowmeter measurements indicated flow at < 0.05 L/min from the upper fracture zones in borehole WRA1 to deeper fractures at depths below 800 m. (Author's abstract)
78 FR 33807 - Privacy Act New System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2013-06-05
.... For National Institute of Standards and Technology, Chief Information Officer, 100 Bureau Drive..., address, email, and telephone number; credit card information; Web site URL; organization category and...; title; address; email address; telephone number; Web site URL; organization category and description...
NASA Technical Reports Server (NTRS)
Collazo, Carlimar
2011-01-01
The purpose of this work is to analyze network monitoring logs to support the computer incident response team. Specifically, the aims are to gain a clear understanding of the Uniform Resource Locator (URL) and its structure, and to provide a way to break down a URL based on protocol, host name, domain name, path, and other attributes. Finally, the work provides a method to perform data reduction by identifying the different types of advertisements shown on a webpage for incident data analysis. The procedures used for analysis and data reduction will be implemented as a computer program that analyzes the URL and distinguishes advertisement links from the actual content links.
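The URL breakdown described in this abstract can be sketched with Python's standard library. This is only an illustration under stated assumptions, not the program from the report: the function names, the "last two labels" domain heuristic, and the `AD_DOMAINS` list are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def break_down_url(url):
    """Split a URL into protocol, host name, domain name, path and other parts."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Naive assumption: the domain name is the last two labels of the host.
    # This is wrong for suffixes such as "co.uk"; real log analysis would
    # need a public-suffix list.
    domain = ".".join(labels[-2:]) if len(labels) >= 2 else host
    return {
        "protocol": parts.scheme,
        "host": host,
        "domain": domain,
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }


# Hypothetical list of ad-serving domains, for illustration only.
AD_DOMAINS = {"ads.example", "adnetwork.example"}


def is_advertisement(url, ad_domains=AD_DOMAINS):
    """Flag a link as an advertisement when its domain is on the ad list."""
    return break_down_url(url)["domain"] in ad_domains
```

Filtering a log with `is_advertisement` would leave only the content links, which is one simple form of the data reduction the abstract mentions.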
New Searching Capability and OpenURL Linking in the ADS
NASA Astrophysics Data System (ADS)
Eichhorn, Guenther; Accomazzi, A.; Grant, C. S.; Henneken, E.; Kurtz, M. J.; Thompson, D. M.; Murray, S. S.
2006-12-01
The ADS is the search system of choice for the astronomical community. It also covers a large part of the physics and physics/astronomy education literature. In order to make access to this system as easy as possible, we developed a Google-like interface version of our search form. This one-field search parses the user input and automatically detects author names and year ranges. Firefox users can set up their browser to have this search installed in the search field in the top right corner, for even easier access to the ADS search capability. The basic search is available from the ADS Homepage at: http://adsabs.harvard.edu. To aid with access to subscription journals, the ADS now supports OpenURL linking. If your library supports an OpenURL server, you can specify this server in the ADS preference settings. All links to journal articles will then automatically be directed to the OpenURL with the appropriate link information. We provide a selection of known OpenURL servers to choose from. If your server is not in this list, please send the necessary information to ads@cfa.harvard.edu and we will include it in our list. The ADS is funded by NASA grant NNG06GG68G.
Publications of the Western Earth Surface Processes Team 2000
Powell, Charles L.; Stone, Paul
2001-01-01
The Western Earth Surface Processes Team (WESPT) of the U.S. Geological Survey (USGS) conducts geologic mapping and related topical earth science studies in the western United States. This work is focused on areas where modern geologic maps and associated earth-science data are needed to address key societal and environmental issues such as ground-water quality, potential geologic hazards, and land-use decisions. Areas of primary emphasis in 2000 included southern California, the San Francisco Bay region, the Pacific Northwest, the Las Vegas urban corridor, and selected National Park lands. The team has its headquarters in Menlo Park, California, and maintains smaller field offices at several other locations in the western United States. The results of research conducted by the WESPT are released to the public as a variety of databases, maps, text reports, and abstracts, both through the internal publication system of the USGS and in diverse external publications such as scientific journals and books. This report lists publications of the WESPT released in 2000 as well as additional 1999 publications that were not included in the previous list (USGS Open-file Report 00-215). Most of the publications listed were authored or coauthored by WESPT staff. The list also includes some publications authored by non-USGS cooperators with the WESPT, as well as some authored by USGS staff outside the WESPT in cooperation with WESPT projects. Several of the publications listed are available on the World Wide Web; for these, URL addresses are provided. Many of these Web publications are USGS open-file reports that contain large digital databases of geologic map and related information.
DPS Planetary Science Graduate Programs Database for Students and Advisors
NASA Astrophysics Data System (ADS)
Klassen, David R.; Roman, Anthony; Meinke, Bonnie K.
2017-10-01
Planetary science is a topic that covers an extremely diverse set of disciplines; planetary scientists are typically housed in departments spanning a wide range of disciplines. As such, it is difficult for undergraduate students to find programs that will give them a degree and research experience in our field, as a Department of Planetary Science is a rare sighting indeed. Not only can this overwhelm even the most determined student, it can even be difficult for many undergraduate advisers. Because of this, the DPS Education committee decided several years ago that it should have an online resource that could help undergraduate students find graduate programs that could lead to a PhD with a focus in planetary science. It began in 2013 as a static page of information and evolved from there to a database-driven web site. Visitors can browse the entire list of programs or create a subset listing based on several filters. The site should be of use not only to undergraduates looking for programs, but also to advisers looking to help their students decide on their future plans. We present here a walk-through of the basic features as well as some usage statistics from the collected web site analytics. We ask for community feedback on additional features to make the system more usable for them. We also call upon those mentoring and advising undergraduates to use this resource, and for program admission chairs to continue to review their entry and provide us with the most up-to-date information. The URL for our site is http://dps.aas.org/education/graduate-schools.
Chen, Qingyu; Zobel, Justin; Verspoor, Karin
2017-01-01
GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals, by different teams, with a range of technologies and assumptions and over a period of decades. As a consequence, they contain a great many duplicates, redundancies and inconsistencies, but neither the prevalence nor the characteristics of various types of duplicates have been rigorously assessed. Existing duplicate detection methods in bioinformatics only address specific duplicate types, with inconsistent assumptions; and the impact of duplicates in bioinformatics databases has not been carefully assessed, making it difficult to judge the value of such methods. Our goal is to assess the scale, kinds and impact of duplicates in bioinformatics databases, through a retrospective analysis of merged groups in INSDC databases. Our outcomes are threefold: (1) We analyse a benchmark dataset consisting of duplicates manually identified in INSDC (a dataset of 67 888 merged groups with 111 823 duplicate pairs across 21 organisms from INSDC databases) in terms of the prevalence, types and impacts of duplicates. (2) We categorize duplicates at both sequence and annotation level, with supporting quantitative statistics, showing that different organisms have different prevalence of distinct kinds of duplicate. (3) We show that the presence of duplicates has practical impact via a simple case study on duplicates, in terms of GC content and melting temperature. We demonstrate that duplicates not only introduce redundancy, but can lead to inconsistent results for certain tasks. Our findings lead to a better understanding of the problem of duplication in biological databases. Database URL: the merged records are available at https://cloudstor.aarnet.edu.au/plus/index.php/s/Xef2fvsebBEAv9w. © The Author(s) 2017. Published by Oxford University Press. PMID:28077566
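The two quantities used in the case study, GC content and melting temperature, can be sketched as follows. The abstract does not say which melting-temperature formula the authors used; the Wallace rule below (2 °C per A/T, 4 °C per G/C, intended for short oligonucleotides) is an assumption chosen for simplicity, and the function names are illustrative.

```python
def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in degrees Celsius (Wallace rule).

    Assumption: Tm = 2 * (A + T) + 4 * (G + C), a rule of thumb that is
    only reasonable for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Two records merged as duplicates need not have identical sequences, so a task that depends on these values can give different answers depending on which record is kept. For example, `ATGCATGC` and `ATGCGCGC` differ in both GC content and estimated melting temperature.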
SorghumFDB: sorghum functional genomics database with multidimensional network analysis.
Tian, Tian; You, Qi; Zhang, Liwei; Yi, Xin; Yan, Hengyu; Xu, Wenying; Su, Zhen
2016-01-01
Sorghum (Sorghum bicolor [L.] Moench) has excellent agronomic traits and biological properties, such as heat and drought-tolerance. It is a C4 grass and potential bioenergy-producing plant, which makes it an important crop worldwide. With the sorghum genome sequence released, it is essential to establish a sorghum functional genomics data mining platform. We collected genomic data and some functional annotations to construct a sorghum functional genomics database (SorghumFDB). SorghumFDB integrated knowledge of sorghum gene family classifications (transcription regulators/factors, carbohydrate-active enzymes, protein kinases, ubiquitins, cytochrome P450, monolignol biosynthesis related enzymes, R-genes and organelle-genes), detailed gene annotations, miRNA and target gene information, orthologous pairs in the model plants Arabidopsis, rice and maize, gene loci conversions and a genome browser. We further constructed a dynamic network of multidimensional biological relationships, comprised of the co-expression data, protein-protein interactions and miRNA-target pairs. We took effective measures to combine the network, gene set enrichment and motif analyses to determine the key regulators that participate in related metabolic pathways, such as the lignin pathway, which is a major biological process in bioenergy-producing plants. Database URL: http://structuralbiology.cau.edu.cn/sorghum/index.html. © The Author(s) 2016. Published by Oxford University Press.
G4RNA: an RNA G-quadruplex database
Garant, Jean-Michel; Luce, Mikael J.; Scott, Michelle S.
2015-01-01
G-quadruplexes (G4) are tetrahelical structures formed from planar arrangement of guanines in nucleic acids. A simple, regular motif was originally proposed to describe G4-forming sequences. More recently, however, formation of G4 was discovered to depend, at least in part, on the contextual backdrop of neighboring sequences. Prediction of G4 folding is thus becoming more challenging as G4 outlier structures, not described by the originally proposed motif, are increasingly reported. Recent observations thus call for a comprehensive tool, capable of consolidating the expanding information on tested G4s, in order to conduct systematic comparative analyses of G4-promoting sequences. The G4RNA Database we propose was designed to help meet the need for easily-retrievable data on known RNA G4s. A user-friendly, flexible query system allows for data retrieval on experimentally tested sequences, from many separate genes, to assess G4-folding potential. Query output sorts data according to sequence position, G4 likelihood, experimental outcomes and associated bibliographical references. G4RNA also provides an ideal foundation to collect and store additional sequence and experimental data, considering the growing interest G4s currently generate. Database URL: scottgroup.med.usherbrooke.ca/G4RNA PMID:26200754
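The abstract does not spell out the "simple, regular motif". A commonly cited form is four runs of at least three guanines separated by loops of one to seven nucleotides, and the sketch below assumes that form. It is not the G4RNA query system; the pattern and function names are illustrative.

```python
import re

# Assumed canonical motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on an RNA alphabet.
CANONICAL_G4 = re.compile(r"(?:G{3,}[ACGU]{1,7}){3}G{3,}")


def find_canonical_g4(seq):
    """Return the (start, end) span of the first canonical motif, or None.

    DNA input is accepted by converting T to U before matching.
    """
    rna = seq.upper().replace("T", "U")
    match = CANONICAL_G4.search(rna)
    return match.span() if match else None
```

A sequence such as `GGGAGGGAGGGAGGG` matches, while one with a final run of only two guanines does not. Sequences of the second kind may still fold into a G4 in the right context, which is the limitation of the regular motif that the abstract describes.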
The Global Genome Biodiversity Network (GGBN) Data Standard specification.
Droege, G; Barker, K; Seberg, O; Coddington, J; Benson, E; Berendsohn, W G; Bunk, B; Butler, C; Cawsey, E M; Deck, J; Döring, M; Flemons, P; Gemeinholzer, B; Güntsch, A; Hollowell, T; Kelbert, P; Kostadinov, I; Kottmann, R; Lawlor, R T; Lyal, C; Mackenzie-Dodds, J; Meyer, C; Mulcahy, D; Nussbeck, S Y; O'Tuama, É; Orrell, T; Petersen, G; Robertson, T; Söhngen, C; Whitacre, J; Wieczorek, J; Yilmaz, P; Zetzsche, H; Zhang, Y; Zhou, X
2016-01-01
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies from developmental biology, biodiversity analyses, to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today's ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard. © The Author(s) 2016. Published by Oxford University Press.
Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K.; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C.; Hoeng, Julia
2015-01-01
With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com PMID:25887162
Coll, Francesc; Phelan, Jody; Hill-Cawthorne, Grant A; Nair, Mridul B; Mallard, Kim; Ali, Shahjahan; Abdallah, Abdallah M; Alghamdi, Saad; Alsomali, Mona; Ahmed, Abdallah O; Portelli, Stephanie; Oppong, Yaa; Alves, Adriana; Bessa, Theolis Barbosa; Campino, Susana; Caws, Maxine; Chatterjee, Anirvan; Crampin, Amelia C; Dheda, Keertan; Furnham, Nicholas; Glynn, Judith R; Grandjean, Louis; Ha, Dang Minh; Hasan, Rumina; Hasan, Zahra; Hibberd, Martin L; Joloba, Moses; Jones-López, Edward C; Matsumoto, Tomoshige; Miranda, Anabela; Moore, David J; Mocillo, Nora; Panaiotov, Stefan; Parkhill, Julian; Penha, Carlos; Perdigão, João; Portugal, Isabel; Rchiad, Zineb; Robledo, Jaime; Sheen, Patricia; Shesha, Nashwa Talaat; Sirgel, Frik A; Sola, Christophe; Sousa, Erivelton Oliveira; Streicher, Elizabeth M; Van Helden, Paul; Viveiros, Miguel; Warren, Robert M; McNerney, Ruth; Pain, Arnab; Clark, Taane G
2018-05-01
In the version of this article initially published, the URL listed for TubercuList was incorrect. The correct URL is https://mycobrowser.epfl.ch/. The error has been corrected in the HTML and PDF versions of the article.
ERIC Educational Resources Information Center
Scharf, Davida
2002-01-01
Discussion of improving accessibility to copyrighted electronic content focuses on the Digital Object Identifier (DOI) and the Open URL standard and linking software. Highlights include work of the World Wide Web consortium; URI (Uniform Resource Identifier); URL (Uniform Resource Locator); URN (Uniform Resource Name); OCLC's (Online Computer…
Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott
2015-01-01
The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
Strope, Pooja K; Chaverri, Priscila; Gazis, Romina; Ciufo, Stacy; Domrachev, Michael; Schoch, Conrad L
2017-01-01
The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material was recently published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes, we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation, misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353 PMID:29220466
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
TMC-SNPdb: an Indian germline variant database derived from whole exome sequences.
Upadhyay, Pawan; Gardi, Nilesh; Desai, Sanket; Sahoo, Bikram; Singh, Ankita; Togar, Trupti; Iyer, Prajish; Prasad, Ratnam; Chandrani, Pratik; Gupta, Sudeep; Dutt, Amit
2016-01-01
Cancer is predominantly a somatic disease. A mutant allele present in a cancer cell genome is considered somatic when it is absent from the paired normal genome as well as from public SNP databases. The current build of dbSNP, the most comprehensive public SNP database, however, inadequately represents several non-European Caucasian populations, posing a limitation in cancer genomic analyses of data from these populations. We present the Tata Memorial Centre-SNP DataBase (TMC-SNPdb) as the first open source, flexible, upgradable, and freely available SNP database (accessible through dbSNP build 149 and ANNOVAR), representing 114 309 unique germline variants generated from whole exome data of 62 normal samples derived from cancer patients of Indian origin. The TMC-SNPdb is presented with a companion subtraction tool that can be executed with a command line option or using an easy-to-use graphical user interface, with the ability to deplete additional Indian population specific SNPs over and above the dbSNP and 1000 Genomes databases. Using an institutionally generated whole exome data set of 132 samples of Indian origin, we demonstrate that TMC-SNPdb could deplete 42, 33 and 28% of false positive somatic events post dbSNP depletion in Indian origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate utility of the TMC-SNPdb in several Mendelian germline diseases. In addition to dbSNP build 149 and ANNOVAR, the TMC-SNPdb along with the subtraction tool is available for download in the public domain at the following: Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNP/TMCSNPdp.html. © The Author(s) 2016. Published by Oxford University Press.
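The idea of depleting known germline variants from a list of candidate somatic calls can be sketched as a set subtraction. This is not the TMC-SNPdb companion tool, whose interface the abstract does not describe; the `Variant` key and the function name are hypothetical, and real variant matching would also need normalization of indel representations.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_germline(candidates, *germline_sets):
    """Keep only candidate variants that are absent from every germline set."""
    known = set().union(*germline_sets)
    return [v for v in candidates if v not in known]
```

Calling `subtract_germline(calls, dbsnp)` and then `subtract_germline(calls, dbsnp, population_db)` on the same calls shows how a population-specific database can remove further false positive somatic events after dbSNP depletion.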
JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms
Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim
2015-01-01
The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/ PMID:26424080
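The in silico determination of PCR amplification mentioned in this abstract can be illustrated with exact primer matching. This is a minimal sketch under strong assumptions (exact matches only, a single strand, the first hit wins); the function names and sequences are hypothetical, and the JRC pipeline is not described in enough detail here to reproduce it.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_length: int = 1000) -> Optional[str]:
    """Return the predicted amplicon, or None if the primers do not both match.

    The forward primer is searched as-is; the reverse primer binds the
    opposite strand, so its reverse complement is searched downstream.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    amplicon = template[start:site + len(reverse_site)]
    return amplicon if len(amplicon) <= max_length else None


# Toy template and primers, invented for illustration.
template = "TTTTACGTACGGGGCCCCTTGACAAAAA"
amplicon = in_silico_pcr(template, forward="ACGTAC", reverse="TGTCAA")
```

A production screen would typically tolerate mismatches and search both strands; exact matching is used here only to keep the idea visible.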
Groups: knowledge spreadsheets for symbolic biocomputing.
Travers, Michael; Paley, Suzanne M; Shrager, Jeff; Holland, Timothy A; Karp, Peter D
2013-01-01
Knowledge spreadsheets (KSs) are a visual tool for interactive data analysis and exploration. They differ from traditional spreadsheets in that rather than being oriented toward numeric data, they work with symbolic knowledge representation structures and provide operations that take into account the semantics of the application domain. 'Groups' is an implementation of KSs within the Pathway Tools system. Groups allows Pathway Tools users to define a group of objects (e.g. groups of genes or metabolites) from a Pathway/Genome Database. Groups can be transformed (e.g. by transforming a metabolite group to the group of pathways in which those metabolites are substrates); combined through set operations; analysed (e.g. through enrichment analysis); and visualized (e.g. by painting onto a metabolic map diagram). Users of the Pathway Tools-based BioCyc.org website have made extensive use of Groups, and an informal survey of Groups users suggests that Groups has achieved the goal of allowing biologists themselves to perform some data manipulations that previously would have required the assistance of a programmer. Database URL: BioCyc.org.
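The transform-and-combine operations described for Groups map naturally onto set operations. The sketch below is a hedged illustration using plain Python sets; the mapping, the function name and the toy pathway data are assumptions and are not the Pathway Tools API.

```python
from typing import Dict, Set

# Toy mapping from pathway name to its substrates (illustrative, not curated data).
PATHWAY_SUBSTRATES: Dict[str, Set[str]] = {
    "pathway-1": {"glucose", "pyruvate"},
    "pathway-2": {"pyruvate", "citrate"},
    "pathway-3": {"citrate"},
}


def metabolites_to_pathways(metabolites: Set[str],
                            pathway_substrates: Dict[str, Set[str]]) -> Set[str]:
    """Transform a metabolite group into the group of pathways using any of them."""
    return {name for name, substrates in pathway_substrates.items()
            if substrates & metabolites}


group_a = metabolites_to_pathways({"glucose"}, PATHWAY_SUBSTRATES)
group_b = metabolites_to_pathways({"pyruvate"}, PATHWAY_SUBSTRATES)

# Groups can then be combined through ordinary set operations.
shared = group_a & group_b
either = group_a | group_b
only_b = group_b - group_a
```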
Disease model curation improvements at Mouse Genome Informatics
Bello, Susan M.; Richardson, Joel E.; Davis, Allan P.; Wiegers, Thomas C.; Mattingly, Carolyn J.; Dolan, Mary E.; Smith, Cynthia L.; Blake, Judith A.; Eppig, Janan T.
2012-01-01
Optimal curation of human diseases requires an ontology or structured vocabulary that contains terms familiar to end users, is robust enough to support multiple levels of annotation granularity, is limited to disease terms and is stable enough to avoid extensive reannotation following updates. At Mouse Genome Informatics (MGI), we currently use disease terms from Online Mendelian Inheritance in Man (OMIM) to curate mouse models of human disease. While OMIM provides highly detailed disease records that are familiar to many in the medical community, it lacks structure to support multilevel annotation. To improve disease annotation at MGI, we evaluated the merged Medical Subject Headings (MeSH) and OMIM disease vocabulary created by the Comparative Toxicogenomics Database (CTD) project. Overlaying MeSH onto OMIM provides hierarchical access to broad disease terms, a feature missing from the OMIM. We created an extended version of the vocabulary to meet the genetic disease-specific curation needs at MGI. Here we describe our evaluation of the CTD application, the extensions made by MGI and discuss the strengths and weaknesses of this approach. Database URL: http://www.informatics.jax.org/ PMID:22434831
CELLPEDIA: a repository for human cell information for cell studies and differentiation analyses.
Hatano, Akiko; Chiba, Hirokazu; Moesa, Harry Amri; Taniguchi, Takeaki; Nagaie, Satoshi; Yamanegi, Koji; Takai-Igarashi, Takako; Tanaka, Hiroshi; Fujibuchi, Wataru
2011-01-01
CELLPEDIA is a repository database for current knowledge about human cells. It contains various types of information, such as cell morphologies, gene expression and literature references. The major role of CELLPEDIA is to provide a digital dictionary of human cells for the biomedical field, including support for the characterization of artificially generated cells in regenerative medicine. CELLPEDIA features (i) its own cell classification scheme, in which whole human cells are classified by their physical locations in addition to conventional taxonomy; and (ii) cell differentiation pathways compiled from biomedical textbooks and journal papers. Currently, human differentiated cells and stem cells are classified into 2260 and 66 cell taxonomy keys, respectively, from which 934 parent-child relationships reported in cell differentiation or transdifferentiation pathways are retrievable. As far as we know, this is the first attempt to develop a digital cell bank to function as a public resource for the accumulation of current knowledge about human cells. The CELLPEDIA homepage is freely accessible except for the data submission pages that require authentication (please send a password request to cell-info@cbrc.jp). Database URL: http://cellpedia.cbrc.jp/
TransAtlasDB: an integrated database connecting expression data, metadata and variants
Adetunji, Modupeore O; Lamont, Susan J; Schmidt, Carl J
2018-01-01
High-throughput transcriptome sequencing (RNAseq) is the universally applied method for target-free transcript identification and gene expression quantification, generating huge amounts of data. The constraint of accessing such data and interpreting results can be a major impediment in postulating suitable hypotheses, thus an innovative storage solution that addresses these limitations, such as hard disk storage requirements, efficiency and reproducibility, is paramount. By offering a uniform data storage and retrieval mechanism, various data can be compared and easily investigated. We present a sophisticated system, TransAtlasDB, which incorporates a hybrid architecture of both relational and NoSQL databases for fast and efficient data storage, processing and querying of large datasets from transcript expression analysis with corresponding metadata, as well as gene-associated variants (such as SNPs) and their predicted gene effects. TransAtlasDB provides a data model for accurate storage of the large amount of data derived from RNAseq analysis and also methods of interacting with the database, either via command-line data management workflows, written in Perl, with useful functionalities that reduce the complexity of data storage and possibly manipulation of the massive amounts of data generated from RNAseq analysis, or through the web interface. The database application is currently modeled to handle analysis data from agricultural species, and will be expanded to include more species groups. Overall, TransAtlasDB aims to serve as an accessible repository for the large, complex results data files derived from RNAseq gene expression profiling and variant analysis. Database URL: https://modupeore.github.io/TransAtlasDB/ PMID:29688361
Functional Requirements for Information Resource Provenance on the Web
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCusker, James P.; Lebo, Timothy; Graves, Alvaro
We provide a means to formally explain the relationship between HTTP URLs and the representations returned when they are requested. According to existing World Wide Web architecture, the URL serves as an identifier for a semiotic referent while the document returned via HTTP serves as a representation of the same referent. This begins with two sides of a semiotic triangle; the third side is the relationship between the URL and the representation received. We complete this description by extending the library science resource model Functional Requirements for Bibliographic Resources (FRBR) with cryptographic message and content digests to create a Functional Requirements for Information Resources (FRIR). We show how applying the FRIR model to HTTP GET and POST transactions disambiguates the many relationships between a given URL and all representations received from its request, provides fine-grained explanations that are complementary to existing explanations of web resources, and integrates easily into the emerging W3C provenance standard.
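The role of content digests in telling apart the representations returned for one URL can be sketched as follows. This is an illustration only: SHA-256 is an assumed choice of digest, the function names are hypothetical, and the sketch does not implement the FRIR model or its message-level digests.

```python
import hashlib
from typing import Dict, List


def content_digest(body: bytes) -> str:
    """Return a hex digest identifying the exact bytes of a representation."""
    return hashlib.sha256(body).hexdigest()


def record_representation(history: Dict[str, List[str]], url: str,
                          body: bytes) -> str:
    """Record the digest of a representation received for a URL.

    Each URL keeps a list of distinct digests, so one identifier can be
    linked to every distinct representation it has returned.
    """
    digest = content_digest(body)
    seen = history.setdefault(url, [])
    if digest not in seen:
        seen.append(digest)
    return digest


# Simulated response bodies; no network request is made in this sketch.
history: Dict[str, List[str]] = {}
url = "http://example.org/doc"
first = record_representation(history, url, b"version 1")
repeat = record_representation(history, url, b"version 1")
changed = record_representation(history, url, b"version 2")
```

Identical bytes always yield the same digest, so a repeated retrieval adds nothing to the history, while a changed document appears as a second representation of the same URL.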
A general concept for consistent documentation of computational analyses
Müller, Fabian; Nordström, Karl; Lengauer, Thomas; Schulz, Marcel H.
2015-01-01
The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, simple flat files or other unstructured text documents are provided by researchers as documentation, which are often missing software dependencies, versions and sufficient documentation to understand the workflow and parameter settings. As a solution we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all necessary details to reproduce the results. In this design we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to an absolute minimum. Our approach is independent of a specific software environment, results in human readable XML documents that can easily be shared with other researchers and allows an automated validation to ensure consistency of the metadata. Because our approach has been designed with little to no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields. Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip PMID:26055099
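The split between a Process document and an Analysis document, with automated validation between them, can be sketched with the standard library XML parser. The element and attribute names below are invented for illustration and are not the schema used by the authors' tool; the check shown (every declared parameter must receive a value) is only one plausible consistency rule.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical Process document: declares which parameters a step expects.
PROCESS_XML = """<process name="align" version="1.0">
  <parameter name="reference"/>
  <parameter name="threads"/>
</process>"""

# Hypothetical Analysis document: records the values used in one actual run.
ANALYSIS_XML = """<analysis process="align">
  <value parameter="reference">genome.fa</value>
</analysis>"""


def missing_parameters(process_xml: str, analysis_xml: str) -> List[str]:
    """Return declared parameters that the analysis run does not set."""
    process = ET.fromstring(process_xml)
    analysis = ET.fromstring(analysis_xml)
    declared = [p.get("name") for p in process.findall("parameter")]
    provided = {v.get("parameter") for v in analysis.findall("value")}
    return [name for name in declared if name not in provided]


missing = missing_parameters(PROCESS_XML, ANALYSIS_XML)
```

Keeping the declaration in one document and the run-specific values in another means the Process document is written once, and each new run only needs a short Analysis document that can be checked against it.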
2017-03-01
Mundell, 2016, p. 49). 22 For the sake of this thesis, it is important to note the specifics of Mundell’s (2016) research, as to provide a comparative ...his research did not first break out designation changes by specific community or year, but rather grouped all transfers at the four-, six-, and... compared to Naval Academy graduates, regardless of URL or RL/Staff community designation. 55 Table 13. MSR Retention Model Results by
Wang, Yang; Li, Yue; Yue, Minghui; Wang, Jun; Kumar, Sandeep; Wechsler-Reya, Robert J; Zhang, Zhaolei; Ogawa, Yuya; Kellis, Manolis; Duester, Gregg; Zhao, Jing Crystal
2018-06-07
In the version of this article initially published online, there were errors in URLs for www.southernbiotech.com, appearing in Methods sections "m6A dot-blot" and "Western blot analysis." The first two URLs should be https://www.southernbiotech.com/?catno=4030-05&type=Polyclonal#&panel1-1 and the third should be https://www.southernbiotech.com/?catno=6170-05&type=Polyclonal. In addition, some Methods URLs for bioz.com, www.abcam.com and www.sysy.com were printed correctly but not properly linked. The errors have been corrected in the PDF and HTML versions of this article.
Gene Unprediction with Spurio: A tool to identify spurious protein sequences.
Höps, Wolfram; Jeffryes, Matt; Bateman, Alex
2018-01-01
We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
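The feature highlighted in this abstract, stop codons inside the presumed translation of homologous DNA, can be computed in a few lines. This sketch covers only that counting step for a single reading frame; the function name is hypothetical, and the tblastn search and Gaussian process scoring used by Spurio are not reproduced here.

```python
# The three stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_in_frame_stops(dna: str, frame: int = 0) -> int:
    """Count stop codons when reading the sequence in the given frame (0, 1 or 2)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return sum(codon in STOP_CODONS for codon in codons)


# Toy sequence: ATG TAA GGG TGA contains two stops in frame 0.
stops_frame0 = count_in_frame_stops("ATGTAAGGGTGA", frame=0)
stops_frame1 = count_in_frame_stops("ATGTAAGGGTGA", frame=1)
```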
ToxReporter: viewing the genome through the eyes of a toxicologist.
Gosink, Mark
2016-01-01
One of the many roles of a toxicologist is to determine if an observed adverse event (AE) is related to a previously unrecognized function of a given gene/protein. Towards that end, he or she will search a variety of public and proprietary databases for information linking that protein to the observed AE. However, these databases tend to present all available information about a protein, which can be overwhelming, limiting the ability to find information about the specific toxicity being investigated. ToxReporter compiles information from a broad selection of resources and limits display of the information to user-selected areas of interest. ToxReporter is a PERL-based web-application which utilizes a MySQL database to streamline this process by categorizing public and proprietary domain-derived information into predefined safety categories according to a customizable lexicon. Users can view gene information that is 'red-flagged' according to the safety issue under investigation. ToxReporter also uses a scoring system based on relative counts of the red-flags to rank all genes for the amount of information pertaining to each safety issue and to display their scored ranking as an easily interpretable 'Tox-At-A-Glance' chart. Although ToxReporter was originally developed to display safety information, its flexible design could easily be adapted to display disease information as well. Database URL: ToxReporter is freely available at https://github.com/mgosink/ToxReporter. © The Author(s) 2016. Published by Oxford University Press.
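The lexicon-based red-flagging and count-based ranking described in this abstract can be sketched as below. Everything here is an assumption made for illustration: the lexicon, the gene names, the substring matching and the use of raw counts as the score. ToxReporter itself is a Perl and MySQL application whose scoring details are not given in the abstract.

```python
from typing import Dict, List


def red_flag_counts(annotations: Dict[str, List[str]],
                    lexicon: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Count, per gene and safety category, the annotations matching the lexicon.

    An annotation is flagged for a category if it contains any of that
    category's lowercase terms as a substring.
    """
    counts: Dict[str, Dict[str, int]] = {}
    for gene, texts in annotations.items():
        counts[gene] = {
            category: sum(
                any(term in text.lower() for term in terms) for text in texts
            )
            for category, terms in lexicon.items()
        }
    return counts


def rank_genes(counts: Dict[str, Dict[str, int]], category: str) -> List[str]:
    """Order genes from most to fewest red flags in one category."""
    return sorted(counts, key=lambda gene: counts[gene][category], reverse=True)


# Hypothetical lexicon and annotations, invented for this example.
lexicon = {
    "hepatotoxicity": ["liver", "hepat"],
    "cardiotoxicity": ["cardiac", "heart"],
}
annotations = {
    "GENE_A": ["Associated with liver injury", "Hepatic steatosis in knockout",
               "Expressed in heart"],
    "GENE_B": ["Cardiac arrhythmia reported", "Heart failure phenotype"],
}

counts = red_flag_counts(annotations, lexicon)
liver_ranking = rank_genes(counts, "hepatotoxicity")
heart_ranking = rank_genes(counts, "cardiotoxicity")
```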
Workflow and web application for annotating NCBI BioProject transcriptome data
Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A.; Barrero, Luz S.; Landsman, David
2017-01-01
The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/ PMID:28605765
Knowledge-rich temporal relation identification and classification in clinical notes
D’Souza, Jennifer; Ng, Vincent
2014-01-01
Motivation: We examine the task of temporal relation classification for the clinical domain. Our approach to this task departs from existing ones in that it is (i) ‘knowledge-rich’, employing sophisticated knowledge derived from discourse relations as well as both domain-independent and domain-dependent semantic relations, and (ii) ‘hybrid’, combining the strengths of rule-based and learning-based approaches. Evaluation results on the i2b2 Clinical Temporal Relations Challenge corpus show that our approach yields a 17–24% and 8–14% relative reduction in error over a state-of-the-art learning-based baseline system when gold-standard and automatically identified temporal relations are used, respectively. Database URL: http://www.hlt.utdallas.edu/~jld082000/temporal-relations/ PMID:25414383
Principles of metadata organization at the ENCODE data coordination center
Hong, Eurie L.; Sloan, Cricket A.; Chan, Esther T.; Davidson, Jean M.; Malladi, Venkat S.; Strattan, J. Seth; Hitz, Benjamin C.; Gabdank, Idan; Narayanan, Aditi K.; Ho, Marcus; Lee, Brian T.; Rowe, Laurence D.; Dreszer, Timothy R.; Roe, Greg R.; Podduturi, Nikhil R.; Tanaka, Forrest; Hilton, Jason A.; Cherry, J. Michael
2016-01-01
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org PMID:26980513
PIPE: a protein–protein interaction passage extraction module for BioCreative challenge
Chu, Chun-Han; Su, Yu-Chen; Chen, Chien Chin; Hsu, Wen-Lian
2016-01-01
Identifying the interactions between proteins mentioned in the biomedical literature is one of the frequently discussed topics of text mining in the life science field. In this article, we propose PIPE, an interaction pattern generation module used in the Collaborative Biocurator Assistant Task at BioCreative V (http://www.biocreative.org/) to capture frequent protein-protein interaction (PPI) patterns within text. We also present an interaction pattern tree (IPT) kernel method that integrates the PPI patterns with a convolution tree kernel (CTK) to extract PPIs. Methods were evaluated on LLL, IEPA, HPRD50, AIMed and BioInfer corpora using cross-validation, cross-learning and cross-corpus evaluation. Empirical evaluations demonstrate that our method is effective and outperforms several well-known PPI extraction methods. Database URL: PMID:27524807
modlAMP: Python for antimicrobial peptides.
Müller, Alex T; Gabernet, Gisela; Hiss, Jan A; Schneider, Gisbert
2017-09-01
We have implemented the molecular design laboratory's antimicrobial peptides package (modlAMP), a Python-based software package for the design, classification and visual representation of peptide data. modlAMP offers functions for molecular descriptor calculation and the retrieval of amino acid sequences from public or local sequence databases, and provides instant access to precompiled datasets for machine learning. The package also contains methods for the analysis and representation of circular dichroism spectra. The modlAMP Python package is available under the BSD license from URL http://doi.org/10.5905/ethz-1007-72 or via pip from the Python Package Index (PyPI). Contact: gisbert.schneider@pharma.ethz.ch. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature
Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V.; Lawson, Andrew; Zeng, Jia; Johnson, Amber M.; Holla, Vijaykumar; Bailey, Ann M.; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W.
2015-01-01
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org PMID:25858285
A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework
Bandrowski, A. E.; Cachat, J.; Li, Y.; Müller, H. M.; Sternberg, P. W.; Ciccarese, P.; Clark, T.; Marenco, L.; Wang, R.; Astakhov, V.; Grethe, J. S.; Martone, M. E.
2012-01-01
The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is specifically pertinent for online databases, whose dynamic content is ‘hidden’ from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions to finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semiautomated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce manual curation efforts to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (i.e. reagents, software tools).
Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community. Database URL: http://neuinfo.org PMID:22434839
Wu, Tsung-Jung; Schriml, Lynn M.; Chen, Qing-Rong; Colbert, Maureen; Crichton, Daniel J.; Finney, Richard; Hu, Ying; Kibbe, Warren A.; Kincaid, Heather; Meerzaman, Daoud; Mitraka, Elvira; Pan, Yang; Smith, Krista M.; Srivastava, Sudhir; Ward, Sari; Yan, Cheng; Mazumder, Raja
2015-01-01
Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. There are multiple initiatives that are developing biomedical terminologies for the purpose of providing better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major issue of biomedical data integration is the development of overlapping terms, ambiguous classifications and inconsistencies represented across databases and publications. The disease ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to be a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term ‘kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma’ and TCGA term ‘Kidney renal clear cell carcinoma’ were both grouped to the term ‘Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma’ which was mapped to the TopNodes_DOcancerslim term ‘DOID:263 / kidney cancer’. Mapping of diverse cancer terms to DO and the use of top level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources where pan-cancer means including or relating to all or multiple types of cancer. The terms can be browsed from the DO web site (http://www.disease-ontology.org) and downloaded from the DO’s Apache Subversion or GitHub repositories. Database URL: http://www.disease-ontology.org PMID:25841438
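The two-level mapping in this abstract, from source-specific cancer terms to a DO term and then to a top-level DO slim term, can be sketched with two lookup tables. The identifiers and term strings are the example given in the abstract; the table and function names are hypothetical, and the tables hold only that single example rather than the project's full mapping.

```python
from collections import Counter
from typing import Iterable, Tuple

# (source, term) -> DO identifier, using only the example from the abstract.
SOURCE_TERM_TO_DOID = {
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"): "DOID:4467",
    ("TCGA", "Kidney renal clear cell carcinoma"): "DOID:4467",
}

# DO identifier -> top-level slim term (DOID:263 is 'kidney cancer' in the abstract).
DOID_TO_SLIM = {"DOID:4467": "DOID:263"}


def to_slim(source: str, term: str) -> str:
    """Map a source-specific cancer term to its top-level DO slim identifier."""
    return DOID_TO_SLIM[SOURCE_TERM_TO_DOID[(source, term)]]


def count_by_slim(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count records per slim term so datasets from different sources can be pooled."""
    return Counter(to_slim(source, term) for source, term in records)


records = [
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"),
    ("TCGA", "Kidney renal clear cell carcinoma"),
]
pooled = count_by_slim(records)
```

Because both source terms resolve to the same slim identifier, records from COSMIC and TCGA fall into one bucket, which is the property that makes pan-cancer comparison across sources possible.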
Aksoy, Bülent Arman; Dančík, Vlado; Smith, Kenneth; Mazerik, Jessica N.; Ji, Zhou; Gross, Benjamin; Nikolova, Olga; Jaber, Nadia; Califano, Andrea; Schreiber, Stuart L.; Gerhard, Daniela S.; Hermida, Leandro C.; Jagu, Subhashini
2017-01-01
The Cancer Target Discovery and Development (CTD2) Network aims to use functional genomics to accelerate the translation of high-throughput and high-content genomic and small-molecule data towards use in precision oncology. As part of this goal, and to share its conclusions with the research community, the Network developed the ‘CTD2 Dashboard’ [https://ctd2-dashboard.nci.nih.gov/], which compiles CTD2 Network-generated conclusions, termed ‘observations’, associated with experimental entities, collected by its member groups (‘Centers’). Any researcher interested in learning about a given gene, protein, or compound (a ‘subject’) studied by the Network can come to the CTD2 Dashboard to quickly and easily find, review, and understand Network-generated experimental results. In particular, the Dashboard allows visitors to connect experiments about the same target, biomarker, etc., carried out by multiple Centers in the Network. The Dashboard’s unique knowledge representation allows information to be compiled around a subject, so as to become greater than the sum of the individual contributions. The CTD2 Network has broadly defined levels of validation for evidence (‘Tiers’) pertaining to a particular finding, and the CTD2 Dashboard uses these Tiers to indicate the extent to which results have been validated. Researchers can use the Network’s insights and tools to develop a new hypothesis or confirm existing hypotheses, in turn advancing the findings towards clinical applications. Database URL: https://ctd2-dashboard.nci.nih.gov/ PMID:29220450
Academic Research Library as Broker in Addressing Interoperability Challenges for the Geosciences
NASA Astrophysics Data System (ADS)
Smith, P., II
2015-12-01
Data capture is an important process in the research lifecycle. Complete descriptive and representative information of the data or database is necessary during data collection whether in the field or in the research lab. The National Science Foundation's (NSF) Public Access Plan (2015) mandates the need for federally funded projects to make their research data more openly available. Developing, implementing, and integrating metadata workflows into the research process of the data lifecycle facilitates improved data access while also addressing interoperability challenges for the geosciences such as data description and representation. Lack of metadata or data curation can contribute to (1) semantic, (2) ontology, and (3) data integration issues within and across disciplinary domains and projects. Some researchers of EarthCube funded projects have identified these issues as gaps. These gaps can contribute to interoperability data access, discovery, and integration issues between domain-specific and general data repositories. Academic Research Libraries have expertise in providing long-term discovery and access through the use of metadata standards and provision of access to research data, datasets, and publications via institutional repositories. Metadata crosswalks, open archival information systems (OAIS), trusted-repositories, data seal of approval, persistent URL, linking data, objects, resources, and publications in institutional repositories and digital content management systems are common components in the library discipline. These components contribute to a library perspective on data access and discovery that can benefit the geosciences. The USGS Community for Data Integration (CDI) has developed the Science Support Framework (SSF) for data management and integration within its community of practice for contribution to improved understanding of the Earth's physical and biological systems. 
The USGS CDI SSF can be used as a reference model to map to EarthCube Funded projects with academic research libraries facilitating the data and information assets components of the USGS CDI SSF via institutional repositories and/or digital content management. This session will explore the USGS CDI SSF for cross-discipline collaboration considerations from a library perspective.
ERIC Educational Resources Information Center
Gunn, Holly
2004-01-01
In this article, the author urges readers not to give up on a site when a URL returns an error message. Many web sites can still be found by using strategies such as URL trimming, searching cached sites, site searching and searching the WayBack Machine. The article presents methods and tips for finding web sites.
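URL trimming, one of the strategies mentioned above, means removing path segments from the end of a failing address until a working page is reached. A minimal sketch follows, assuming a hypothetical helper name `trimmed_urls` and an example address on example.org; the article itself does not give code.

```python
from urllib.parse import urlsplit, urlunsplit


def trimmed_urls(url):
    """Return progressively shorter versions of url, ending at the site root.

    The query string and fragment are dropped, then one trailing path
    segment is removed at each step.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    while segments:
        segments.pop()  # drop the last path segment
        path = "/" + "/".join(segments)
        if segments:
            path += "/"  # keep a trailing slash on directory-style paths
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


candidates = trimmed_urls("http://example.org/a/b/page.html")
```

Each candidate can then be tried in turn, from the longest to the site root.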
Raising the Degree of Service-Orientation of a SOA-based Software System: A Case Study
2009-12-01
protocols, as well as executable processes that can be compiled into runtime scripts” [2] The Business Process Modeling Notation (BPMN) provides a...Notation (BPMN) 1.2. Jan. 2009. URL: http://www.omg.org/spec/BPMN/1.2/ [25] .NET Framework Developer Center. .NET Remoting Overview. 2003. URL: http
Nagel, Anna C; Spitzberg, Brian H; An, Li; Gawron, J Mark; Gupta, Dipak K; Yang, Jiue-An; Han, Su; Peddecord, K Michael; Lindsay, Suzanne; Sawyer, Mark H
2013-01-01
Background Surveillance plays a vital role in disease detection, but traditional methods of collecting patient data, reporting to health officials, and compiling reports are costly and time consuming. In recent years, syndromic surveillance tools have expanded and researchers are able to exploit the vast amount of data available in real time on the Internet at minimal cost. Many data sources for infoveillance exist, but this study focuses on status updates (tweets) from the Twitter microblogging website. Objective The aim of this study was to explore the interaction between cyberspace message activity, measured by keyword-specific tweets, and real world occurrences of influenza and pertussis. Tweets were aggregated by week and compared to weekly influenza-like illness (ILI) and weekly pertussis incidence. The potential effect of tweet type was analyzed by categorizing tweets into 4 categories: nonretweets, retweets, tweets with a URL Web address, and tweets without a URL Web address. Methods Tweets were collected within a 17-mile radius of 11 US cities chosen on the basis of population size and the availability of disease data. Influenza analysis involved all 11 cities. Pertussis analysis was based on the 2 cities nearest to the Washington State pertussis outbreak (Seattle, WA and Portland, OR). Tweet collection resulted in 161,821 flu, 6174 influenza, 160 pertussis, and 1167 whooping cough tweets. The correlation coefficients between tweets or subgroups of tweets and disease occurrence were calculated and trends were presented graphically. Results Correlations between weekly aggregated tweets and disease occurrence varied greatly, but were relatively strong in some areas. In general, correlation coefficients were stronger in the flu analysis compared to the pertussis analysis. 
Within each analysis, flu tweets were more strongly correlated with ILI rates than influenza tweets, and whooping cough tweets correlated more strongly with pertussis incidence than pertussis tweets. Nonretweets correlated more with disease occurrence than retweets, and tweets without a URL Web address correlated better with actual incidence than those with a URL Web address primarily for the flu tweets. Conclusions This study demonstrates that not only does keyword choice play an important role in how well tweets correlate with disease occurrence, but that the subgroup of tweets used for analysis is also important. This exploratory work shows potential in the use of tweets for infoveillance, but continued efforts are needed to further refine research methods in this field. PMID:24158773
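The comparison of weekly tweet counts with weekly disease incidence can be illustrated with a plain Pearson correlation coefficient. The abstract does not state which coefficient the study used, so Pearson is an assumption here, and the weekly numbers below are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: keyword tweets per week and weekly ILI rate (%) for one city.
weekly_tweets = [120, 150, 310, 420, 380, 200]
weekly_ili = [1.1, 1.4, 2.9, 3.8, 3.5, 1.9]
r = pearson(weekly_tweets, weekly_ili)
```

Running the same function separately on each tweet subgroup (for example nonretweets versus retweets) would give the kind of subgroup comparison the study reports.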
Lo, Chi-Wen; Yang, Stephen Shei-Dei; Hsieh, Cheng-Hsing; Chang, Shang-Jen
2015-08-01
To evaluate the effectiveness of prophylactic antibiotic therapy in reducing the incidence of post-ureteroscopic lithotripsy (URL) infections. A systematic search of PubMed was performed to identify all randomized trials that compared the incidence of post-operative infections in patients without pre-operative urinary tract infections who underwent URL with and without a single dose of prophylactic antibiotics. The data were analyzed using Cochrane Collaboration Review Manager (RevMan, version 5.2). The endpoints of the analysis were pyuria (>10 white blood cells/high-power field), bacteriuria (urine culture with bacteria >10^5 colony-forming units/mL), and febrile urinary tract infections (fUTIs), defined as a body temperature of >38°C with pyuria or meaningful bacteriuria within 1 wk after the operation. In total, four trials enrolling 500 patients met the inclusion criteria and were subjected to meta-analysis. Prophylactic antibiotics significantly reduced post-URL pyuria (risk ratio [RR] 0.65; 95% confidence interval [CI] 0.51-0.82) and bacteriuria (RR 0.26; 95% CI 0.12-0.60; p=0.001). Patients who received prophylactic antibiotics tended to have lower rates of fUTI, although the difference was not statistically significant. Prophylactic antibiotic therapy can reduce the incidence of pyuria and bacteriuria after URL. However, because of the low incidence of post-URL fUTIs, we failed to show that a single dose of prophylactic antibiotics can reduce the rate of such infections significantly.
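For readers unfamiliar with the risk ratio and its confidence interval, the sketch below computes both for a single two-arm trial using the usual log-scale normal approximation. It does not reproduce the pooled RevMan analysis, and the event counts are hypothetical, not taken from the included trials.

```python
import math


def risk_ratio_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Risk ratio and approximate 95% CI for one two-arm trial (log-scale method)."""
    rr = (events_treated / n_treated) / (events_control / n_control)
    # Standard error of ln(RR) for a 2x2 table
    se = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    low = math.exp(math.log(rr) - z * se)
    high = math.exp(math.log(rr) + z * se)
    return rr, low, high


# Hypothetical counts: 20/250 events with prophylaxis, 40/250 without.
rr, low, high = risk_ratio_ci(20, 250, 40, 250)
```

An interval that lies entirely below 1 corresponds to a statistically significant reduction at the chosen level; an interval that spans 1 does not.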
NASA Astrophysics Data System (ADS)
White, Joshua S.; Matthews, Jeanna N.; Stacy, John L.
2012-06-01
Phishing website analysis is largely still a time-consuming manual process of discovering potential phishing sites, verifying if suspicious sites truly are malicious spoofs and if so, distributing their URLs to the appropriate blacklisting services. Attackers increasingly use sophisticated systems for bringing phishing sites up and down rapidly at new locations, making automated response essential. In this paper, we present a method for rapid, automated detection and analysis of phishing websites. Our method relies on near real-time gathering and analysis of URLs posted on social media sites. We fetch the pages pointed to by each URL and characterize each page with a set of easily computed values such as number of images and links. We also capture a screen-shot of the rendered page image, compute a hash of the image and use the Hamming distance between these image hashes as a form of visual comparison. We provide initial results that demonstrate the feasibility of our techniques by comparing legitimate sites to known fraudulent versions from Phishtank.com, by actively introducing a series of minor changes to a phishing toolkit captured in a local honeypot and by performing some initial analysis on a set of over 2.8 million URLs posted to Twitter over 4 days in August 2011. We discuss the issues encountered during our testing such as resolvability and legitimacy of URLs posted on Twitter, the data sets used, the characteristics of the phishing sites we discovered, and our plans for future work.
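The visual comparison step can be sketched as follows. The abstract does not say which image hash was used, so a simple average hash over grayscale pixel values is assumed here as a stand-in, and the two tiny "screenshots" are invented values.

```python
def average_hash(pixels):
    """Assumed stand-in hash: one bit per pixel, set when the pixel is above the mean.

    pixels is a flat list of grayscale values from a (downscaled) screenshot.
    """
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical 2x2 grayscale screenshots of a legitimate page and a lookalike.
legitimate = [10, 10, 200, 200]
lookalike = [10, 200, 200, 200]
distance = hamming_distance(average_hash(legitimate), average_hash(lookalike))
```

A small distance suggests that two rendered pages look alike, which is the signal of interest when a page at an unrelated URL closely resembles a known legitimate site.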
NASA Astrophysics Data System (ADS)
Delay, Jacques; Vinsot, Agnès; Krieguer, Jean-Marie; Rebours, Hervé; Armand, Gilles
In November 1999 Andra began building an Underground Research Laboratory (URL) on the border of the Meuse and Haute-Marne departments in eastern France. The research activities of the URL are dedicated to studying the feasibility of reversible, deep geological disposal of high-activity, long-lived radioactive wastes in an argillaceous host rock. The Laboratory consists of two shafts, an experimental drift at 445 m depth and a set of technical and experimental drifts at the main level at 490 m depth. The main objective of the research is to characterize the confining properties of the argillaceous rock through in situ hydrogeological tests, chemical measurements and diffusion experiments. In order to achieve this goal, a fundamental understanding of the geoscientific properties and processes that govern geological isolation in clay-rich rocks has been acquired. This understanding includes both the host rocks at the laboratory site and the regional geological context. After establishing the geological conditions, the underground research programme had to demonstrate that the construction and operation of a geological disposal facility will not introduce pathways for waste migration. Thus, the construction of the laboratory itself serves a research purpose through the monitoring of excavation effects and the optimization of construction technology. These studies are primarily geomechanical in nature, though chemical and hydrogeological coupling also have important roles. In order to achieve the scientific objectives of this project in the underground drifts, a specific methodology has been applied for carrying out the experimental programme conducted concurrently with the construction of the shafts and drifts. This methodology includes technological as well as organizational aspects and a systematic use of feedback from other laboratories abroad and every scientific zone of the URL already installed. 
This methodology was first applied to set up a multi-purpose experimental area at 445 m depth. Then the setting up of the experimental programme at the 490 m level was improved from the knowledge acquired during installation of the drift at 445 m. The several steps of the underground scientific programme are illustrated by presenting three experiments carried out in the underground drifts. The first experiment was carried out from the drift at 445 m depth, from the end of 2004 to mid-2005. This experiment aimed at setting up an array of about 16 boreholes to monitor the geomechanical changes during and after construction of the shaft between 445 and 490 m. The second experiment was set up in the drift at 445 m depth, and also at the main level at 490 m depth. It consisted of determining the composition of the interstitial water by circulating gas in one borehole and water of a known composition in the other. The evolution of the composition of both water and gases enabled us to test the thermodynamic model of the water/rock interactions. The third example is related to the testing of a concept of interruption of the EDZ through a cross-cut slot technology. The concept, which was tested successfully at Mont Terri (Switzerland), has been transposed and adapted to the URL site conditions. The results will be used for developing a concept for drift sealing.
Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J
2014-01-01
The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer (REST)/BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. 
Top balanced F-scores for gene, chemical and disease NER were 61, 74 and 51%, respectively. Response times ranged from fractions-of-a-second to over a minute per article. We present a description of the challenge and summary of results, demonstrating how curation groups can effectively use interoperable NER technologies to simplify text-mining pipeline implementation. Database URL: http://ctdbase.org/ © The Author(s) 2014. Published by Oxford University Press.
PMID:24919658
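The recall, precision and balanced F-score reported above are related by a simple formula: the balanced F-score is the harmonic mean of precision and recall. A minimal sketch follows; the function names and the example counts are illustrative only and are not taken from the challenge data.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall from counts of matched and unmatched annotations."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


def balanced_f_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score, or F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one NER service on one entity type.
p, r = precision_recall(true_positives=60, false_positives=15, false_negatives=40)
f1 = balanced_f_score(p, r)
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a service cannot score well by maximizing precision or recall alone.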
The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data.
Wilks, Christopher; Cline, Melissa S; Weiler, Erich; Diehkans, Mark; Craft, Brian; Martin, Christy; Murphy, Daniel; Pierce, Howdy; Black, John; Nelson, Donavan; Litzinger, Brian; Hatton, Thomas; Maltbie, Lori; Ainsworth, Michael; Allen, Patrick; Rosewood, Linda; Mitchell, Elizabeth; Smith, Bradley; Warner, Jim; Groboske, John; Telc, Haifang; Wilson, Daniel; Sanford, Brian; Schmidt, Hannes; Haussler, David; Maltbie, Daniel
2014-01-01
The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, enforce patient genome confidentiality in data storage and transmission and optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
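To put the serving figure in perspective, the quoted 100 TB per week can be converted to an average sustained line rate. The short calculation below assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over the week, neither of which is stated in the abstract.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def average_gbit_per_s(terabytes_per_week):
    """Average rate in Gbit/s, assuming 1 TB = 10**12 bytes and an even load."""
    bits = terabytes_per_week * 10**12 * 8
    return bits / SECONDS_PER_WEEK / 10**9


rate = average_gbit_per_s(100)  # about 1.32 Gbit/s sustained
```

Real traffic is bursty, so peak rates would be higher than this average.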
Capturing cooperative interactions with the PSI-MI format
Van Roey, Kim; Orchard, Sandra; Kerrien, Samuel; Dumousseau, Marine; Ricard-Blum, Sylvie; Hermjakob, Henning; Gibson, Toby J.
2013-01-01
The complex biological processes that control cellular function are mediated by intricate networks of molecular interactions. Accumulating evidence indicates that these interactions are often interdependent, thus acting cooperatively. Cooperative interactions are prevalent in and indispensable for reliable and robust control of cell regulation, as they underlie the conditional decision-making capability of large regulatory complexes. Despite an increased focus on experimental elucidation of the molecular details of cooperative binding events, as evidenced by their growing occurrence in literature, they are currently lacking from the main bioinformatics resources. One of the contributing factors to this deficiency is the lack of a computer-readable standard representation and exchange format for cooperative interaction data. To tackle this shortcoming, we added functionality to the widely used PSI-MI interchange format for molecular interaction data by defining new controlled vocabulary terms that allow annotation of different aspects of cooperativity without making structural changes to the underlying XML schema. As a result, we are able to capture cooperative interaction data in a structured format that is backward compatible with PSI-MI–based data and applications. This will facilitate the storage, exchange and analysis of cooperative interaction data, which in turn will advance experimental research on this fundamental principle in biology. Database URL: http://psi-mi-cooperativeinteractions.embl.de/ PMID:24067240
bc-GenExMiner 3.0: new mining module computes breast cancer gene expression correlation analyses.
Jézéquel, Pascal; Frénel, Jean-Sébastien; Campion, Loïc; Guérin-Charbonnel, Catherine; Gouraud, Wilfried; Ricolleau, Gabriel; Campone, Mario
2013-01-01
We recently developed a user-friendly web-based application called bc-GenExMiner (http://bcgenex.centregauducheau.fr), which offered the possibility to evaluate prognostic informativity of genes in breast cancer by means of a 'prognostic module'. In this study, we develop a new module called 'correlation module', which includes three kinds of gene expression correlation analyses. The first one computes the correlation coefficient between 2 or more (up to 10) chosen genes. The second one produces two lists of genes that are most correlated (positively and negatively) to a 'tested' gene. A gene ontology (GO) mining function is also proposed to explore GO 'biological process', 'molecular function' and 'cellular component' term enrichment for the output lists of most correlated genes. The third one explores gene expression correlation between the 15 telomeric and 15 centromeric genes surrounding a 'tested' gene. These correlation analyses can be performed in different groups of patients: all patients (without any subtyping), in molecular subtypes (basal-like, HER2+, luminal A and luminal B) and according to oestrogen receptor status. Validation tests based on published data showed that these automated analyses lead to results consistent with studies' conclusions. In brief, this new module has been developed to help basic researchers explore molecular mechanisms of breast cancer. DATABASE URL: http://bcgenex.centregauducheau.fr
Cieslewicz, Artur; Dutkiewicz, Jakub; Jedrzejek, Czeslaw
2018-01-01
Information retrieval from biomedical repositories has become a challenging task because of their increasing size and complexity. To facilitate the research aimed at improving the search for relevant documents, various information retrieval challenges have been launched. In this article, we present the improved medical information retrieval systems designed by Poznan University of Technology and Poznan University of Medical Sciences as a contribution to the bioCADDIE 2016 challenge, a task focusing on information retrieval from a collection of 794 992 datasets generated from 20 biomedical repositories. The system developed by our team utilizes the Terrier 4.2 search platform enhanced by a query expansion method using word embeddings. This approach, after post-challenge modifications and improvements (with particular regard to assigning proper weights for original and expanded terms), allowed us to achieve the second-best infNDCG measure (0.4539) compared with the challenge results and infAP 0.3978. This demonstrates that proper utilization of word embeddings can be a valuable addition to the information retrieval process. Some analysis is provided on related work involving other bioCADDIE contributions. We discuss the possibility of improving our results by using better word embedding schemes to find candidates for query expansion. Database URL: https://biocaddie.org/benchmark-data PMID:29688372
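Query expansion with word embeddings, including separate weights for original and expanded terms, can be sketched in a few lines. This is a generic illustration and not the authors' system: the function names, the weighting scheme (a fixed expansion weight scaled by cosine similarity) and the two-dimensional toy vectors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(terms, embeddings, k=2, original_weight=1.0, expansion_weight=0.5):
    """Return {term: weight}: original terms plus the k nearest neighbours of each."""
    weighted = {term: original_weight for term in terms}
    for term in terms:
        if term not in embeddings:
            continue
        # Rank every other vocabulary word by similarity to the query term
        scored = sorted(
            (
                (cosine(embeddings[term], vector), candidate)
                for candidate, vector in embeddings.items()
                if candidate not in terms
            ),
            reverse=True,
        )
        for similarity, candidate in scored[:k]:
            weight = expansion_weight * similarity
            weighted[candidate] = max(weighted.get(candidate, 0.0), weight)
    return weighted


# Toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "tumor": [1.0, 0.1],
    "neoplasm": [0.9, 0.2],
    "cancer": [0.8, 0.3],
    "river": [0.0, 1.0],
}
weights = expand_query(["tumor"], EMBEDDINGS, k=2)
```

Keeping expanded terms at a lower weight than the original terms limits the drift that loosely related neighbours can introduce into the ranking.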
YummyData: providing high-quality open life science data
Yamaguchi, Atsuko; Splendiani, Andrea
2018-01-01
Many life science datasets are now available via Linked Data technologies, meaning that they are represented in a common format (the Resource Description Framework), and are accessible via standard APIs (SPARQL endpoints). While this is an important step toward developing an interoperable bioinformatics data landscape, it also creates a new set of obstacles, as it is often difficult for researchers to find the datasets they need. Different providers frequently offer the same datasets, with different levels of support: as well as having more or less up-to-date data, some providers add metadata to describe the content, structures, and ontologies of the stored datasets while others do not. We currently lack a place where researchers can go to easily assess datasets from different providers in terms of metrics such as service stability or metadata richness. We also lack a space for collecting feedback and improving data providers’ awareness of user needs. To address this issue, we have developed YummyData, which consists of two components. One periodically polls a curated list of SPARQL endpoints, monitoring the states of their Linked Data implementations and content. The other presents the information measured for the endpoints and provides a forum for discussion and feedback. YummyData is designed to improve the findability and reusability of life science datasets provided as Linked Data and to foster its adoption. It is freely accessible at http://yummydata.org/. Database URL: http://yummydata.org/ PMID:29688370
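The polling component can be pictured as a small probe that sends a trivial query to each endpoint and records whether it answered. The sketch below is not YummyData's code: the probe query, the function names and the example.org endpoint are assumptions, and only the general SPARQL convention of passing the query in a `query` URL parameter is relied on.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PROBE_QUERY = "ASK { ?s ?p ?o }"  # assumed minimal liveness query


def build_probe_url(endpoint):
    """Append the probe query to an endpoint URL as a 'query' parameter."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": PROBE_QUERY})


def probe(endpoint, timeout=10.0):
    """Return True if the endpoint answers the probe with HTTP 200, else False."""
    request = Request(
        build_probe_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError and timeouts
        return False


def availability(results):
    """Fraction of successful probes in a list of booleans."""
    return sum(results) / len(results) if results else 0.0


probe_url = build_probe_url("http://example.org/sparql")  # hypothetical endpoint
uptime = availability([True, True, False, True])
```

Calling `probe` on a schedule and storing the results per endpoint would give a simple service-stability metric of the kind the abstract mentions.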
PubMed and beyond: a survey of web tools for searching biomedical literature
Lu, Zhiyong
2011-01-01
The past decade has witnessed the modern advances of high-throughput technology and rapid growth of research capacity in producing large-scale biological data, both of which were concomitant with an exponential growth of biomedical literature. This wealth of scholarly knowledge is of significant importance for researchers in making scientific discoveries and healthcare professionals in managing health-related matters. However, the acquisition of such information is becoming increasingly difficult due to its large volume and rapid growth. In response, the National Center for Biotechnology Information (NCBI) is continuously making changes to its PubMed Web service for improvement. Meanwhile, different entities have devoted themselves to developing Web tools for helping users quickly and efficiently search and retrieve relevant publications. These practices, together with maturity in the field of text mining, have led to an increase in the number and quality of various Web tools that provide comparable literature search service to PubMed. In this study, we review 28 such tools, highlight their respective innovations, compare them to the PubMed system and one another, and discuss directions for future development. Furthermore, we have built a website dedicated to tracking existing systems and future advances in the field of biomedical literature search. Taken together, our work serves information seekers in choosing tools for their needs and service providers and developers in keeping current in the field. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search PMID:21245076
Techniques for Efficiently Managing Large Geosciences Data Sets
NASA Astrophysics Data System (ADS)
Kruger, A.; Krajewski, W. F.; Bradley, A. A.; Smith, J. A.; Baeck, M. L.; Steiner, M.; Lawrence, R. E.; Ramamurthy, M. K.; Weber, J.; Delgreco, S. A.; Domaszczynski, P.; Seo, B.; Gunyon, C. A.
2007-12-01
We have developed techniques and software tools for efficiently managing large geosciences data sets. While the techniques were developed as part of an NSF-funded ITR project that focuses on making NEXRAD weather data and rainfall products available to hydrologists and other scientists, they are relevant to other geosciences disciplines that deal with large data sets. Metadata, relational databases, data compression, and networking are central to our methodology. Data and derived products are stored on file servers in a compressed format. URLs to, and metadata about, the data and derived products are managed in a PostgreSQL database. Virtually all access to the data and products is through this database. Geosciences data normally require a number of processing steps to transform the raw data into useful products: data quality assurance, coordinate transformations and georeferencing, applying calibration information, and many more. We have developed the concept of crawlers that manage this scientific workflow. Crawlers are unattended processes that run indefinitely, and at set intervals query the database for their next assignment. A database table functions as a roster for the crawlers. Crawlers perform well-defined tasks that are, except for perhaps sequencing, largely independent from other crawlers. Once a crawler is done with its current assignment, it updates the database roster table, and gets its next assignment by querying the database. We have developed a library that enables one to quickly add crawlers. The library provides hooks to external (i.e., C-language) compiled codes, so that developers can work and contribute independently. Processes called ingesters inject data into the system. The bulk of the data are from a real-time feed using UCAR/Unidata's IDD/LDM software. An exciting recent development is the establishment of a Unidata HYDRO feed that feeds value-added metadata over the IDD/LDM. 
Ingesters grab the metadata and populate the PostgreSQL tables. These and other concepts we have developed have enabled us to efficiently manage a 70 Tb (and growing) weather radar data set.
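The crawler and roster pattern described above can be sketched with a single database table. The abstract names PostgreSQL; the sketch below swaps in Python's built-in sqlite3 module so that it runs without a server, and the table layout, column names, task names and URLs are all hypothetical.

```python
import sqlite3


def create_roster(conn):
    """Create the roster table: one row per pending or finished assignment."""
    conn.execute(
        "CREATE TABLE roster ("
        "id INTEGER PRIMARY KEY, "
        "task TEXT NOT NULL, "
        "data_url TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )


def run_crawler_once(conn, task, handler):
    """Fetch the next pending assignment for task, process it and mark it done.

    Returns False when there is nothing left to do for this task.
    """
    row = conn.execute(
        "SELECT id, data_url FROM roster "
        "WHERE task = ? AND status = 'pending' ORDER BY id LIMIT 1",
        (task,),
    ).fetchone()
    if row is None:
        return False
    row_id, data_url = row
    handler(data_url)  # the actual processing step, e.g. georeferencing
    conn.execute("UPDATE roster SET status = 'done' WHERE id = ?", (row_id,))
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
create_roster(conn)
conn.executemany(
    "INSERT INTO roster (task, data_url) VALUES (?, ?)",
    [
        ("georeference", "http://example.org/radar/scan1.gz"),
        ("calibrate", "http://example.org/radar/scan1.gz"),
        ("georeference", "http://example.org/radar/scan2.gz"),
    ],
)
processed = []
while run_crawler_once(conn, "georeference", processed.append):
    pass
pending = conn.execute(
    "SELECT COUNT(*) FROM roster WHERE status = 'pending'"
).fetchone()[0]
```

An unattended crawler would wrap `run_crawler_once` in an endless loop that sleeps for a set interval whenever it returns False. With several crawlers of the same type running concurrently, the select-then-update step would additionally need row locking, which this sketch leaves out.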
Survey of Volumetric Grid Generators
NASA Technical Reports Server (NTRS)
Woo, Alex; Volakis, John; Hulbert, Greg; Case, Jeff; Presley, Leroy L. (Technical Monitor)
1994-01-01
This document is the result of an Internet survey of volumetric grid generators. As such, we have included information only from the responses that were sent to us. After the initial publication and posting of this survey, we would encourage authors and users of grid generators to send further information. Here is the initial query posted to SIGGRID@nas and the USENET group sci.physics.computational.fluid-dynamics. Date: Sun, 30 Jan 94 11:37:52 -0800 From: woo (Alex Woo x6010 227-6 rm 315) Subject: Info Sought for Survey of Grid Generators I am collecting information and reviews of both government sponsored and commercial mesh generators for large scientific calculations, both block structured and unstructured. If you send me a review of a mesh generator, please indicate its availability and cost. If you are a commercial concern with information on a product, please also include references for possible reviewers. Please email to woo@ra-next.arc.nasa.gov. I will post a summary and probably write a short note for the IEEE Antennas and Propagation Magazine. Alex Woo, MS 227-6 woo@ames.arc.nasa.gov NASA Ames Research Center NASAMAIL ACWOO Moffett Field, CA 94035-1000 SPANET 24582::W00 (415) 604-6010 (FAX) 604-4357 {hplabs,decwrl,uunet}!ames!woo Disclaimer: These are not official statements of NASA or EMCC. We did not include all the submitted text here. Instead we have created a database entry in the freely available and widely used BIBTeX format which has a Uniform Resource Locator (URL) field pointing to more details. The BIBTeX database is modeled after those available from the BIBNET project at University of Utah.
Publications of Western Earth Surface Processes Team 2001
Powell, II; Graymer, R.W.
2002-01-01
The Western Earth Surface Processes Team (WESPT) of the U.S. Geological Survey (USGS) conducts geologic mapping and related topical earth-science studies in the Western United States. This work is focused on areas where modern geologic maps and associated earth-science data are needed to address key societal and environmental issues, such as ground-water quality, landslides and other potential geologic hazards, and land-use decisions. Areas of primary emphasis in 2001 included southern California, the San Francisco Bay region, the Pacific Northwest, and the Las Vegas urban corridor. The team has its headquarters in Menlo Park, California, and maintains smaller field offices at several other locations in the Western United States. The results of research conducted by the WESPT are released to the public as a variety of databases, maps, text reports, and abstracts, both through the internal publication system of the USGS and in diverse external publications such as scientific journals and books. This report lists publications of the WESPT released in 2001, as well as additional 1999 and 2000 publications that were not included in the previous list (USGS Open-File Report 00–215 and USGS Open-File Report 01–198). Most of the publications listed were authored or coauthored by WESPT staff. The list also includes some publications authored by non-USGS cooperators with the WESPT, as well as some authored by USGS staff outside the WESPT in cooperation with WESPT projects. Several of the publications listed are available on the World Wide Web; for these, URL addresses are provided. Many of these web publications are USGS Open-File Reports that contain large digital databases of geologic map and related information.
Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases
Sanderson, Lacey-Anne; Ficklin, Stephen P.; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A.; Bett, Kirstin E.; Main, Dorrie
2013-01-01
Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/ PMID:24163125
LDSplitDB: a database for studies of meiotic recombination hotspots in MHC using human genomic data.
Guo, Jing; Chen, Hao; Yang, Peng; Lee, Yew Ti; Wu, Min; Przytycka, Teresa M; Kwoh, Chee Keong; Zheng, Jie
2018-04-20
Meiotic recombination happens during the process of meiosis when chromosomes inherited from two parents exchange genetic material to generate chromosomes in the gamete cells. The recombination events tend to occur in narrow genomic regions called recombination hotspots. Dysregulation of recombination could lead to serious human diseases such as birth defects. Although the regulatory mechanism of recombination events is still unclear, DNA sequence polymorphisms have been found to play crucial roles in the regulation of recombination hotspots. To facilitate the studies of the underlying mechanism, we developed a database named LDSplitDB which provides an integrative and interactive data mining and visualization platform for the genome-wide association studies of recombination hotspots. It contains the pre-computed association maps of the major histocompatibility complex (MHC) region in the 1000 Genomes Project and the HapMap Phase III datasets, and a genome-scale study of the European population from the HapMap Phase II dataset. Besides the recombination profiles, related data of genes, SNPs and different types of epigenetic modifications, which could be associated with meiotic recombination, are provided for comprehensive analysis. To meet the computational requirement of the rapidly increasing population genomics data, we prepared a lookup table of 400 haplotypes for recombination rate estimation using the well-known LDhat algorithm which includes all possible two-locus haplotype configurations. To the best of our knowledge, LDSplitDB is the first large-scale database for the association analysis of human recombination hotspots with DNA sequence polymorphisms. It provides valuable resources for the discovery of the mechanism of meiotic recombination hotspots. The information about MHC in this database could help understand the roles of recombination in the human immune system. DATABASE URL: http://histone.scse.ntu.edu.sg/LDSplitDB.
López, Yosvany; Nakai, Kenta; Patil, Ashwini
2015-01-01
HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.
Ben Ayed, Rayda; Ben Hassen, Hanen; Ennouri, Karim; Ben Marzoug, Riadh; Rebai, Ahmed
2016-01-01
Olive (Olea europaea), whose importance is mainly due to its nutritional and health features, is one of the most economically significant oil-producing trees in the Mediterranean region. Unfortunately, the increasing market demand for virgin olive oil can result in its adulteration with less expensive oils, which is a serious problem for the public and for quality control evaluators of virgin olive oil. Therefore, to avoid fraud, olive cultivar identification and virgin olive oil authentication have become a major issue for producers, consumers and quality control in the olive chain. Presently, genetic traceability using SSR is a cost-effective and powerful marker technique that can be employed to resolve such problems. However, to identify an unknown monovarietal virgin olive oil cultivar, a reference system has become necessary. Thus, an Olive Genetic Diversity Database (OGDD) (http://www.bioinfo-cbs.org/ogdd/) is presented in this work. It is a genetic, morphologic and chemical database of worldwide olive trees and oils with a double function: besides being a reference system for the identification of unknown olive or virgin olive oil cultivars based on their microsatellite allele size(s), it provides users with additional morphological and chemical information for each identified cultivar. Currently, OGDD is designed to enable users to easily retrieve and visualize biologically important information (SSR markers, and olive tree and oil characteristics of about 200 cultivars worldwide) using a set of efficient query interfaces and analysis tools. It can be accessed through a web service from any modern programming language using a simple hypertext transfer protocol call. The web site is implemented in Java, JavaScript, PHP, HTML and Apache with all major browsers supported. Database URL: http://www.bioinfo-cbs.org/ogdd/ PMID:26827236
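Identification of an unknown cultivar from its microsatellite allele sizes, as described in this abstract, can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the marker names (`SSR1`, `SSR2`), the cultivar labels, the allele sizes and the exact-match scoring rule are hypothetical and are not taken from OGDD.

```python
def match_cultivars(sample, reference):
    """Rank reference cultivars by the fraction of shared SSR markers
    whose allele sizes (in base pairs) agree exactly with the sample."""
    scores = []
    for name, profile in reference.items():
        shared = [marker for marker in sample if marker in profile]
        if not shared:
            continue  # nothing to compare against for this cultivar
        matches = sum(
            1 for marker in shared
            if sorted(sample[marker]) == sorted(profile[marker])
        )
        scores.append((name, matches / len(shared)))
    # Best score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical reference profiles: marker -> pair of allele sizes.
reference = {
    "Cultivar A": {"SSR1": (182, 190), "SSR2": (120, 120)},
    "Cultivar B": {"SSR1": (182, 186), "SSR2": (120, 120)},
}
sample = {"SSR1": (190, 182), "SSR2": (120, 120)}

ranking = match_cultivars(sample, reference)
```

Sorting the two alleles before comparing makes the match independent of the order in which they were recorded. A production system would likely also tolerate small sizing differences between laboratories, which this sketch does not attempt.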
Design and implementation of a database for Brucella melitensis genome annotation.
De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric
2008-03-18
The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacterium Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start site predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools applied to the complete sequence of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To make these results available, alphaPAGe, a functional auto-updatable database of the corrected genome sequence of B. melitensis, has been built using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can retrieve different kinds of information through three levels of queries: (1) the basic search uses classical keywords or sequence identifiers; (2) the original advanced search engine allows numerous criteria to be combined using logical operators: (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight, and structural criteria such as the nucleic length or the number of transmembrane helices (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following URL: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.
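The advanced search described here, which combines textual and numerical criteria with logical operators, might translate into SQL along the following lines. The schema is an assumption: the table `pcds`, its columns and the sample rows are hypothetical and do not reproduce the alphaPAGe structure; SQLite is used only to keep the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of predicted coding sequences (pCDS).
conn.execute(
    "CREATE TABLE pcds (id TEXT PRIMARY KEY, function TEXT,"
    " isoelectric_point REAL, molecular_weight REAL, tmh_count INTEGER)"
)
conn.executemany(
    "INSERT INTO pcds VALUES (?, ?, ?, ?, ?)",
    [
        ("P1", "ABC transporter permease", 9.1, 31000.0, 6),
        ("P2", "ABC transporter ATP-binding protein", 5.8, 27000.0, 0),
        ("P3", "outer membrane protein", 4.9, 40000.0, 1),
    ],
)


def search(conn, keyword, min_pi, min_tmh):
    """Keyword match AND (isoelectric point OR transmembrane-helix criterion)."""
    rows = conn.execute(
        "SELECT id FROM pcds"
        " WHERE function LIKE ?"
        " AND (isoelectric_point >= ? OR tmh_count >= ?)"
        " ORDER BY id",
        (f"%{keyword}%", min_pi, min_tmh),
    )
    return [row[0] for row in rows]


hits = search(conn, "transporter", 9.0, 1)
```

Passing the criteria as bound parameters rather than splicing them into the SQL string keeps a form-driven search safe, whereas the free SQL field mentioned in the abstract hands that responsibility to the user.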
Incorporating the APS Catalog of the POSS I and Image Archive in ADS
NASA Technical Reports Server (NTRS)
Humphreys, Roberta M.
1998-01-01
The primary purpose of this contract was to develop the software to both create and access an on-line database of images from digital scans of the Palomar Sky Survey. This required modifying our DBMS (called Star Base) to create an image database from the actual raw pixel data from the scans. The digitized images are processed into a set of coordinate-reference index and pixel files that are stored in run-length files, thus achieving an efficient lossless compression. For efficiency and ease of referencing, each digitized POSS I plate is then divided into 900 subplates. Our custom DBMS maps each query into the corresponding POSS plate(s) and subplate(s). All images from the appropriate subplates are retrieved from disk with byte-offsets taken from the index files. These are assembled on-the-fly into a GIF image file for browser display, and a FITS format image file for retrieval. The FITS images have a pixel size of 0.33 arcseconds. The FITS header contains astrometric and photometric information. This method keeps the disk requirements manageable while allowing for future improvements. When complete, the APS Image Database will contain over 130 Gb of data. A set of web page query forms is available on-line, as well as an on-line tutorial and documentation. The database is distributed to the Internet by a high-speed SGI server and a high-bandwidth disk system. The URL is http://aps.umn.edu/IDB/. The image database software is written in Perl and C and has been compiled on SGI computers with IRIX 5.3. A copy of the written documentation is included and the software is on the accompanying exabyte tape.
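Run-length storage of pixel data, which this abstract credits for the lossless compression, can be shown with a generic sketch. The actual file layout used by the APS software is not described in the abstract, so the (value, count) pair representation below is an assumption chosen for clarity.

```python
def rle_encode(pixels):
    """Collapse consecutive equal pixel values into (value, count) runs."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, count) runs back into the original pixel sequence."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


# One scan row: mostly background (0) with a short bright feature (7).
row = [0, 0, 0, 7, 7, 0, 0, 0, 0]
encoded = rle_encode(row)
restored = rle_decode(encoded)
```

Decoding returns exactly the input sequence, which is what makes the scheme lossless; the saving is largest on rows dominated by long runs of a single value such as sky background.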
The aquatic animals' transcriptome resource for comparative functional analysis.
Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da
2018-05-09
Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, no comprehensive resource currently exists for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database, dbATM, provides de novo transcriptome assembly, gene annotation and comparative analysis of more than twenty aquatic organisms without a draft genome. To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and BLAST databases and phylogenetic analysis. In conclusion, we establish a resource for non-model aquatic animals, which are of great economic and ecological importance, and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publicly accessible through the URL http://dbATM.mbc.nctu.edu.tw/ .
Boué, Stéphanie; Talikka, Marja; Westra, Jurjen Willem; Hayes, William; Di Fabio, Anselmo; Park, Jennifer; Schlage, Walter K; Sewer, Alain; Fields, Brett; Ansari, Sam; Martin, Florian; Veljkovic, Emilija; Kenney, Renee; Peitsch, Manuel C; Hoeng, Julia
2015-01-01
With the wealth of publications and data available, powerful and transparent computational approaches are required to represent measured data and scientific knowledge in a computable and searchable format. We developed a set of biological network models, scripted in the Biological Expression Language, that reflect causal signaling pathways across a wide range of biological processes, including cell fate, cell stress, cell proliferation, inflammation, tissue repair and angiogenesis in the pulmonary and cardiovascular context. This comprehensive collection of networks is now freely available to the scientific community in a centralized web-based repository, the Causal Biological Network database, which is composed of over 120 manually curated and well annotated biological network models and can be accessed at http://causalbionet.com. The website accesses a MongoDB, which stores all versions of the networks as JSON objects and allows users to search for genes, proteins, biological processes, small molecules and keywords in the network descriptions to retrieve biological networks of interest. The content of the networks can be visualized and browsed. Nodes and edges can be filtered and all supporting evidence for the edges can be browsed and is linked to the original articles in PubMed. Moreover, networks may be downloaded for further visualization and evaluation. Database URL: http://causalbionet.com © The Author(s) 2015. Published by Oxford University Press.
Davis, Allan Peter; Wiegers, Thomas C.; Murphy, Cynthia G.; Mattingly, Carolyn J.
2011-01-01
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the effects of environmental chemicals on human health. CTD biocurators read the scientific literature and convert free-text information into a structured format using official nomenclature, integrating third party controlled vocabularies for chemicals, genes, diseases and organisms, and a novel controlled vocabulary for molecular interactions. Manual curation produces a robust, richly annotated dataset of highly accurate and detailed information. Currently, CTD describes over 349 000 molecular interactions between 6800 chemicals, 20 900 genes (for 330 organisms) and 4300 diseases that have been manually curated from over 25 400 peer-reviewed articles. These manually curated data are further integrated with other third party data (e.g. Gene Ontology, KEGG and Reactome annotations) to generate a wealth of toxicogenomic relationships. Here, we describe our approach to manual curation that uses a powerful and efficient paradigm involving mnemonic codes. This strategy allows biocurators to quickly capture detailed information from articles by generating simple statements using codes to represent the relationships between data types. The paradigm is versatile, expandable, and able to accommodate new data challenges that arise. We have incorporated this strategy into a web-based curation tool to further increase efficiency and productivity, implement quality control in real-time and accommodate biocurators working remotely. Database URL: http://ctd.mdibl.org PMID:21933848
Schoch, Conrad L; Robbertse, Barbara; Robert, Vincent; Vu, Duong; Cardinali, Gianluigi; Irinyi, Laszlo; Meyer, Wieland; Nilsson, R Henrik; Hughes, Karen; Miller, Andrew N; Kirk, Paul M; Abarenkov, Kessy; Aime, M Catherine; Ariyawansa, Hiran A; Bidartondo, Martin; Boekhout, Teun; Buyck, Bart; Cai, Qing; Chen, Jie; Crespo, Ana; Crous, Pedro W; Damm, Ulrike; De Beer, Z Wilhelm; Dentinger, Bryn T M; Divakar, Pradeep K; Dueñas, Margarita; Feau, Nicolas; Fliegerova, Katerina; García, Miguel A; Ge, Zai-Wei; Griffith, Gareth W; Groenewald, Johannes Z; Groenewald, Marizeth; Grube, Martin; Gryzenhout, Marieka; Gueidan, Cécile; Guo, Liangdong; Hambleton, Sarah; Hamelin, Richard; Hansen, Karen; Hofstetter, Valérie; Hong, Seung-Beom; Houbraken, Jos; Hyde, Kevin D; Inderbitzin, Patrik; Johnston, Peter R; Karunarathna, Samantha C; Kõljalg, Urmas; Kovács, Gábor M; Kraichak, Ekaphan; Krizsan, Krisztina; Kurtzman, Cletus P; Larsson, Karl-Henrik; Leavitt, Steven; Letcher, Peter M; Liimatainen, Kare; Liu, Jian-Kui; Lodge, D Jean; Luangsa-ard, Janet Jennifer; Lumbsch, H Thorsten; Maharachchikumbura, Sajeewa S N; Manamgoda, Dimuthu; Martín, María P; Minnis, Andrew M; Moncalvo, Jean-Marc; Mulè, Giuseppina; Nakasone, Karen K; Niskanen, Tuula; Olariaga, Ibai; Papp, Tamás; Petkovits, Tamás; Pino-Bodas, Raquel; Powell, Martha J; Raja, Huzefa A; Redecker, Dirk; Sarmiento-Ramirez, J M; Seifert, Keith A; Shrestha, Bhushan; Stenroos, Soili; Stielow, Benjamin; Suh, Sung-Oui; Tanaka, Kazuaki; Tedersoo, Leho; Telleria, M Teresa; Udayanga, Dhanushka; Untereiner, Wendy A; Diéguez Uribeondo, Javier; Subbarao, Krishna V; Vágvölgyi, Csaba; Visagie, Cobus; Voigt, Kerstin; Walker, Donald M; Weir, Bevan S; Weiß, Michael; Wijayawardene, Nalin N; Wingfield, Michael J; Xu, J P; Yang, Zhu L; Zhang, Ning; Zhuang, Wen-Ying; Federhen, Scott
2014-01-01
DNA phylogenetic comparisons have shown that morphology-based species recognition often underestimates fungal diversity. Therefore, the need for accurate DNA sequence data, tied to both correct taxonomic names and clearly annotated specimen data, has never been greater. Furthermore, the growing number of molecular ecology and microbiome projects using high-throughput sequencing require fast and effective methods for en masse species assignments. In this article, we focus on selecting and re-annotating a set of marker reference sequences that represent each currently accepted order of Fungi. The particular focus is on sequences from the internal transcribed spacer region in the nuclear ribosomal cistron, derived from type specimens and/or ex-type cultures. Re-annotated and verified sequences were deposited in a curated public database at the National Center for Biotechnology Information (NCBI), namely the RefSeq Targeted Loci (RTL) database, and will be visible during routine sequence similarity searches with NR_prefixed accession numbers. A set of standards and protocols is proposed to improve the data quality of new sequences, and we suggest how type and other reference sequences can be used to improve identification of Fungi. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353. Published by Oxford University Press 2013. This work is written by US Government employees and is in the public domain in the US.
2007-09-01
Motion URL: http://www.blackberry.com/products/blackberry/index.shtml Software Name: Bricolage Company: Bricolage URL: http://www.bricolage.cc...Workflow Customizable control over editorial content. Bricolage Bricolage Feature Description Software Company Workflow Allows development...content for Nuxeo Collaborative Portal projects. Nuxeo Workspace Add, edit, delete, content through web interface. Bricolage Bricolage
40 CFR 53.23 - Test procedures.
Code of Federal Regulations, 2013 CFR
2013-07-01
... up and stabilize. Determine measurement noise at each of two fixed concentrations, first using zero.... Note to § 53.23(b)(2): Use of a chart recorder in addition to the DM is optional. (iii) Measure zero... atmosphere concentration of 80 ±5 percent of the URL instead of zero air, and let S at 80 percent of the URL...
40 CFR 53.23 - Test procedures.
Code of Federal Regulations, 2012 CFR
2012-07-01
... up and stabilize. Determine measurement noise at each of two fixed concentrations, first using zero.... Note to § 53.23(b)(2): Use of a chart recorder in addition to the DM is optional. (iii) Measure zero... atmosphere concentration of 80 ±5 percent of the URL instead of zero air, and let S at 80 percent of the URL...
40 CFR 53.23 - Test procedures.
Code of Federal Regulations, 2014 CFR
2014-07-01
... up and stabilize. Determine measurement noise at each of two fixed concentrations, first using zero.... Note to § 53.23(b)(2): Use of a chart recorder in addition to the DM is optional. (iii) Measure zero... atmosphere concentration of 80 ±5 percent of the URL instead of zero air, and let S at 80 percent of the URL...
Portable Electromagnetic Induction Sensor with Integrated Positioning
2013-08-20
Subsurface electromagnetic induction imaging for unexploded ordnance detection. Journal of Applied Geophysics, 79:38 – 45, 2012. ISSN 09269851. URL http...Portable Electromagnetic Induction Sensor with Integrated Positioning MR-1712 Final Report Submitted to Strategic Environmental Research and...
DePriest, Adam D; Fiandalo, Michael V; Schlanger, Simon; Heemers, Frederike; Mohler, James L; Liu, Song; Heemers, Hannelore V
2016-01-01
Androgen receptor (AR) is a ligand-activated transcription factor that is the main target for treatment of non-organ-confined prostate cancer (CaP). Failure of life-prolonging AR-targeting androgen deprivation therapy is due to flexibility in steroidogenic pathways that control intracrine androgen levels and variability in the AR transcriptional output. Androgen biosynthesis enzymes, androgen transporters and AR-associated coregulators are attractive novel CaP treatment targets. These proteins, however, are characterized by multiple transcript variants and isoforms, are subject to genomic alterations, and are differentially expressed among CaPs. Determining their therapeutic potential requires evaluation of extensive, diverse datasets that are dispersed over multiple databases, websites and literature reports. Mining and integrating these datasets are cumbersome, time-consuming tasks and provide only snapshots of relevant information. To overcome this impediment to effective, efficient study of AR and potential drug targets, we developed the Regulators of Androgen Action Resource (RAAR), a non-redundant, curated and user-friendly searchable web interface. RAAR centralizes information on gene function, clinical relevance, and resources for 55 genes that encode proteins involved in biosynthesis, metabolism and transport of androgens and for 274 AR-associated coregulator genes. Data in RAAR are organized in two levels: (i) Information pertaining to production of androgens is contained in a 'pre-receptor level' database, and coregulator gene information is provided in a 'post-receptor level' database, and (ii) an 'other resources' database contains links to additional databases that are complementary to and useful to pursue further the information provided in RAAR. For each of its 329 entries, RAAR provides access to more than 20 well-curated publicly available databases, and thus, access to thousands of data points. 
Hyperlinks provide direct access to gene-specific entries in the respective database(s). RAAR is a novel, freely available resource that provides fast, reliable and easy access to integrated information that is needed to develop alternative CaP therapies. Database URL: http://www.lerner.ccf.org/cancerbio/heemers/RAAR/search/. © The Author(s) 2016. Published by Oxford University Press.
PGP repository: a plant phenomics and genomics data publication infrastructure
Arend, Daniel; Junker, Astrid; Scholz, Uwe; Schüler, Danuta; Wylie, Juliane; Lange, Matthias
2016-01-01
Plant genomics and phenomics represent the most promising tools for accelerating yield gains and overcoming emerging crop productivity bottlenecks. However, accessing this wealth of plant diversity requires the characterization of this material using state-of-the-art genomic, phenomic and molecular technologies and the release of subsequent research data via a long-term stable, open-access portal. Although several international consortia and public resource centres offer services for plant research data management, valuable digital assets remain unpublished and thus inaccessible to the scientific community. Recently, the Leibniz Institute of Plant Genetics and Crop Plant Research and the German Plant Phenotyping Network have jointly initiated the Plant Genomics and Phenomics Research Data Repository (PGP) as infrastructure to comprehensively publish plant research data. This covers in particular cross-domain datasets that are not being published in central repositories because of their volume or unsupported data scope, like image collections from plant phenotyping and microscopy, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry as well as software and documents. The repository is hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research using e!DAL as software infrastructure and a Hierarchical Storage Management System as data archival backend. A newly developed data submission tool was made available for the consortium that features a high level of automation to lower the barriers to data publication. After an internal review process, data are published as citable digital object identifiers and a core set of technical metadata is registered at DataCite. The e!DAL-embedded Web frontend generates a landing page for each dataset and supports interactive exploration. 
PGP is registered as a research data repository at BioSharing.org, re3data.org and OpenAIRE as a valid EU Horizon 2020 open data archive. These features, together with the programmatic interface and the support of standard metadata formats, enable PGP to fulfil the FAIR data principles: findable, accessible, interoperable, reusable. Database URL: http://edal.ipk-gatersleben.de/repos/pgp/ PMID:27087305
Principles of metadata organization at the ENCODE data coordination center.
Hong, Eurie L; Sloan, Cricket A; Chan, Esther T; Davidson, Jean M; Malladi, Venkat S; Strattan, J Seth; Hitz, Benjamin C; Gabdank, Idan; Narayanan, Aditi K; Ho, Marcus; Lee, Brian T; Rowe, Laurence D; Dreszer, Timothy R; Roe, Greg R; Podduturi, Nikhil R; Tanaka, Forrest; Hilton, Jason A; Cherry, J Michael
2016-01-01
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org. © The Author(s) 2016. Published by Oxford University Press.
Comeau, Donald C.; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W. John
2014-01-01
BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net PMID:24935050
The Meteoritical Bulletin, No. 103
NASA Astrophysics Data System (ADS)
Ruzicka, Alex; Grossman, Jeffrey; Bouvier, Audrey; Agee, Carl B.
2017-05-01
Meteoritical Bulletin 103 contains 2582 meteorites including 10 falls (Ardón, Demsa, Jinju, Križevci, Kuresoi, Novato, Tinajdad, Tirhert, Vicência, Wolcott), with 2174 ordinary chondrites, 130 HED achondrites, 113 carbonaceous chondrites, 41 ureilites, 27 lunar meteorites, 24 enstatite chondrites, 21 iron meteorites, 15 primitive achondrites, 11 mesosiderites, 10 Martian meteorites, 6 Rumuruti chondrites, 5 ungrouped achondrites, 2 enstatite achondrites, 1 relict meteorite, 1 pallasite, and 1 angrite, and with 1511 from Antarctica, 588 from Africa, 361 from Asia, 86 from South America, 28 from North America, and 6 from Europe. Note: 1 meteorite from Russia was counted as European. The complete contents of this bulletin (244 pages) are available on line. Information about approved meteorites can be obtained from the Meteoritical Bulletin Database (MBD) available on line at
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature.
Chen, Guocai; Zhao, Jieyi; Cohen, Trevor; Tao, Cui; Sun, Jingchun; Xu, Hua; Bernstam, Elmer V; Lawson, Andrew; Zeng, Jia; Johnson, Amber M; Holla, Vijaykumar; Bailey, Ann M; Lara-Guerra, Humberto; Litzenburger, Beate; Meric-Bernstam, Funda; Jim Zheng, W
2015-01-01
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant for personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature to disambiguate gene names. We obtained 93.6% precision for the test gene set and 80.4% for the area under a receiver-operating characteristics curve for gene and article association. The core algorithm was implemented using a graphics processing unit-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org © The Author(s) 2015. Published by Oxford University Press.
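The two figures reported above, precision and the area under the ROC curve, can be illustrated with a short sketch. This is not the authors' GPU-based MapReduce implementation; the function names and the toy scores are assumptions made up purely for illustration.

```python
def precision(predicted, actual):
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(scores, labels):
    """AUC as the probability that a random positive example outscores
    a random negative example (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented gene-article association scores; 1 = article truly concerns the gene.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
predicted = [s >= 0.5 for s in scores]  # hypothetical decision threshold

print(precision(predicted, labels))
print(roc_auc(scores, labels))
```

Precision depends on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is presumably why the abstract reports both.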
firestar—advances in the prediction of functionally important residues
Lopez, Gonzalo; Maietta, Paolo; Rodriguez, Jose Manuel; Valencia, Alfonso; Tress, Michael L.
2011-01-01
firestar is a server for predicting catalytic and ligand-binding residues in protein sequences. Here, we present the important developments since the first release of firestar. Previous versions of the server required human interpretation of the results; the server is now fully automatized. firestar has been implemented as a web service and can now be run in high-throughput mode. Prediction coverage has been greatly improved with the extension of the FireDB database and the addition of alignments generated by HHsearch. Ligands in FireDB are now classified for biological relevance. Many of the changes have been motivated by the critical assessment of techniques for protein structure prediction (CASP) ligand-binding prediction experiment, which provided us with a framework to test the performance of firestar. URL: http://firedb.bioinfo.cnio.es/Php/FireStar.php. PMID:21672959
myPhyloDB: a local web server for the storage and analysis of metagenomic data.
Manter, Daniel K; Korsa, Matthew; Tebbe, Caleb; Delgado, Jorge A
2016-01-01
myPhyloDB v.1.1.2 is a user-friendly personal database with a browser-interface designed to facilitate the storage, processing, analysis, and distribution of microbial community populations (e.g. 16S metagenomics data). myPhyloDB archives raw sequencing files, and allows for easy selection of project(s)/sample(s) of any combination from all available data in the database. The data processing capabilities of myPhyloDB are also flexible enough to allow the upload and storage of pre-processed data, or use the built-in Mothur pipeline to automate the processing of raw sequencing data. myPhyloDB provides several analytical (e.g. analysis of covariance, t-tests, linear regression, differential abundance (DESeq2), and principal coordinates analysis (PCoA)) and normalization (rarefaction, DESeq2, and proportion) tools for the comparative analysis of taxonomic abundance, species richness and species diversity for projects of various types (e.g. human-associated, human gut microbiome, air, soil, and water) for any taxonomic level(s) desired. Finally, since myPhyloDB is a local web-server, users can quickly distribute data between colleagues and end-users by simply granting others access to their personal myPhyloDB database. myPhyloDB is available at http://www.ars.usda.gov/services/software/download.htm?softwareid=472 and more information along with tutorials can be found on our website http://www.myphylodb.org. Database URL: http://www.myphylodb.org. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.
Software for Building Models of 3D Objects via the Internet
NASA Technical Reports Server (NTRS)
Schramer, Tim; Jensen, Jeff
2003-01-01
The Virtual EDF Builder (where EDF signifies Electronic Development Fixture) is a computer program that facilitates the use of the Internet for building and displaying digital models of three-dimensional (3D) objects that ordinarily comprise assemblies of solid models created previously by use of computer-aided-design (CAD) programs. The Virtual EDF Builder resides on a Unix-based server computer. It is used in conjunction with a commercially available Web-based plug-in viewer program that runs on a client computer. The Virtual EDF Builder acts as a translator between the viewer program and a database stored on the server. The translation function includes the provision of uniform resource locator (URL) links to other Web-based computer systems and databases. The Virtual EDF Builder can be used in two ways: (1) If the client computer is Unix-based, then it can assemble a model locally; the computational load is transferred from the server to the client computer. (2) Alternatively, the server can be made to build the model, in which case the server bears the computational load and the results are downloaded to the client computer or workstation upon completion.
DEOP: a database on osmoprotectants and associated pathways
Bougouffa, Salim; Radovanovic, Aleksandar; Essack, Magbubah; Bajic, Vladimir B.
2014-01-01
Microorganisms are known to counteract salt stress through salt influx or by the accumulation of osmoprotectants (also called compatible solutes). Understanding the pathways that synthesize and/or break down these osmoprotectants is of interest to studies of crop halotolerance and to biotechnology applications that use microbes as cell factories for production of biomass or commercial chemicals. To facilitate the exploration of osmoprotectants, we have developed the first online resource, ‘Dragon Explorer of Osmoprotection associated Pathways’ (DEOP), that gathers and presents curated information about osmoprotectants, complemented by information about reactions and pathways that use or affect them. A combined total of 141 compounds were confirmed osmoprotectants, which were matched to 1883 reactions and 834 pathways. DEOP can also be used to map genes or microbial genomes to potential osmoprotection-associated pathways, and thus link genes and genomes to other associated osmoprotection information. Moreover, DEOP provides a text-mining utility to search deeper into the scientific literature for supporting evidence or for new associations of osmoprotectants to pathways, reactions, enzymes, genes or organisms. Two case studies are provided to demonstrate the usefulness of DEOP. The system can be accessed at the following address. Database URL: http://www.cbrc.kaust.edu.sa/deop/ PMID:25326239
EMAGE mouse embryo spatial gene expression database: 2010 update
Richardson, Lorna; Venkataraman, Shanmugasundaram; Stevenson, Peter; Yang, Yiya; Burton, Nicholas; Rao, Jianguo; Fisher, Malcolm; Baldock, Richard A.; Davidson, Duncan R.; Christiansen, Jeffrey H.
2010-01-01
EMAGE (http://www.emouseatlas.org/emage) is a freely available online database of in situ gene expression patterns in the developing mouse embryo. Gene expression domains from raw images are extracted and integrated spatially into a set of standard 3D virtual mouse embryos at different stages of development, which allows data interrogation by spatial methods. An anatomy ontology is also used to describe sites of expression, which allows data to be queried using text-based methods. Here, we describe recent enhancements to EMAGE including: the release of a completely re-designed website, which offers integration of many different search functions in HTML web pages, improved user feedback and the ability to find similar expression patterns at the click of a button; back-end refactoring from an object oriented to relational architecture, allowing associated SQL access; and the provision of further access by standard formatted URLs and a Java API. We have also increased data coverage by sourcing from a greater selection of journals and developed automated methods for spatial data annotation that are being applied to spatially incorporate the genome-wide (∼19 000 gene) ‘EURExpress’ dataset into EMAGE. PMID:19767607
Access To The PMM's Pixel Database
NASA Astrophysics Data System (ADS)
Monet, D.; Levine, S.
1999-12-01
The U.S. Naval Observatory Flagstaff Station is in the process of enabling access to the Precision Measuring Machine (PMM) program's pixel database. The initial release will include the pixels from the PMM's scans of the Palomar Observatory Sky Survey I (POSS-I) -O and -E surveys, the Whiteoak Extension, the European Southern Observatory-R survey, the Science and Engineering Council-J, -EJ, and -ER surveys, and the Anglo-Australian Observatory-R survey. (The SERC-ER and AAO-R surveys are currently incomplete.) As time allows, access to the POSS-II -J, -F, and -N surveys, the Palomar Infrared Milky Way Atlas, the Yale/San Juan Southern Proper Motion survey, and plates rejected by various surveys will be added. (POSS-II -J and -F are complete, but -N was never finished.) Eventually, some 10 Tbytes of pixel data will be available. Due to funding and technology limitations, the initial interface will have only limited functionality, and access time will be slow since the archive is stored on Digital Linear Tape (DLT). Usage of the pixel data will be restricted to non-commercial, scientific applications, and agreements on copyright issues have yet to be finalized. The poster presentation will give the URL.
Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks.
Balaur, Irina; Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J; Auffray, Charles
2017-04-01
The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/. The Java code files developed for this work are available from the following URL: https://github.com/ibalaur/MetabolicFramework. Contact: ibalaur@eisbm.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
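The JSON-to-SIF conversion step mentioned above can be sketched as follows. The authors' parser is written in Java and the structure of their JSON query results is not given here, so the keys `source`, `relation` and `target`, the function name and the sample network are all assumptions for illustration. The only fixed point is the SIF convention of one interaction per line, written as source node, interaction type, target node.

```python
import json


def json_to_sif(json_text):
    """Convert a JSON list of edge records into SIF text.

    Each output line is: source<TAB>relation<TAB>target.
    The record keys used here are hypothetical, not the real Recon2Neo4j schema.
    """
    rows = json.loads(json_text)
    lines = []
    for row in rows:
        lines.append(f'{row["source"]}\t{row["relation"]}\t{row["target"]}')
    return "\n".join(lines)


# Invented two-edge network: a metabolite consumed by a reaction that
# produces a second metabolite.
example = json.dumps([
    {"source": "metabolite_A", "relation": "reactant_of", "target": "reaction_1"},
    {"source": "reaction_1", "relation": "produces", "target": "metabolite_B"},
])

print(json_to_sif(example))
```

Because SIF carries only node names and an edge type, a conversion like this discards any stoichiometry or compartment information, which is one reason a richer format such as SBML is offered alongside it.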
Recon2Neo4j: applying graph database technologies for managing comprehensive genome-scale networks
Balaur, Irina; Mazein, Alexander; Saqi, Mansoor; Lysenko, Artem; Rawlings, Christopher J.; Auffray, Charles
2017-01-01
Summary: The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. Availability and Implementation: The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/. The java code files developed for this work are available from the following url: https://github.com/ibalaur/MetabolicFramework. Contact: ibalaur@eisbm.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27993779
Fresnel, A; Jarno, P; Burgun, A; Delamarre, D; Denier, P; Cleret, M; Courtin, C; Seka, L P; Pouliquen, B; Cléran, L; Riou, C; Leduff, F; Lesaux, H; Duvauferrier, R; Le Beux, P
1998-01-01
A pedagogical network has been developed at the University Hospital of Rennes since 1996. The challenge is to give medical information and informatics tools to all medical students in the clinical wards of the University Hospital. At first, nine wards were connected to the medical school server, which is linked to the Internet. Client software comprises electronic mail and the Netscape WWW browser on Macintosh computers. Server software is set up on a SUN Unix machine providing a local homepage with selected pedagogical resources. These documents are stored in an ORACLE DBMS database and can be queried by specialty, author or disease. The students can access a set of interactive teaching programs or electronic textbooks and can explore the Internet through the library information system and search engines. The teachers can send URLs and indexing of pedagogical documents and can produce clinical cases: the database updating will be done by the users. This experience of using Web tools generated enthusiasm when we first introduced it to students. The evaluation shows that if the students can use this training early on, they will adapt the resources of the Internet to their own needs.
GeneStoryTeller: a mobile app for quick and comprehensive information retrieval of human genes
Eleftheriou, Stergiani V.; Bourdakou, Marilena M.; Athanasiadis, Emmanouil I.; Spyrou, George M.
2015-01-01
In the last few years, mobile devices such as smartphones and tablets have become an integral part of everyday life, due to the rapid development of their software and hardware, as well as the increased portability they offer. Nevertheless, up to now, only a few apps capable of providing fast and robust access to services have been developed in the field of bioinformatics. We have developed GeneStoryTeller, a mobile application for Android platforms, where users are able to instantly retrieve information regarding any recorded human gene, derived from eight publicly available databases, as a summary story. Complementary information regarding gene–drug interactions, functional annotation and disease associations for each selected gene is also provided in the gene story. The most challenging part of developing GeneStoryTeller was keeping a balance between storing data locally within the app and obtaining the updated content dynamically via a network connection. This was accomplished with the implementation of an administrative site where data are curated and synchronized with the application, requiring minimal human intervention. Database URL: http://bioserver-3.bioacademy.gr/Bioserver/GeneStoryTeller/. PMID:26055097
DIRT: The Dust InfraRed Toolbox
NASA Astrophysics Data System (ADS)
Pound, M. W.; Wolfire, M. G.; Mundy, L. G.; Teuben, P. J.; Lord, S.
We present DIRT, a Java applet geared toward modeling a variety of processes in envelopes of young and evolved stars. Users can automatically and efficiently search grids of pre-calculated models to fit their data. A large set of physical parameters and dust types are included in the model database, which contains over 500,000 models. The computing cluster for the database is described in the accompanying paper by Teuben et al. (2000). A typical user query will return about 50-100 models, which the user can then interactively filter as a function of 8 model parameters (e.g., extinction, size, flux, luminosity). A flexible, multi-dimensional plotter (Figure 1) allows users to view the models, rotate them, tag specific parameters with color or symbol size, and probe individual model points. For any given model, auxiliary plots such as dust grain properties, radial intensity profiles, and the flux as a function of wavelength and beamsize can be viewed. The user can fit observed data to several models simultaneously and see the results of the fit; the best fit is automatically selected for plotting. The URL for this project is http://dustem.astro.umd.edu.
Reefgenomics.Org - a repository for marine genomics data.
Liew, Yi Jin; Aranda, Manuel; Voolstra, Christian R
2016-01-01
Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are stored in public sequence databases, very often, only raw sequencing data are available; miscellaneous data such as assembled transcriptomes, genome annotations etc. are not easily obtainable through the same means. Here, we introduce our website (http://reefgenomics.org) that aims to centralize genomic and transcriptomic data from marine organisms. Besides providing convenient means to download sequences, we provide (where applicable) a genome browser to explore available genomic features, and a BLAST interface to search through the hosted sequences. Through the interface, multiple datasets can be queried simultaneously, allowing for the retrieval of matching sequences from organisms of interest. The minimalistic, no-frills interface reduces visual clutter, making it convenient for end-users to search and explore processed sequence data. DATABASE URL: http://reefgenomics.org. © The Author(s) 2016. Published by Oxford University Press.
A resource oriented web service for environmental modeling
NASA Astrophysics Data System (ADS)
Ferencik, Ioan
2013-04-01
Environmental modeling is a widely adopted practice in the study of natural phenomena. Environmental models can be difficult to build and use, and thus sharing them within the community is an important aspect. The most common approach to sharing a model is to expose it as a web service. In practice, interaction with this web service is cumbersome due to the lack of a standardized contract and the complexity of the model being exposed. In this work we investigate the use of a resource oriented approach in exposing environmental models as web services. We view a model as a layered resource built atop the object concept from Object Oriented Programming, augmented with persistence capabilities provided by an embedded object database to keep track of its state and implementing the four basic principles of resource oriented architectures: addressability, statelessness, representation and uniform interface. For implementation we use exclusively open source software: the Django framework, the dyBase object oriented database and the Python programming language. We developed a generic framework of resources structured into a hierarchy of types and consequently extended this typology with resources specific to the domain of environmental modeling. To test our web service we used cURL, a robust command-line based web client.
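The four principles named in this abstract (addressability, statelessness, representation, uniform interface) can be made concrete with a minimal sketch. It deliberately uses neither Django nor dyBase: the in-memory dictionary stands in for the embedded object database, and the class name, URI paths and parameter names are all hypothetical.

```python
import json


class ResourceStore:
    """In-memory stand-in for a persistent object database (an assumption)."""

    def __init__(self):
        self._resources = {}

    def handle(self, method, uri, body=None):
        """Uniform interface: every resource answers the same few verbs.

        The URI alone identifies the resource (addressability), each call
        carries everything needed to process it (statelessness), and state
        is exchanged as a JSON document (representation).
        Returns a (status_code, response_body) pair.
        """
        if method == "PUT":
            self._resources[uri] = json.loads(body)
            return 200, body
        if method == "GET":
            if uri not in self._resources:
                return 404, ""
            return 200, json.dumps(self._resources[uri], sort_keys=True)
        if method == "DELETE":
            if uri in self._resources:
                del self._resources[uri]
                return 204, ""
            return 404, ""
        return 405, ""  # verb outside the uniform interface


# Hypothetical model resource and parameter, invented for illustration.
store = ResourceStore()
store.handle("PUT", "/models/runoff/parameters", '{"infiltration": 0.3}')
status, doc = store.handle("GET", "/models/runoff/parameters")
print(status, doc)
```

In a deployed service the same verbs would arrive as HTTP requests, so a command-line client such as cURL could exercise each resource without any model-specific tooling.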
Use Them ... or Lose Them? The Case for and against Using QR Codes
ERIC Educational Resources Information Center
Cunningham, Chuck; Dull, Cassie
2011-01-01
A quick-response (QR) code is a two-dimensional, black-and-white square barcode and links directly to a URL of one's choice. When the code is scanned with a smartphone, it will automatically redirect the user to the designated URL. QR codes are popping up everywhere--billboards, magazines, posters, shop windows, TVs, computer screens, and more.…
Measuring Link-Resolver Success: Comparing 360 Link with a Local Implementation of WebBridge
ERIC Educational Resources Information Center
Herrera, Gail
2011-01-01
This study reviewed link resolver success comparing 360 Link and a local implementation of WebBridge. Two methods were used: (1) comparing article-level access and (2) examining technical issues for 384 randomly sampled OpenURLs. Google Analytics was used to collect user-generated OpenURLs. For both methods, 360 Link out-performed the local…
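For readers unfamiliar with what a link resolver receives, the sketch below assembles an OpenURL 1.0 query string in key/encoded-value form for a journal article. The resolver address and the citation values are invented; only the key names (`url_ver`, `rft_val_fmt`, `rft.atitle` and so on) follow the Z39.88-2004 journal format, and nothing here reflects how 360 Link or WebBridge were configured in the study.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_openurl(resolver_base, atitle, jtitle, issn, volume, spage, date):
    """Return an OpenURL 1.0 (KEV) link describing one journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),
        ("rft.jtitle", jtitle),
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.spage", spage),
        ("rft.date", date),
    ]
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver and citation, for illustration only.
link = build_openurl(
    "https://resolver.example.edu/openurl",
    atitle="An example article",
    jtitle="Journal of Examples",
    issn="1234-5678",
    volume="12",
    spage="34",
    date="2011",
)
print(link)
```

A resolver fails when it cannot match these fields to a licensed full-text target, so sampling real user-generated OpenURLs, as the study did, tests both the metadata quality and the resolver's knowledge base.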
ERIC Educational Resources Information Center
Brown, Christopher C.
2011-01-01
As federal government information is increasingly migrating to online formats, libraries are providing links to this content via URLs or persistent URLs (PURLs) in their online public access catalogs (OPACs). Clickthrough statistics that accumulated as users visited links to online content in the University of Denver's library OPAC were gathered…
D'Aurizio, F; Metus, P; Ferrari, A; Caruso, B; Castello, R; Villalta, D; Steffan, A; Gaspardo, K; Pesente, F; Bizzaro, N; Tonutti, E; Valverde, S; Cosma, C; Plebani, M; Tozzoli, R
2017-12-01
In the last two decades, thyroglobulin autoantibody (TgAb) measurement has progressively switched from a marker of thyroid autoimmunity to a test associated with thyroglobulin (Tg) to verify the presence or absence of TgAb interference in the follow-up of patients with differentiated thyroid cancer. Of note, TgAb measurement is cumbersome: despite standardization against the International Reference Preparation MRC 65/93, several studies demonstrated high inter-method variability and wide variation in limits of detection and in reference intervals. Taking into account the above considerations, the main aim of the present study was the determination of the TgAb upper reference limit (URL), according to the National Academy of Clinical Biochemistry guidelines, through the comparison of eleven commercial automated immunoassay platforms. The sera of 120 healthy males, selected from a population survey in the province of Verona, Italy, were tested for TgAb concentration using eleven IMA applied on as many automated analyzers: AIA-2000 (AIA) and AIA-CL2400 (CL2), Tosoh Bioscience; Architect (ARC), Abbott Diagnostics; Advia Centaur XP (CEN) and Immulite 2000 XPi (IMM), Siemens Healthineers; Cobas 6000 (COB), Roche Diagnostics; Kryptor (KRY), Thermo Fisher Scientific BRAHMS; Liaison XL (LIA), Diasorin; Lumipulse G (LUM), Fujirebio; Maglumi 2000 Plus (MAG), Snibe; and Phadia 250 (PHA), Phadia AB, Thermo Fisher Scientific. All assays were performed according to manufacturers' instructions in six different laboratories in the Friuli-Venezia Giulia and Veneto regions of Italy [Lab 1 (AIA), Lab 2 (CL2), Lab 3 (ARC, COB and LUM), Lab 4 (CEN, IMM, KRY and MAG), Lab 5 (LIA) and Lab 6 (PHA)]. Since TgAb values were not normally distributed, the experimental URL (e-URL) was established at the 97.5th percentile according to the non-parametric method. TgAb e-URLs showed a significant inter-method variability.
Considering the same method, e-URL was much lower than that suggested by manufacturers (m-URL), except for ARC and MAG. Correlation and linear regression were unsatisfactory. Consequently, the agreement between methods was poor, with significant bias in Bland-Altman plot. Despite the efforts for harmonization, TgAb methods cannot be used interchangeably. Therefore, additional effort is required to improve analytical performance taking into consideration approved protocols and guidelines. Moreover, TgAb URL should be used with caution in the management of differentiated thyroid carcinoma patients since the presence and/or the degree of TgAb interference in Tg measurement has not yet been well defined.
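The non-parametric upper reference limit described in this abstract can be illustrated with a rank-based calculation. The abstract does not state which interpolation rule was applied, so the sketch assumes one common convention, a fractional rank of 0.975 × (n + 1) with linear interpolation between neighbouring ordered values. The function name and the data are invented and are not TgAb measurements.

```python
def upper_reference_limit(values, quantile=0.975):
    """Rank-based (non-parametric) estimate of an upper reference limit.

    Assumes the convention rank = quantile * (n + 1), with linear
    interpolation between the two ordered observations around that rank.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    rank = quantile * (n + 1)   # 1-based, possibly fractional
    lower = int(rank)           # rank of the observation at or below
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    frac = rank - lower
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


# Synthetic stand-in for 120 reference subjects (values 1..120, not real data).
synthetic = list(range(1, 121))
print(upper_reference_limit(synthetic))
```

With 120 subjects the limit falls between the 117th and 118th ordered values, so it is determined by only a handful of the highest results; this sensitivity to the upper tail is one reason estimates can differ noticeably between assays.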
Genic insights from integrated human proteomics in GeneCards.
Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron
2016-01-01
GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publically available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. 
The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL: http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.
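The abstract describes a per-gene protein expression vector, a pairwise proximity metric and a protein-RNA expression ratio without giving their formulas. The sketch below is therefore only one plausible reading: it assumes Pearson correlation as the proximity measure and a simple per-tissue quotient as the ratio. All names and numbers are hypothetical, and the vectors are shortened to four tissues rather than 69.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(vx * vy)


def protein_rna_ratio(protein, rna):
    """Per-tissue protein/RNA quotient; None where RNA abundance is zero."""
    return [p / r if r else None for p, r in zip(protein, rna)]


# Invented normalized abundances for two genes across four tissues.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
print(pearson(gene_a, gene_b))
print(protein_rna_ratio([4.0, 0.0, 9.0], [2.0, 5.0, 0.0]))
```

A correlation-style measure rewards genes whose abundance rises and falls in the same tissues regardless of absolute level, which fits the stated goal of finding functional partners, although the metric actually used in GeneCards may differ.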
NASA Technical Reports Server (NTRS)
1999-01-01
Langley's mission is accomplished by performing innovative research relevant to national needs and Agency goals, transferring technology to users in a timely manner, and providing development support to other United States Government Agencies, industry, other NASA Centers, the educational community, and the local community. This report contains highlights of some of the major accomplishments and applications that have been made by Langley researchers and by our university and industry colleagues during the past year. The highlights illustrate the broad range of research and technology activities carried out by NASA Langley Research Center and the contributions of this work toward maintaining United States' leadership in aeronautics and space research. A color electronic version of this report is available at URL http://larcpubs.larc.nasa.gov/randt/1998/.
2002-07-01
our general model include: (1) service user (SU), (2) service manager (SM), and (3) service cache manager ( SCM ), where the SCM is an optional...maintained by SMs that satisfy specific requirements. Where employed, the SCM operates as an intermediary, matching advertised SDs of SMs to...Directory Service Agent (optional) not applicableLookup ServiceService Cache Manager ( SCM ) Service URL Service Type Service Attributes Template URL
What We've Learned From Doing Usability Testing on OpenURL Resolvers and Federated Search Engines
ERIC Educational Resources Information Center
Cervone, Frank
2005-01-01
OpenURL resolvers and federated search engines are important new services in the library field. For some librarians, these services may seem "old hat" by now, but for the majority these services are still in the early stages of implementation or planning. In many cases, these two services are offered as a seamlessly integrated whole.…
ERIC Educational Resources Information Center
Nagaraja, Aragudige; Joseph, Shine A.; Polen, Hyla H.; Clauson, Kevin A.
2011-01-01
Purpose: The aim of this paper is to assess and catalogue the magnitude of URL attrition in a high-impact, open access (OA) general medical journal. Design/methodology/approach: All "Public Library of Science Medicine (PLoS Medicine)" articles for 2005-2007 were evaluated and the following items were assessed: number of entries per issue; type of…
HypoxiaDB: a database of hypoxia-regulated proteins
Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala
2013-01-01
There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important in deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually-curated, non-redundant catalog of proteins whose expressions are shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries taken on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinctive from other generalized databases: (i) it compiles tissue-specific protein expression changes under different levels and duration of hypoxia. Also, it provides manually curated literature references to support the inclusion of the protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-proteins, which otherwise requires tedious computational analysis. This includes information like chromosomal location, identifiers like Entrez, HGNC, Unigene, Uniprot, Ensembl, Vega, GI numbers and Genbank accession numbers associated with the protein. These are further cross-linked to respective public databases augmenting HypoxiaDB to the external repositories. 
(iv) In addition, HypoxiaDB provides an online sequence-similarity search tool for users to compare their protein sequences with the HypoxiaDB protein database. We hope that HypoxiaDB will enrich our knowledge about hypoxia-related biology and eventually will lead to the development of novel hypotheses and advancements in diagnostic and therapeutic activities. HypoxiaDB is freely accessible for academic and non-profit users via http://www.hypoxiadb.com. Database URL: http://www.hypoxiadb.com PMID:24178989
Methods for Coding Tobacco-Related Twitter Data: A Systematic Review
Unger, Jennifer B; Cruz, Tess Boley; Chu, Kar-Hai
2017-01-01
Background As Twitter has grown in popularity to 313 million monthly active users, researchers have increasingly been using it as a data source for tobacco-related research. Objective The objective of this systematic review was to assess the methodological approaches of categorically coded tobacco Twitter data and make recommendations for future studies. Methods Data sources included PsycINFO, Web of Science, PubMed, ABI/INFORM, Communication Source, and Tobacco Regulatory Science. Searches were limited to peer-reviewed journals and conference proceedings in English from January 2006 to July 2016. The initial search identified 274 articles using a Twitter keyword and a tobacco keyword. One coder reviewed all abstracts and identified 27 articles that met the following inclusion criteria: (1) original research, (2) focused on tobacco or a tobacco product, (3) analyzed Twitter data, and (4) coded Twitter data categorically. One coder extracted data collection and coding methods. Results E-cigarettes were the most common type of Twitter data analyzed, followed by specific tobacco campaigns. The most prevalent data sources were Gnip and Twitter’s Streaming application programming interface (API). The primary methods of coding were hand-coding and machine learning. The studies predominantly coded for relevance, sentiment, theme, user or account, and location of user. Conclusions Standards for data collection and coding should be developed to be able to more easily compare and replicate tobacco-related Twitter results. Additional recommendations include the following: sample Twitter’s databases multiple times, make a distinction between message attitude and emotional tone for sentiment, code images and URLs, and analyze user profiles. Being relatively novel and widely used among adolescents and black and Hispanic individuals, Twitter could provide a rich source of tobacco surveillance data among vulnerable populations. PMID:28363883
Manoni, Fabio; Gessoni, Gianluca; Alessio, Maria Grazia; Caleffi, Alberta; Saccani, Graziella; Epifani, Maria Grazia; Tinello, Agostino; Zorzan, Tatiana; Valverde, Sara; Caputo, Marco; Lippi, Giuseppe
2014-01-01
We performed a multicenter study to calculate the upper reference limits (URL) for urine particle quantification in mid-stream samples by using automated urine analyzers. Two laboratories tested 283 subjects using a Sysmex UF-100, two other laboratories tested 313 subjects using Sysmex UF-1000i, whereas two other laboratories tested 267 subjects using Iris IQ®200. The URLs of UF-100 in females and males were 7.8/μL and 6.7/μL for epithelial cells (EC), 11.1/μL and 9.9/μL for red blood cells (RBC), 10.2/μL and 9.7/μL for white blood cells (WBC), and 0.85/μL and 0.87/μL for cylinders (CAST). The URLs of UF-1000i in females and males were 7.6/μL and 7.1/μL for EC, 12.2/μL and 11.1/μL for RBC, 11.9/μL and 11.7/μL for WBC, and 0.88/μL and 0.86/μL for CAST. The URLs of Iris IQ®200 in females and males were 7.8/μL and 6.6/μL for EC, 12.4/μL and 10.1/μL for RBC, 10.9/μL and 9.9/μL for WBC, and 1.1/μL and 1.0/μL for CAST. The URLs obtained in this study were comparable to the lowest values previously reported in the literature. Moreover, no gender-related difference was observed, and analyzer-specific upper reference limits were very similar. © 2013.
Polar Domain Discovery with Sparkler
NASA Astrophysics Data System (ADS)
Duerr, R.; Khalsa, S. J. S.; Mattmann, C. A.; Ottilingam, N. K.; Singh, K.; Lopez, L. A.
2017-12-01
The scientific web is vast and ever growing. It encompasses millions of textual, scientific and multimedia documents describing research in a multitude of scientific streams. Most of these documents are hidden behind forms which require user action to retrieve and thus can't be directly accessed by content crawlers. These documents are hosted on web servers across the world, most often on outdated hardware and network infrastructure. Hence it is difficult and time-consuming to aggregate documents from the scientific web, especially those relevant to a specific domain. Thus generating meaningful domain-specific insights is currently difficult. We present an automated discovery system (Figure 1) using Sparkler, an open-source, extensible, horizontally scalable crawler which facilitates high throughput and focused crawling of documents pertinent to a particular domain such as information about polar regions. With this set of highly domain relevant documents, we show that it is possible to answer analytical questions about that domain. Our domain discovery algorithm leverages prior domain knowledge to reach out to commercial/scientific search engines to generate seed URLs. Subject matter experts then annotate these seed URLs manually on a scale from highly relevant to irrelevant. We leverage this annotated dataset to train a machine learning model which predicts the 'domain relevance' of a given document. We extend Sparkler with this model to focus crawling on documents relevant to that domain. Sparkler avoids disruption of service by 1) partitioning URLs by hostname such that every node gets a different host to crawl and by 2) inserting delays between subsequent requests. With an NSF-funded supercomputer Wrangler, we scaled our domain discovery pipeline to crawl about 200k polar specific documents from the scientific web, within a day.
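The two politeness measures named in the abstract above (partitioning URLs by hostname and inserting delays between successive requests) can be illustrated with a minimal Python sketch. This is not Sparkler's code; the function names, the fixed `delay_seconds` value, and the example hosts are assumptions made only for illustration, and no network requests are made.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def partition_by_host(urls):
    """Group URLs by hostname so that each host can be handed to one worker."""
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


def schedule(urls, delay_seconds=1.0):
    """Build a crawl plan of (host, url, start_offset) tuples.

    Within one host, successive requests are spaced `delay_seconds` apart.
    The delay value is a hypothetical default, not a Sparkler setting.
    """
    plan = []
    for host, host_urls in sorted(partition_by_host(urls).items()):
        for i, url in enumerate(host_urls):
            plan.append((host, url, i * delay_seconds))
    return plan
```

Calling `schedule` on a mixed list of URLs yields one block of entries per host, so different workers could process different hosts in parallel while each host still sees spaced-out requests.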
Soldiers for Peace: Critical Operational Issues.
1996-01-01
or policies of its research sponsors. Published 1996 by RAND 1700 Main Street, P.O. Box 2138, Santa Monica, CA 90407-2138 RAND URL: http...force in Cyprus, a corps of 30,000 men equipped with 265 A5 M-48 main battle tanks, over 100 armored personnel carriers, and nearly 200 pieces of...AMX-30B main battle tanks from 52 to 104 and took delivery of 18 BMP-3 infantry fighting vehicles. In addition, the National Guard improved its
TRACTS: a program to map oligopurine.oligopyrimidine and other binary DNA tracts
Gal, Moshe; Katz, Tzvi; Ovadia, Amir; Yagil, Gad
2003-01-01
A program to map the locations and frequencies of DNA tracts composed of only two bases (‘Binary DNA’) is described. The program, TRACTS (URL http://bioportal.weizmann.ac.il/tracts/tracts.html and/or http://bip.weizmann.ac.il/miwbin/servers/tracts) is of interest because long tracts composed of only two bases are highly over-represented in most genomes. In eukaryotes, oligopurine.oligopyrimidine tracts (‘R.Y tracts’) are found in the highest excess. In prokaryotes, W tracts predominate (A,T ‘rich’). A pre-program, ANEX, parses database annotation files of GenBank and EMBL, to produce a convenient one-line list of every gene (exon, intron) in a genome. The main unit lists and analyzes tracts of the three possible binary pairs (R.Y, K.M and S.W). As an example, the results of R.Y tract mapping of mammalian gene p53 are described. PMID:12824393
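The idea of a tract made up of only two bases can be sketched in a few lines of Python. This is not the TRACTS program or its algorithm; it is a minimal illustration that scans one strand for maximal runs drawn from a single IUPAC two-base class (R = A/G, Y = C/T, K = G/T, M = A/C, S = C/G, W = A/T). The function name and the `min_length` threshold are assumptions chosen for the example.

```python
import re

# IUPAC two-base classes named in the abstract, mapped to the bases they cover.
BINARY_CLASSES = {
    "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
}


def find_tracts(sequence, iupac_class, min_length=10):
    """Return (start, end, tract) for maximal runs built only from the two
    bases of `iupac_class`. `start` is 0-based and `end` is exclusive.
    `min_length` is a hypothetical cutoff, not a TRACTS parameter.
    """
    bases = BINARY_CLASSES[iupac_class]
    pattern = re.compile("[%s]{%d,}" % (bases, min_length))
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(sequence.upper())]
```

For example, `find_tracts("CCAGGAGAAGTTC", "R", min_length=4)` reports the single purine run `AGGAGAAG` at positions 2 to 10. Because a run of purines on one strand pairs with a run of pyrimidines on the other, searching for both "R" and "Y" runs on a single strand covers R.Y tracts in either orientation.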
Comeau, Donald C; Liu, Haibin; Islamaj Doğan, Rezarta; Wilbur, W John
2014-01-01
BioC is a new format and associated code libraries for sharing text and annotations. We have implemented BioC natural language preprocessing pipelines in two popular programming languages: C++ and Java. The current implementations interface with the well-known MedPost and Stanford natural language processing tool sets. The pipeline functionality includes sentence segmentation, tokenization, part-of-speech tagging, lemmatization and sentence parsing. These pipelines can be easily integrated along with other BioC programs into any BioC compliant text mining systems. As an application, we converted the NCBI disease corpus to BioC format, and the pipelines have successfully run on this corpus to demonstrate their functionality. Code and data can be downloaded from http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net. © The Author(s) 2014. Published by Oxford University Press.
Space-Time Dynamics of Soil Moisture and Temperature: Scale issues
NASA Technical Reports Server (NTRS)
Mohanty, Binayak P.; Miller, Douglas A.; Th.vanGenuchten, M.
2003-01-01
The goal of this project is to gain further understanding of soil moisture/temperature dynamics at different spatio-temporal scales and physical controls/parameters. We created a comprehensive GIS database, which has been accessed extensively by NASA Land Surface Hydrology investigators (and others) and is located at the following URL: http://www.essc.psu.edu/nasalsh. For soil moisture field experiments such as SGP97, SGP99, SMEX02, and SMEX03, cartographic products were designed for multiple applications, both pre- and post-mission. Premission applications included flight line planning and field operations logistics, as well as general insight into the extent and distribution of soil, vegetation, and topographic properties for the study areas. The cartographic products were created from original spatial information resources that were imported into Adobe Illustrator, where the maps were created and PDF versions were made for distribution and download.
AbDb: antibody structure database—a database of PDB-derived antibody structures
Ferdous, Saba
2018-01-01
Abstract In order to analyse structures of proteins of a particular class, these need to be extracted from Protein Data Bank (PDB) files. In the case of antibodies, there are a number of special considerations: (i) identifying antibodies in the PDB is not trivial, (ii) they may be crystallized with or without antigen, (iii) for analysis purposes, one is normally only interested in the Fv region of the antibody, (iv) structural analysis of epitopes, in particular, requires individual antibody–antigen complexes from a PDB file which may contain multiple copies of the same, or different, antibodies and (v) standard numbering schemes should be applied. Consequently, there is a need for a specialist resource containing pre-numbered non-redundant antibody Fv structures with their cognate antigens. We have created an automatically updated resource, AbDb, which collects the Fv regions from antibody structures using information from our SACS database which summarizes antibody structures from the PDB. PDB files containing multiple structures are split and numbered and each antibody structure is associated with its antigen where available. Antibody structures with only light or heavy chains have also been processed and sequences of antibodies are compared to identify multiple structures of the same antibody. The data may be queried on the basis of PDB code, or the name or species of the antibody or antigen, and the complete datasets may be downloaded. Database URL: www.bioinf.org.uk/abs/abdb/ PMID:29718130
NASA Astrophysics Data System (ADS)
Stroker, K. J.; Jencks, J. H.; Eakins, B.
2016-12-01
The Index to Marine and Lacustrine Geological Samples (IMLGS) is a community designed and maintained resource enabling researchers to locate and request seafloor and lakebed geologic samples curated by partner institutions. The Index was conceived in the dawn of the digital age by representatives from U.S. academic and government marine core repositories and the NOAA National Geophysical Data Center, now the National Centers for Environmental Information (NCEI), at a 1977 meeting convened by the National Science Foundation (NSF). The Index is based on core concepts of community oversight, common vocabularies, consistent metadata and a shared interface. The Curators Consortium, international in scope, meets biennially to share ideas and discuss best practices. NCEI serves the group by providing database access and maintenance, a list server, digitizing support and long-term archival of sample metadata, data and imagery. Over three decades, participating curators have performed the laborious task of creating and contributing metadata for over 205,000 sea floor and lake-bed cores, grabs, and dredges archived in their collections. Some partners use the Index for primary web access to their collections while others use it to increase exposure of more in-depth institutional systems. The IMLGS has a persistent URL/Digital Object Identifier (DOI), as well as DOIs assigned to partner collections for citation and to provide a persistent link to curator collections. The Index is currently a geospatially-enabled relational database, publicly accessible via Web Feature and Web Map Services, and text- and ArcGIS map-based web interfaces. To provide as much knowledge as possible about each sample, the Index includes curatorial contact information and links to related data, information and images: 1) at participating institutions, 2) in the NCEI archive, and 3) through a Linked Data interface maintained by the Rolling Deck to Repository (R2R).
Over 43,000 International GeoSample Numbers (IGSNs) linking to the System for Earth Sample Registration (SESAR) are included in anticipation of opportunities for interconnectivity with Integrated Earth Data Applications (IEDA) systems. The paper will discuss the database with a goal to increase the connections and links to related data at partner institutions.
NASA Technical Reports Server (NTRS)
Panait, Claudia M.
2004-01-01
The NASA Glenn Library is a science and engineering research library providing the most current books, journals, CD-ROMs and documents to support the study of aeronautics, space propulsion and power, communications technology, materials and structures and microgravity science. The GRC technical library also supports the research and development efforts of all scientists and engineers on site via full text electronic files, literature searching, technical reports, etc. As an intern in the NASA Glenn Library, I attempt to support these objectives through efficiently and effectively fulfilling the assignment that was given to me. The assignment that was relegated to me was to catalog National Advisory Committee for Aeronautics, NASA Technical Documents into NASA Galaxie. This process consists of holdings being added to existing Galaxie records, upgrades and editing done to the bibliographic records when needed, and adding URLs into Galaxie when they were missing from the record. NASA ASAP and Digidoc were used to locate URLs of PDFs that were not in Galaxie. A spreadsheet of documents with no URLs was maintained. Also, a subject channel of web, full-text, paid and free, journal and other subject specific pages was developed and expanded from current content of intranet pages. To expand upon the second half of my assignment, I was given the project of taking inventory of the library's book collection. I kept record of the books that were not accounted for on a master list I was given to work from and submitted them for correction and addition. I also made sure the books were placed in the appropriate order and made corrections to any discrepancies that existed between the master list and what was on the shelf. Upon completion of this assignment, I will have verified that 21,113 books were in the correct location, order and have the correct corresponding serial number and barcode.
In conclusion, as of this date I have input around 750 documents into NASA Galaxie, inputting about half of the NASA Technical Documents into the system. The rest of my tenure in this program will consist of finishing the other half of the reports. In regard to the second assignment, I still have about three-quarters of the collection to record and correct.
An, Vadim A.; Ovtchinnikov, Vladimir M.; Kaazik, Pyotr B.; ...
2015-03-27
Seismologists from Kazakhstan, Russia, and the United States have rescued the Soviet-era archive of nuclear explosion seismograms recorded at Borovoye in northern Kazakhstan during the period 1966–1996. The signals had been stored on about 8000 magnetic tapes, which were held at the recording observatory. After hundreds of man-years of work, these digital waveforms together with significant metadata are now available via the project URL, namely http://www.ldeo.columbia.edu/res/pi/Monitoring/Data/ as a modern open database, of use to diverse communities. Three different sets of recording systems were operated at Borovoye, each using several different seismometers and different gain levels. For some explosions, more than twenty different channels of data are available. A first data release, in 2001, contained numerous glitches and lacked many instrument responses, but could still be used for measuring accurate arrival times and for comparison of the strengths of different types of seismic waves. The project URL also links to our second major data release, for nuclear explosions in Eurasia recorded in Borovoye, in which the data have been deglitched, all instrument responses have been included, and recording systems are described in detail. This second dataset consists of more than 3700 waveforms (digital seismograms) from almost 500 nuclear explosions in Eurasia, many of them recorded at regional distances. It is important as a training set for the development and evaluation of seismological methods of discriminating between earthquakes and underground explosions, and can be used for assessment of three-dimensional models of the Earth’s interior structure.
Katsuki, Takeo; Mackey, Tim Ken; Cuomo, Raphael
2015-12-16
Youth and adolescent non-medical use of prescription medications (NUPM) has become a national epidemic. However, little is known about the association between promotion of NUPM behavior and access via the popular social media microblogging site, Twitter, which is currently used by a third of all teens. In order to better assess NUPM behavior online, this study conducts surveillance and analysis of Twitter data to characterize the frequency of NUPM-related tweets and also identifies illegal access to drugs of abuse via online pharmacies. Tweets were collected over a 2-week period from April 1-14, 2015, by applying NUPM keyword filters for both generic/chemical and street names associated with drugs of abuse using the Twitter public streaming application programming interface. Tweets were then analyzed for relevance to NUPM and whether they promoted illegal online access to prescription drugs using a protocol of content coding and supervised machine learning. A total of 2,417,662 tweets were collected and analyzed for this study. Tweets filtered for generic drugs names comprised 232,108 tweets, including 22,174 unique associated uniform resource locators (URLs), and 2,185,554 tweets (376,304 unique URLs) filtered for street names. Applying an iterative process of manual content coding and supervised machine learning, 81.72% of the generic and 12.28% of the street NUPM datasets were predicted as having content relevant to NUPM respectively. By examining hyperlinks associated with NUPM relevant content for the generic Twitter dataset, we discovered that 75.72% of the tweets with URLs included a hyperlink to an online marketing affiliate that directly linked to an illicit online pharmacy advertising the sale of Valium without a prescription. This study examined the association between Twitter content, NUPM behavior promotion, and online access to drugs using a broad set of prescription drug keywords. 
Initial results are concerning, as our study found over 45,000 tweets that directly promoted NUPM by providing a URL that actively marketed the illegal online sale of prescription drugs of abuse. Additional research is needed to further establish the link between Twitter content and NUPM, as well as to help inform future technology-based tools, online health promotion activities, and public policy to combat NUPM online.
Cuomo, Raphael
2015-01-01
Background Youth and adolescent non-medical use of prescription medications (NUPM) has become a national epidemic. However, little is known about the association between promotion of NUPM behavior and access via the popular social media microblogging site, Twitter, which is currently used by a third of all teens. Objective In order to better assess NUPM behavior online, this study conducts surveillance and analysis of Twitter data to characterize the frequency of NUPM-related tweets and also identifies illegal access to drugs of abuse via online pharmacies. Methods Tweets were collected over a 2-week period from April 1-14, 2015, by applying NUPM keyword filters for both generic/chemical and street names associated with drugs of abuse using the Twitter public streaming application programming interface. Tweets were then analyzed for relevance to NUPM and whether they promoted illegal online access to prescription drugs using a protocol of content coding and supervised machine learning. Results A total of 2,417,662 tweets were collected and analyzed for this study. Tweets filtered for generic drugs names comprised 232,108 tweets, including 22,174 unique associated uniform resource locators (URLs), and 2,185,554 tweets (376,304 unique URLs) filtered for street names. Applying an iterative process of manual content coding and supervised machine learning, 81.72% of the generic and 12.28% of the street NUPM datasets were predicted as having content relevant to NUPM respectively. By examining hyperlinks associated with NUPM relevant content for the generic Twitter dataset, we discovered that 75.72% of the tweets with URLs included a hyperlink to an online marketing affiliate that directly linked to an illicit online pharmacy advertising the sale of Valium without a prescription. Conclusions This study examined the association between Twitter content, NUPM behavior promotion, and online access to drugs using a broad set of prescription drug keywords. 
Initial results are concerning, as our study found over 45,000 tweets that directly promoted NUPM by providing a URL that actively marketed the illegal online sale of prescription drugs of abuse. Additional research is needed to further establish the link between Twitter content and NUPM, as well as to help inform future technology-based tools, online health promotion activities, and public policy to combat NUPM online. PMID:26677966
Chen, Tsute; Yu, Wen-Han; Izard, Jacques; Baranova, Oxana V.; Lakshmanan, Abirami; Dewhirst, Floyd E.
2010-01-01
The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD: taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given a unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides a comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org PMID:20624719
Processing biological literature with customizable Web services supporting interoperable formats.
Rak, Rafal; Batista-Navarro, Riza Theresa; Carter, Jacob; Rowley, Andrew; Ananiadou, Sophia
2014-01-01
Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams. Database URL: http://argo.nactem.ac.uk. © The Author(s) 2014. Published by Oxford University Press.
Processing biological literature with customizable Web services supporting interoperable formats
Rak, Rafal; Batista-Navarro, Riza Theresa; Carter, Jacob; Rowley, Andrew; Ananiadou, Sophia
2014-01-01
Web services have become a popular means of interconnecting solutions for processing a body of scientific literature. This has fuelled research on high-level data exchange formats suitable for a given domain and ensuring the interoperability of Web services. In this article, we focus on the biological domain and consider four interoperability formats, BioC, BioNLP, XMI and RDF, that represent domain-specific and generic representations and include well-established as well as emerging specifications. We use the formats in the context of customizable Web services created in our Web-based, text-mining workbench Argo that features an ever-growing library of elementary analytics and capabilities to build and deploy Web services straight from a convenient graphical user interface. We demonstrate a 2-fold customization of Web services: by building task-specific processing pipelines from a repository of available analytics, and by configuring services to accept and produce a combination of input and output data interchange formats. We provide qualitative evaluation of the formats as well as quantitative evaluation of automatic analytics. The latter was carried out as part of our participation in the fourth edition of the BioCreative challenge. Our analytics built into Web services for recognizing biochemical concepts in BioC collections achieved the highest combined scores out of 10 participating teams. Database URL: http://argo.nactem.ac.uk. PMID:25006225
An Analysis of SE and MBSE Concepts to Support Defence Capability Acquisition
2014-09-01
Government Department of Finance and Deregulation, Canberra, ACT, August 2011. [online] URL: http://agimo.gov.au/files/2012/04/AGA_RM_v3_0.pdf ANSI...First Time, White Paper, Aberdeen Group Group, August 2011. [online] URL: http://www.aberdeen.com/Aberdeen- Library/7121/RA-system-design...Edge e-zine, IBM Software Group, August 2003. Cantor 2003b Cantor, Murray, Rational Unified Process for Systems Engineering Part I1: System
Curating Virtual Data Collections
NASA Technical Reports Server (NTRS)
Lynnes, Chris; Leon, Amanda; Ramapriyan, Hampapuram; Tsontos, Vardis; Shie, Chung-Lin; Liu, Zhong
2015-01-01
NASA's Earth Observing System Data and Information System (EOSDIS) contains a rich set of datasets and related services throughout its many elements. As a result, locating all the EOSDIS data and related resources relevant to a particular science theme can be daunting. This is largely because the organization of EOSDIS data reflects the way the data are produced more than the expected end use. Virtual collections oriented around science themes can overcome this by presenting collections of data and related resources that are organized around the user's interest, not around the way the data were produced. Virtual collections consist of annotated web addresses (URLs) that point to data and related resources, thus avoiding the need to copy all of the relevant data to a single place. These URLs can be consumed by a variety of clients, ranging from basic URL downloaders (wget, curl) and web browsers to sophisticated data analysis programs such as the Integrated Data Viewer.
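The idea of a virtual collection as a list of annotated URLs can be illustrated with a minimal Python sketch. The record fields (`url`, `title`, `kind`), the helper names, and the `example.org` addresses are all hypothetical and are not taken from EOSDIS; the sketch only shows how annotations let a client pick out the data links and hand them to a plain downloader.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedUrl:
    """One entry of a virtual collection (hypothetical schema)."""
    url: str
    title: str
    kind: str  # assumed annotation, e.g. "data" or "documentation"


def urls_of_kind(collection, kind):
    """Return the URLs whose annotation matches the requested kind, in order."""
    return [entry.url for entry in collection if entry.kind == kind]


def to_url_list(collection, kind="data"):
    """Render the matching URLs one per line, the form a URL downloader reads."""
    return "\n".join(urls_of_kind(collection, kind)) + "\n"


# Placeholder entries; these addresses do not point to real resources.
collection = [
    AnnotatedUrl("https://example.org/data/granule_001.nc", "Granule 1", "data"),
    AnnotatedUrl("https://example.org/docs/readme.html", "User guide", "documentation"),
    AnnotatedUrl("https://example.org/data/granule_002.nc", "Granule 2", "data"),
]

if __name__ == "__main__":
    print(to_url_list(collection), end="")
```

Saving the printed list to a file such as `urls.txt` would let a downloader fetch the entries, for example with `wget -i urls.txt`, without the data ever being copied into the collection itself.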
A Bookmarking Service for Organizing and Sharing URLs
NASA Technical Reports Server (NTRS)
Keller, Richard M.; Wolfe, Shawn R.; Chen, James R.; Mathe, Nathalie; Rabinowitz, Joshua L.
1997-01-01
Web browser bookmarking facilities predominate as the method of choice for managing URLs. In this paper, we describe some deficiencies of current bookmarking schemes, and examine an alternative to current approaches. We present WebTagger(TM), an implemented prototype of a personal bookmarking service that provides both individuals and groups with a customizable means of organizing and accessing Web-based information resources. In addition, the service enables users to supply feedback on the utility of these resources relative to their information needs, and provides dynamically-updated ranking of resources based on incremental user feedback. Individuals may access the service from anywhere on the Internet, and require no special software. This service greatly simplifies the process of sharing URLs within groups, in comparison with manual methods involving email. The underlying bookmark organization scheme is more natural and flexible than current hierarchical schemes supported by the major Web browsers, and enables rapid access to stored bookmarks.
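The abstract does not say how WebTagger computes its rankings. As a rough illustration of ranking that is updated incrementally from user feedback, the following Python sketch keeps a running mean rating per URL. The class name, the numeric rating scale, and the running-mean rule are all assumptions made for this example, not the method used by the service.

```python
from collections import defaultdict


class FeedbackRanker:
    """Ranks URLs by the running mean of user ratings (illustrative only)."""

    def __init__(self):
        self._count = defaultdict(int)
        self._mean = defaultdict(float)

    def add_feedback(self, url, rating):
        """Fold one rating into the URL's mean without storing past ratings."""
        self._count[url] += 1
        n = self._count[url]
        self._mean[url] += (rating - self._mean[url]) / n

    def ranked(self):
        """Return URLs from highest to lowest mean rating, ties broken by URL."""
        return sorted(self._mean, key=lambda u: (-self._mean[u], u))


if __name__ == "__main__":
    ranker = FeedbackRanker()
    ranker.add_feedback("http://example.org/a", 1.0)
    ranker.add_feedback("http://example.org/b", 0.0)
    ranker.add_feedback("http://example.org/b", 1.0)
    print(ranker.ranked())
```

Because each rating is folded into the mean as it arrives, the ordering can change after every piece of feedback, which is the sense of "dynamically updated" assumed here.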
2011-03-01
9 Figure 7. RDS preferences widget after loading an unusual font (left) and RDS SPARQL query widget (right...Entered By Individual: SGT Juan Gonzalez DOI: 2007-01-06 13:00:00 Date Entered: 2007-01-06 23:32:03 Subject: Al-Qaeda Reading Material Source...preferences widget after loading an unusual font (left) and RDS SPARQL query widget (right). NetKernel and RDS-specific modules are specified with a URL
A Survey of Doctoral Programs in Chemical Education in the United States
NASA Astrophysics Data System (ADS)
Mason, Diana
2001-02-01
Employment opportunities are expanding in chemical education and chemical education research. Consequently, more students are seeking to further their education in chemistry by obtaining tertiary degrees in chemical education. At the Fall 2000 ACS Meeting in Washington, DC, DivCHED sponsored a symposium highlighting several doctoral programs in chemical education in the U.S. Included in this summary is the following information regarding each program: name of university, faculty contact(s), corresponding email addresses and URLs, and a brief description of the program.
NASA Astrophysics Data System (ADS)
Aleman, A.; Olsen, L. M.; Ritz, S.; Stevens, T.; Morahan, M.; Grebas, S. K.
2011-12-01
NASA's Global Change Master Directory provides the scientific community with the ability to discover, access, and use Earth science data, data-related services, and climate diagnostics worldwide. The GCMD offers descriptions of Earth science data sets using the Directory Interchange Format (DIF) metadata standard; Earth science related data services are described using the Service Entry Resource Format (SERF); and climate visualizations are described using the Climate Diagnostic (CD) standard. The DIF, SERF and CD standards each capture data attributes used to determine whether a data set, service, or climate visualization is relevant to a user's needs. Metadata fields include: title, summary, science keywords, service keywords, data center, data set citation, personnel, instrument, platform, quality, related URL, temporal and spatial coverage, data resolution and distribution information. In addition, nine valuable sets of controlled vocabularies have been developed to assist users in normalizing the search for data descriptions. An update to the GCMD's search functionality is planned to further capitalize on the controlled vocabularies during database queries. By implementing a dynamic keyword "tree", users will have the ability to search for data sets by combining keywords in new ways. This will allow users to conduct more relevant and efficient database searches to support the free exchange and re-use of Earth science data.
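Searching by combining keywords from a hierarchical vocabulary can be sketched as follows. This is not the GCMD implementation: the record layout, the function names, and the sample keyword paths are hypothetical. Each keyword is modelled as a path through a tree, and a record matches a query when it carries a keyword at or below every requested node.

```python
def matches(record_keywords, query):
    """True if any keyword path of the record starts with the query path."""
    return any(path[:len(query)] == query for path in record_keywords)


def search(records, queries):
    """Return titles of records that satisfy every query path (logical AND)."""
    return [
        record["title"]
        for record in records
        if all(matches(record["keywords"], q) for q in queries)
    ]


# Hypothetical records; titles and keyword paths are illustrative only.
records = [
    {
        "title": "Dataset A",
        "keywords": [
            ("ATMOSPHERE", "AEROSOLS"),
            ("OCEANS", "SEA SURFACE TEMPERATURE"),
        ],
    },
    {
        "title": "Dataset B",
        "keywords": [("ATMOSPHERE", "CLOUDS")],
    },
]

if __name__ == "__main__":
    # Combine a broad node with a second branch of the tree.
    print(search(records, [("ATMOSPHERE",), ("OCEANS",)]))
```

Querying a higher node such as `("ATMOSPHERE",)` matches every record tagged anywhere beneath it, while adding a second path narrows the result to records that carry both branches.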
MiDAS: the field guide to the microbes of activated sludge
McIlroy, Simon Jon; Saunders, Aaron Marc; Albertsen, Mads; Nierychlo, Marta; McIlroy, Bianca; Hansen, Aviaja Anna; Karst, Søren Michael; Nielsen, Jeppe Lund; Nielsen, Per Halkjær
2015-01-01
The Microbial Database for Activated Sludge (MiDAS) field guide is a freely available online resource linking the identity of abundant and process-critical microorganisms in activated sludge wastewater treatment systems to available data related to their functional importance. Phenotypic properties of some of these genera are described, but most are known only from sequence data. The MiDAS taxonomy is a manual curation of the SILVA taxonomy that proposes a name for all genus-level taxa observed to be abundant by large-scale 16S rRNA gene amplicon sequencing of full-scale activated sludge communities. The taxonomy can be used to classify unknown sequences, and the online MiDAS field guide links the identity to the available information about their morphology, diversity, physiology and distribution. The use of a common taxonomy across the field will provide a solid foundation for the study of microbial ecology of the activated sludge process and related treatment processes. The online MiDAS field guide is a collaborative workspace intended to facilitate a better understanding of the ecology of activated sludge and related treatment processes, knowledge that will be an invaluable resource for the optimal design and operation of these systems. Database URL: http://www.midasfieldguide.org PMID:26120139
GermOnline 4.0 is a genomics gateway for germline development, meiosis and the mitotic cell cycle.
Lardenois, Aurélie; Gattiker, Alexandre; Collin, Olivier; Chalmel, Frédéric; Primig, Michael
2010-01-01
GermOnline 4.0 is a cross-species database portal focusing on high-throughput expression data relevant for germline development, the meiotic cell cycle and mitosis in healthy versus malignant cells. It is thus a source of information for life scientists as well as clinicians who are interested in gene expression and regulatory networks. The GermOnline gateway provides unlimited access to information produced with high-density oligonucleotide microarrays (3'-UTR GeneChips), genome-wide protein-DNA binding assays and protein-protein interaction studies in the context of Ensembl genome annotation. Samples used to produce high-throughput expression data and to carry out genome-wide in vivo DNA binding assays are annotated via the MIAME-compliant Multiomics Information Management and Annotation System (MIMAS 3.0). Furthermore, the Saccharomyces Genomics Viewer (SGV) was developed and integrated into the gateway. SGV is a visualization tool that outputs genome annotation and DNA-strand specific expression data produced with high-density oligonucleotide tiling microarrays (Sc_tlg GeneChips) which cover the complete budding yeast genome on both DNA strands. It facilitates the interpretation of expression levels and transcript structures determined for various cell types cultured under different growth and differentiation conditions. Database URL: www.germonline.org/