protein atlas database: Topics by Science.gov

Sample records for protein atlas database

Atlas - a data warehouse for integrative bioinformatics.

PubMed

Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

2005-02-21

We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
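As a minimal sketch of the "common data model" idea described above, the Python snippet below stores interactions from several source databases in one SQLite table and exposes one loader and one retrieval function. The table name `interaction`, its columns, and the functions `load_interactions` and `partners` are hypothetical illustrations, not the Atlas schema or its C++/Java/Perl APIs.

```python
import sqlite3

# Hypothetical common data model: interactions from every source database
# (BIND, DIP, MINT, ...) share one table and one schema.
SCHEMA = """
CREATE TABLE interaction (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,     -- originating database, e.g. 'DIP'
    protein_a TEXT NOT NULL,  -- accession of the first interactor
    protein_b TEXT NOT NULL,  -- accession of the second interactor
    UNIQUE (source, protein_a, protein_b)
);
"""


def load_interactions(conn, source, pairs):
    """Loader: store each pair in canonical (sorted) order and skip duplicates."""
    rows = [(source, *sorted(pair)) for pair in pairs]
    conn.executemany(
        "INSERT OR IGNORE INTO interaction (source, protein_a, protein_b) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def partners(conn, accession):
    """Retrieval: each partner of `accession` with the sources that report it."""
    cur = conn.execute(
        "SELECT CASE WHEN protein_a = ? THEN protein_b ELSE protein_a END AS partner, "
        "group_concat(source) FROM interaction "
        "WHERE protein_a = ? OR protein_b = ? GROUP BY partner ORDER BY partner",
        (accession, accession, accession),
    )
    return {partner: sorted(sources.split(",")) for partner, sources in cur}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_interactions(conn, "DIP", [("P1", "P2"), ("P3", "P1")])
    load_interactions(conn, "MINT", [("P2", "P1")])
    print(partners(conn, "P1"))  # {'P2': ['DIP', 'MINT'], 'P3': ['DIP']}
```

Because every source lands in the same schema, a single query answers "which databases support this interaction?", which is the kind of cross-source integration the abstract describes.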
Atlas – a data warehouse for integrative bioinformatics

PubMed Central

Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

2005-01-01

Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/ PMID:15723693
AtlasT4SS: a curated database for type IV secretion systems.

PubMed

Souza, Rangel C; del Rosario Quispe Saji, Guadalupe; Costa, Maiana O C; Netto, Diogo S; Lima, Nicholas C B; Klein, Cecília C; Vasconcelos, Ana Tereza R; Nicolás, Marisa F

2012-08-09

The type IV secretion system (T4SS) can be classified as a large family of macromolecule transporter systems, divided into three recognized sub-families according to their well-known functions. The major sub-family is the conjugation system, which allows transfer of genetic material, such as a nucleoprotein, via cell contact among bacteria. The conjugation system can also transfer genetic material from bacteria to eukaryotic cells, as in the T-DNA transfer of Agrobacterium tumefaciens to host plant cells. The effector protein transport system constitutes the second sub-family, and the third corresponds to the DNA uptake/release system. Genome analyses have revealed numerous T4SSs in Bacteria and Archaea. The purpose of this work was to organize, classify, and integrate the T4SS data into a single database, called AtlasT4SS - the first public database devoted exclusively to this prokaryotic secretion system. AtlasT4SS is a manually curated database that describes a large number of proteins related to the type IV secretion system reported so far in Gram-negative and Gram-positive bacteria, as well as in Archaea. The database was created using the MySQL RDBMS and the Perl-based Catalyst Framework, following the Model-View-Controller (MVC) design pattern for the Web. The current version holds a comprehensive collection of 1,617 T4SS proteins from 58 Bacteria (49 Gram-negative and 9 Gram-positive), one archaeon and 11 plasmids. By applying the bi-directional best hit (BBH) relationship in pairwise genome comparisons, it was possible to obtain a core set of 134 clusters of orthologous genes encoding T4SS proteins. In our database we present one way of classifying orthologous groups of T4SSs in a hierarchical classification scheme with three levels.
The first level comprises four classes that are based on the organization of genetic determinants, shared homologies, and evolutionary relationships: (i) F-T4SS, (ii) P-T4SS, (iii) I-T4SS, and (iv) GI-T4SS. The second level designates either a specific well-known protein family or an uncharacterized protein family. Finally, in the third level, each protein of an ortholog cluster is classified according to its involvement in a specific cellular process. The AtlasT4SS database is open access and is available at http://www.t4ss.lncc.br.
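The bi-directional best hit relationship can be sketched as follows, assuming pairwise similarity scores (for example BLAST bit scores) have already been computed in both directions between two genomes. The function names and the input layout are hypothetical; the score thresholds and the steps AtlasT4SS used to merge BBH pairs into its 134 ortholog clusters are not described in the abstract.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score.
    Ties are broken by the lexicographically smaller subject so results are deterministic.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best:
            best[query] = (subject, score)
            continue
        cur_subject, cur_score = best[query]
        if score > cur_score or (score == cur_score and subject < cur_subject):
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A gene whose best hit prefers some other gene (for example two paralogs competing for one subject) is dropped, which is what makes BBH stricter than a one-way best hit.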
LiverAtlas: a unique integrated knowledge database for systems-level research of liver and hepatic disease.

PubMed

Zhang, Yanqiong; Yang, Chunyuan; Wang, Shaochuang; Chen, Tao; Li, Mansheng; Wang, Xue; Li, Dongsheng; Wang, Kang; Ma, Jie; Wu, Songfeng; Zhang, Xueli; Zhu, Yunping; Wu, Jinsheng; He, Fuchu

2013-09-01

A large amount of liver-related physiological and pathological data exist in publicly available biological and bibliographic databases, which are usually far from comprehensive or integrated. Data collection, integration and mining processes pose a great challenge to scientific researchers and clinicians interested in the liver. To address these problems, we constructed LiverAtlas (http://liveratlas.hupo.org.cn), a comprehensive resource of biomedical knowledge related to the liver and various hepatic diseases by incorporating 53 databases. In the present version, LiverAtlas covers data on liver-related genomics, transcriptomics, proteomics, metabolomics and hepatic diseases. Additionally, LiverAtlas provides a wealth of manually curated information, relevant literature citations and cross-references to other databases. Importantly, an expert-confirmed Human Liver Disease Ontology, including relevant information for 227 types of hepatic disease, has been constructed and is used to annotate LiverAtlas data. Furthermore, we have demonstrated two examples of applying LiverAtlas data to identify candidate markers for hepatocellular carcinoma (HCC) at the systems level and to develop a systems biology-based classifier by combining the differential gene expression with topological features of human protein interaction networks to enhance the ability of HCC differential diagnosis. LiverAtlas is the most comprehensive liver and hepatic disease resource; it helps biologists and clinicians analyse their data at the systems level and will contribute substantially to biomarker discovery and improved diagnostic performance for liver diseases. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
ATLAS (Automatic Tool for Local Assembly Structures) - A Comprehensive Infrastructure for Assembly, Annotation, and Genomic Binning of Metagenomic and Metatranscriptomic Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, Richard A.; Brown, Joseph M.; Colby, Sean M.

ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy to install through bioconda, maintained as open source on GitHub, and implemented in Snakemake for modular, customizable workflows.
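A rough sketch of contig-level majority voting over ORF lineages, as one possible reading of the roll-up described above. The abstract does not say how ATLAS modifies the LCA procedure, so the `min_fraction` threshold, the choice of counting all ORFs in the denominator, and the rule of descending one rank at a time are assumptions.

```python
from collections import Counter


def contig_taxonomy(orf_lineages, min_fraction=0.5):
    """Assign a contig lineage by majority vote over its ORFs' lineages.

    orf_lineages: list of tuples ordered root -> leaf, e.g.
        ("Bacteria", "Proteobacteria", "Gammaproteobacteria");
        an unassigned ORF is an empty tuple.
    min_fraction: a taxon must be supported by strictly more than this share
        of all ORFs on the contig for the vote to descend to its rank.
    Returns the deepest lineage prefix supported by a majority.
    """
    total = len(orf_lineages)
    if total == 0:
        return ()
    consensus = ()
    depth = 0
    while True:
        # Only ORFs that agree with the consensus so far can vote at this rank.
        votes = Counter(
            lineage[depth]
            for lineage in orf_lineages
            if len(lineage) > depth and lineage[:depth] == consensus
        )
        if not votes:
            break
        taxon, count = votes.most_common(1)[0]
        if count / total <= min_fraction:
            break
        consensus += (taxon,)
        depth += 1
    return consensus
```

Stopping at the last rank with majority support mirrors the LCA intuition: disagreement at fine ranks pushes the assignment up the tree instead of forcing a possibly wrong species call.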
ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies.

PubMed

Hadadi, Noushin; Hafner, Jasmin; Shajkofci, Adrian; Zisaki, Aikaterini; Hatzimanikatis, Vassily

2016-10-21

Because the complexity of metabolism cannot be intuitively understood or analyzed, computational methods are indispensable for studying biochemistry and deepening our understanding of cellular metabolism to promote new discoveries. We used the computational framework BNICE.ch along with cheminformatic tools to assemble the whole theoretical reactome from the known metabolome through expansion of the known biochemistry presented in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We constructed the ATLAS of Biochemistry, a database of all theoretical biochemical reactions based on known biochemical principles and compounds. ATLAS includes more than 130 000 hypothetical enzymatic reactions that connect two or more KEGG metabolites through novel enzymatic reactions that have never been reported to occur in living organisms. Moreover, ATLAS reactions integrate 42% of KEGG metabolites that are not currently present in any KEGG reaction into one or more novel enzymatic reactions. The generated repository of information is organized in a Web-based database ( http://lcsb-databases.epfl.ch/atlas/ ) that allows the user to search for all possible routes from any substrate compound to any product. The resulting pathways involve known and novel enzymatic steps that may indicate unidentified enzymatic activities and provide potential targets for protein engineering. Our approach of introducing novel biochemistry into pathway design and associated databases will be important for synthetic biology and metabolic engineering.
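Route search of this kind can be illustrated as a bounded depth-first enumeration over a reaction network in which each reaction links its substrates to its products. This is a simplified sketch under stated assumptions: it ignores cofactors, stoichiometry and thermodynamics, the reaction tuples are hypothetical, and it is not the search implemented by the ATLAS web interface.

```python
from collections import defaultdict


def build_graph(reactions):
    """Index reactions as edges substrate -> product.

    reactions: iterable of (reaction_id, substrates, products).
    """
    graph = defaultdict(list)
    for rxn_id, substrates, products in reactions:
        for substrate in substrates:
            for product in products:
                graph[substrate].append((rxn_id, product))
    return graph


def find_routes(reactions, source, target, max_steps=3):
    """Enumerate routes of at most `max_steps` reactions that never revisit a compound."""
    graph = build_graph(reactions)
    routes = []

    def extend(compound, path, visited):
        if compound == target:
            routes.append(list(path))
            return
        if len(path) == max_steps:
            return
        for rxn_id, product in graph.get(compound, []):
            if product in visited:
                continue  # avoid cycles such as A -> C -> A
            visited.add(product)
            path.append(rxn_id)
            extend(product, path, visited)
            path.pop()
            visited.remove(product)

    extend(source, [], {source})
    # Shortest routes first, then alphabetical by reaction IDs.
    return sorted(routes, key=lambda route: (len(route), route))
```

The `max_steps` bound matters in practice: with more than 130,000 hypothetical reactions, unbounded enumeration of all simple paths would explode combinatorially.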
ExAtlas: An interactive online tool for meta-analysis of gene expression data.

PubMed

Sharov, Alexei A; Schlessinger, David; Ko, Minoru S H

2015-12-01

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users' own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher's methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein-protein interaction) are pre-loaded and can be used for functional annotations.
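Two of the standard combination methods listed above, plus a fixed-effects pooled estimate, can be written with the Python standard library alone. These are textbook formulas shown for illustration, not ExAtlas's own code; p-values are assumed one-sided and strictly between 0 and 1, and the random-effects model is omitted.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows chi-square with 2k dof under H0."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # The chi-square survival function with even dof 2k has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_z(p_values):
    """Stouffer's z-score method: sum of per-study z-scores divided by sqrt(k)."""
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    return sum(z_scores) / math.sqrt(len(z_scores))


def fixed_effect(estimates, variances):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))
```

Fisher's method is sensitive to a single very small p-value, whereas Stouffer's method weights all studies equally on the z scale, which is one reason tools such as ExAtlas offer both.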
Development, deployment and operations of ATLAS databases

NASA Astrophysics Data System (ADS)

Vaniachine, A. V.; Schmitt, J. G. v. d.

2008-07-01

In preparation for ATLAS data taking, a coordinated shift from development towards operations has occurred in ATLAS database activities. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on ATLAS multi-grid infrastructure. We describe development and commissioning of major ATLAS database applications for online and offline. We present the first scalability test results and ramp-up schedule over the initial LHC years of operations towards the nominal year of ATLAS running, when the database storage volumes are expected to reach 6.1 TB for the Tag DB and 1.0 TB for the Conditions DB. ATLAS database applications require robust operational infrastructure for data replication between online and offline at Tier-0, and for the distribution of the offline data to Tier-1 and Tier-2 computing centers. We describe ATLAS experience with Oracle Streams and other technologies for coordinated replication of databases in the framework of the WLCG 3D services.
A human protein atlas for normal and cancer tissues based on antibody proteomics.

PubMed

Uhlén, Mathias; Björling, Erik; Agaton, Charlotta; Szigyarto, Cristina Al-Khalili; Amini, Bahram; Andersen, Elisabet; Andersson, Ann-Catrin; Angelidou, Pia; Asplund, Anna; Asplund, Caroline; Berglund, Lisa; Bergström, Kristina; Brumer, Harry; Cerjan, Dijana; Ekström, Marica; Elobeid, Adila; Eriksson, Cecilia; Fagerberg, Linn; Falk, Ronny; Fall, Jenny; Forsberg, Mattias; Björklund, Marcus Gry; Gumbel, Kristoffer; Halimi, Asif; Hallin, Inga; Hamsten, Carl; Hansson, Marianne; Hedhammar, My; Hercules, Görel; Kampf, Caroline; Larsson, Karin; Lindskog, Mats; Lodewyckx, Wald; Lund, Jan; Lundeberg, Joakim; Magnusson, Kristina; Malm, Erik; Nilsson, Peter; Odling, Jenny; Oksvold, Per; Olsson, Ingmarie; Oster, Emma; Ottosson, Jenny; Paavilainen, Linda; Persson, Anja; Rimini, Rebecca; Rockberg, Johan; Runeson, Marcus; Sivertsson, Asa; Sköllermo, Anna; Steen, Johanna; Stenvall, Maria; Sterky, Fredrik; Strömberg, Sara; Sundberg, Mårten; Tegel, Hanna; Tourle, Samuel; Wahlund, Eva; Waldén, Annelie; Wan, Jinghong; Wernérus, Henrik; Westberg, Joakim; Wester, Kenneth; Wrethagen, Ulla; Xu, Lan Lan; Hober, Sophia; Pontén, Fredrik

2005-12-01

Antibody-based proteomics provides a powerful approach for the functional study of the human proteome involving the systematic generation of protein-specific affinity reagents. We used this strategy to construct a comprehensive, antibody-based protein atlas for expression and localization profiles in 48 normal human tissues and 20 different cancers. Here we report a new publicly available database containing, in the first version, approximately 400,000 high resolution images corresponding to more than 700 antibodies toward human proteins. Each image has been annotated by a certified pathologist to provide a knowledge base for functional studies and to allow queries about protein profiles in normal and disease tissues. Our results suggest it should be possible to extend this analysis to the majority of all human proteins thus providing a valuable tool for medical and biological research.
Windows on the brain: the emerging role of atlases and databases in neuroscience

NASA Technical Reports Server (NTRS)

Van Essen, David C.; VanEssen, D. C. (Principal Investigator)

2002-01-01

Brain atlases and associated databases have great potential as gateways for navigating, accessing, and visualizing a wide range of neuroscientific data. Recent progress towards realizing this potential includes the establishment of probabilistic atlases, surface-based atlases and associated databases, combined with improvements in visualization capabilities and internet access.
Exploring the Universe of Protein Structures beyond the Protein Data Bank

PubMed Central

Cossio, Pilar; Trovato, Antonio; Pietrucci, Fabio; Seno, Flavio; Maritan, Amos; Laio, Alessandro

2010-01-01

It is currently believed that the atlas of existing protein structures is faithfully represented in the Protein Data Bank. However, whether this atlas covers the full universe of all possible protein structures is still a highly debated issue. By using a sophisticated numerical approach, we performed an exhaustive exploration of the conformational space of a 60-amino-acid polypeptide chain described with an accurate all-atom interaction potential. We generated a database of around 30,000 compact folds with at least of secondary structure corresponding to local minima of the potential energy. This ensemble plausibly represents the universe of protein folds of similar length; indeed, all the known folds are represented in the set with good accuracy. However, we discover that the known folds form a rather small subset, which cannot be reproduced by choosing random structures in the database. Rather, natural and possible folds differ by the contact order, on average significantly smaller in the former. This suggests the presence of an evolutionary bias, possibly related to kinetic accessibility, towards structures with shorter loops between contacting residues. Besides their conceptual relevance, the new structures open a range of practical applications such as the development of accurate structure prediction strategies, the optimization of force fields, and the identification and design of novel folds. PMID:21079678
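Contact order is commonly quantified as relative contact order, the mean sequence separation of contacting residues normalised by chain length: RCO = (1 / (L N)) * sum(|i - j|) over the N contacts of an L-residue chain. The sketch below assumes one coordinate per residue (for example C-alpha atoms), an 8 Å distance cutoff and a minimum sequence separation of 3; the abstract does not state which contact definition the authors used.

```python
import math


def contacts(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j), i < j, whose coordinates lie within `cutoff` of each other.

    coords: list of (x, y, z) positions, one per residue.
    Pairs closer than `min_separation` in sequence are ignored (trivial neighbours).
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def relative_contact_order(coords, cutoff=8.0, min_separation=3):
    """RCO = sum(j - i) / (L * N) over the N contacts of an L-residue chain."""
    pairs = contacts(coords, cutoff, min_separation)
    if not pairs:
        return 0.0
    length = len(coords)
    return sum(j - i for i, j in pairs) / (length * len(pairs))
```

Under this measure, a fold dominated by long-range contacts (long loops between contacting residues) scores higher, so the reported bias of natural folds towards lower contact order corresponds to shorter loops.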
National Transportation Atlas Databases : 1995

DOT National Transportation Integrated Search

1995-01-01

BTS has compiled the initial version of a geographic atlas database to support research, analysis, and decision making across all modes of transportation. The atlas databases are designed primarily to meet the needs of DOT at the national lev...
Image database for digital hand atlas

NASA Astrophysics Data System (ADS)

Cao, Fei; Huang, H. K.; Pietka, Ewa; Gilsanz, Vicente; Dey, Partha S.; Gertych, Arkadiusz; Pospiech-Kurkowska, Sywia

2003-05-01

Bone age assessment is a procedure frequently performed in pediatric patients to evaluate growth disorders. A commonly used method is atlas matching by visual comparison of a hand radiograph with the small reference set of the old Greulich-Pyle atlas. We have developed a new digital hand atlas with a large set of clinically normal hand images of diverse ethnic groups. In this paper, we present our system design and implementation of the digital atlas database to support computer-aided atlas matching for bone age assessment. The system consists of a hand atlas image database, a computer-aided diagnostic (CAD) software module for image processing and atlas matching, and a Web user interface. Users can use a Web browser to push DICOM images, directly or indirectly from PACS, to the CAD server for a bone age assessment. Quantitative features on the examined image, which reflect skeletal maturity, are then extracted and compared with patterns from the atlas image database to assess the bone age. The digital atlas method, built on a large image database and current Internet technology, provides an alternative that can supplement or replace the traditional method for a quantitative, accurate and cost-effective assessment of bone age.
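The matching step can be pictured as a nearest-neighbour lookup: features from the examined radiograph are compared with feature vectors stored for atlas images of known bone age. This is a hypothetical sketch; the feature vectors, the Euclidean distance and averaging over `k` neighbours are assumptions rather than the CAD module's actual algorithm, and real features would need normalising (and the reference set possibly restricting by sex or ethnic group) first.

```python
import math


def estimate_bone_age(features, atlas, k=3):
    """Estimate bone age as the mean age of the k closest atlas entries.

    features: feature vector extracted from the examined radiograph (hypothetical).
    atlas: list of (feature_vector, bone_age_years) for reference images.
    """
    if not atlas:
        raise ValueError("atlas is empty")
    # Rank reference images by Euclidean distance in feature space.
    ranked = sorted(atlas, key=lambda entry: math.dist(features, entry[0]))
    nearest = ranked[:k]
    return sum(age for _, age in nearest) / len(nearest)
```

A large, diverse atlas is what makes this kind of lookup viable: with only the small Greulich-Pyle reference set, the nearest neighbours can be far from the examined hand.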
National Transportation Atlas Databases : 1999

DOT National Transportation Integrated Search

1999-01-01

The National Transportation Atlas Databases -- 1999 (NTAD99) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals, and re...
National Transportation Atlas Databases : 2001

DOT National Transportation Integrated Search

2001-01-01

The National Transportation Atlas Databases-2001 (NTAD-2001) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals and related...
National Transportation Atlas Databases : 1996

DOT National Transportation Integrated Search

1996-01-01

The National Transportation Atlas Databases -- 1996 (NTAD96) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals, and re...
National Transportation Atlas Databases : 2000

DOT National Transportation Integrated Search

2000-01-01

The National Transportation Atlas Databases-2000 (NTAD-2000) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals and related...
National Transportation Atlas Databases : 1997

DOT National Transportation Integrated Search

1997-01-01

The National Transportation Atlas Databases -- 1997 (NTAD97) is a set of national geographic databases of transportation facilities. These databases include geospatial information for transportation modal networks and intermodal terminals, and re...
Advanced technologies for scalable ATLAS conditions database access on the grid

NASA Astrophysics Data System (ADS)

Basset, R.; Canali, L.; Dimitrov, G.; Girone, M.; Hawkings, R.; Nevski, P.; Valassi, A.; Vaniachine, A.; Viegas, F.; Walker, R.; Wong, A.

2010-04-01

During massive data reprocessing operations, an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating a realistic workflow, ATLAS database scalability tests provided feedback for Conditions Db software optimization and allowed precise determination of the required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing, characterized by peak loads that can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent job rates. This has been achieved through coordinated database stress tests performed in a series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of database stress tests is to detect scalability limits of the hardware deployed at the Tier-1 sites, so that server overload conditions can be safely avoided in a production environment. Our analysis of server performance under stress tests indicates that Conditions Db data access is limited by the disk I/O throughput. An unacceptable side-effect of the disk I/O saturation is a degradation of the WLCG 3D Services that update Conditions Db data at all ten ATLAS Tier-1 sites using the technology of Oracle Streams. To avoid such bottlenecks we prototyped and tested a novel approach for database peak load avoidance in Grid computing. Our approach is based upon the proven idea of pilot job submission on the Grid: instead of the actual query, an ATLAS utility library sends to the database server a pilot query first.
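The pilot-query idea can be sketched as a client-side admission check: a cheap probe is sent first, and the real query is submitted only when the server signals spare capacity; otherwise the client waits and retries. The `pilot()`/`execute()` interface and the exponential back-off are assumptions for illustration; the abstract does not describe how the ATLAS utility library reacts to a rejected pilot.

```python
import time


def run_with_pilot(server, query, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Send a cheap pilot query first; submit the real query only once it is accepted.

    server: object with hypothetical methods `pilot()` -> bool (True when the
        server has capacity) and `execute(query)` -> result.
    sleep: injectable wait function (defaults to time.sleep) to ease testing.
    """
    delay = base_delay
    for _ in range(max_attempts):
        if server.pilot():
            return server.execute(query)
        # Server is near its I/O limit: back off before probing again.
        sleep(delay)
        delay *= 2
    raise RuntimeError("database busy; query not submitted")
```

The expensive query never reaches an overloaded server, so peak load is shed before the disk I/O saturation described above can degrade replication services.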
Expression Atlas: gene and protein expression across multiple studies and organisms

PubMed Central

Tang, Y Amy; Bazant, Wojciech; Burke, Melissa; Fuentes, Alfonso Muñoz-Pomer; George, Nancy; Koskinen, Satu; Mohammed, Suhaib; Geniza, Matthew; Preece, Justin; Jarnuczak, Andrew F; Huber, Wolfgang; Stegle, Oliver; Brazma, Alvis; Petryszak, Robert

2018-01-01

Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database that provides information about gene and protein expression in different species and contexts, such as tissue, developmental stage, disease or cell type. The available public and controlled access data sets from different sources are curated and re-analysed using standardized, open source pipelines and made available for queries, download and visualization. As of August 2017, Expression Atlas holds data from 3,126 studies across 33 different species, including 731 from plants. Data from large-scale RNA sequencing studies including Blueprint, PCAWG, ENCODE, GTEx and HipSci can be visualized next to each other. In Expression Atlas, users can query genes or gene-sets of interest and explore their expression across or within species, tissues, developmental stages in a constitutive or differential context, representing the effects of diseases, conditions or experimental interventions. All processed data matrices are available for direct download in tab-delimited format or as R-data. In addition to the web interface, data sets can now be searched and downloaded through the Expression Atlas R package. Novel features and visualizations include the on-the-fly analysis of gene set overlaps and the option to view gene co-expression in experiments investigating constitutive gene expression across tissues or other conditions. PMID:29165655

From sequence to enzyme mechanism using multi-label machine learning.

PubMed

De Ferrari, Luna; Mitchell, John B O

2014-05-19

In this work we predict enzyme function at the level of chemical mechanism, providing a finer granularity of annotation than traditional Enzyme Commission (EC) classes. Hence we can predict not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but how the reaction is performed, using which cofactors and with susceptibility to which drugs or inhibitors, details with important consequences for drug and enzyme design. Work that predicts enzyme catalytic activity based on 3D protein structure features limits the prediction of mechanism to proteins already having either a solved structure or a close relative suitable for homology modelling. In this study, we evaluate whether sequence identity, InterPro or Catalytic Site Atlas sequence signatures provide enough information for bulk prediction of enzyme mechanism. By splitting MACiE (Mechanism, Annotation and Classification in Enzymes database) mechanism labels to a finer granularity, which includes the role of the protein chain in the overall enzyme complex, the method can predict at 96% accuracy (and 96% micro-averaged precision, 99.9% macro-averaged recall) the MACiE mechanism definitions of 248 proteins available in the MACiE, EzCatDb (Database of Enzyme Catalytic Mechanisms) and SFLD (Structure Function Linkage Database) databases using an off-the-shelf K-Nearest Neighbours multi-label algorithm. We find that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also find that incorporating Catalytic Site Atlas attributes does not seem to provide additional accuracy. The software code (ml2db), data and results are available online at http://sourceforge.net/projects/ml2db/ and as supplementary files.
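A multi-label k-nearest-neighbours classifier of the kind mentioned above can be sketched with sets of signature identifiers (for example InterPro IDs) as features. The Jaccard distance, the majority-vote rule and all names here are assumptions; the abstract does not give the distance metric or label-assignment rule used in ml2db.

```python
from collections import Counter


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of signature identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def predict_labels(query, training, k=3, min_votes=None):
    """Multi-label k-nearest-neighbours prediction.

    query: set of features for an unlabelled protein.
    training: list of (feature_set, label_set) pairs.
    A label is predicted when at least `min_votes` of the k neighbours carry it
    (default: a strict majority of the neighbours retrieved).
    """
    # Stable sort: ties in distance keep their original training order.
    neighbours = sorted(training, key=lambda item: jaccard_distance(query, item[0]))[:k]
    if min_votes is None:
        min_votes = len(neighbours) // 2 + 1
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, count in votes.items() if count >= min_votes}
```

Each label is voted on independently, so a protein can receive several mechanism labels at once, which is what distinguishes multi-label kNN from ordinary single-class kNN.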
National Transportation Atlas Databases : 2002

DOT National Transportation Integrated Search

2002-01-01

The National Transportation Atlas Databases 2002 (NTAD2002) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2010

DOT National Transportation Integrated Search

2010-01-01

The National Transportation Atlas Databases 2010 (NTAD2010) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2006

DOT National Transportation Integrated Search

2006-01-01

The National Transportation Atlas Databases 2006 (NTAD2006) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2005

DOT National Transportation Integrated Search

2005-01-01

The National Transportation Atlas Databases 2005 (NTAD2005) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2008

DOT National Transportation Integrated Search

2008-01-01

The National Transportation Atlas Databases 2008 (NTAD2008) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2003

DOT National Transportation Integrated Search

2003-01-01

The National Transportation Atlas Databases 2003 (NTAD2003) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2004

DOT National Transportation Integrated Search

2004-01-01

The National Transportation Atlas Databases 2004 (NTAD2004) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2009

DOT National Transportation Integrated Search

2009-01-01

The National Transportation Atlas Databases 2009 (NTAD2009) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2007

DOT National Transportation Integrated Search

2007-01-01

The National Transportation Atlas Databases 2007 (NTAD2007) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2012

DOT National Transportation Integrated Search

2012-01-01

The National Transportation Atlas Databases 2012 (NTAD2012) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
National Transportation Atlas Databases : 2011

DOT National Transportation Integrated Search

2011-01-01

The National Transportation Atlas Databases 2011 (NTAD2011) is a set of nationwide geographic databases of transportation facilities, transportation networks, and associated infrastructure. These datasets include spatial information for transportatio...
Glance Information System for ATLAS Management

NASA Astrophysics Data System (ADS)

Grael, F. F.; Maidantchik, C.; Évora, L. H. R. A.; Karam, K.; Moraes, L. O. F.; Cirilli, M.; Nessi, M.; Pommès, K.; ATLAS Collaboration

2011-12-01

The ATLAS Experiment is an international collaboration involving more than 37 countries, 172 institutes and laboratories, 2900 physicists, engineers, and computer scientists, plus 700 students. Managing this teamwork involves several aspects, such as institute contributions, employment records, member appointments, the authors' list, preparation and publication of papers, and speaker nominations. Previously, most of the information was accessible to a limited group, and developers faced problems such as differing terminology, diverse data models, heterogeneous databases, and differing user needs. Moreover, the systems were not designed to handle new requirements. Maintenance has to be easy because of the experiment's long lifetime and the turnover of professionals. The Glance system, a generic mechanism for accessing any database, acts as an intermediate layer that isolates the user from the particularities of each database. It retrieves, inserts, and updates data independently of the database's technology and data model. Relying on Glance, a group of systems was built to support ATLAS management and operations: ATLAS Membership, ATLAS Appointments, ATLAS Speakers, ATLAS Analysis Follow-Up, ATLAS Conference Notes, ATLAS Thesis, ATLAS Traceability and DSS Alarms Viewer. This paper presents an overview of the Glance information framework and describes the privilege mechanism developed to grant different levels of access to each member and system.
A chromosome-centric human proteome project (C-HPP) to characterize the sets of proteins encoded in chromosome 17.

PubMed

Liu, Suli; Im, Hogune; Bairoch, Amos; Cristofanilli, Massimo; Chen, Rui; Deutsch, Eric W; Dalton, Stephen; Fenyo, David; Fanayan, Susan; Gates, Chris; Gaudet, Pascale; Hincapie, Marina; Hanash, Samir; Kim, Hoguen; Jeong, Seul-Ki; Lundberg, Emma; Mias, George; Menon, Rajasree; Mu, Zhaomei; Nice, Edouard; Paik, Young-Ki; Uhlen, Mathias; Wells, Lance; Wu, Shiaw-Lin; Yan, Fangfei; Zhang, Fan; Zhang, Yue; Snyder, Michael; Omenn, Gilbert S; Beavis, Ronald C; Hancock, William S

2013-01-04

We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial-derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as of antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. 
In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.
Multiple brain atlas database and atlas-based neuroimaging system.

PubMed

Nowinski, W L; Fang, A; Nguyen, B T; Raphel, J K; Jagannathan, L; Raghavan, R; Bryan, R N; Miller, G A

1997-01-01

For the purpose of developing multiple, complementary, fully labeled electronic brain atlases and an atlas-based neuroimaging system for analysis, quantification, and real-time manipulation of cerebral structures in two and three dimensions, we have digitized, enhanced, segmented, and labeled the following print brain atlases: Co-Planar Stereotaxic Atlas of the Human Brain by Talairach and Tournoux, Atlas for Stereotaxy of the Human Brain by Schaltenbrand and Wahren, Referentially Oriented Cerebral MRI Anatomy by Talairach and Tournoux, and Atlas of the Cerebral Sulci by Ono, Kubik, and Abernathey. Three-dimensional extensions of these atlases have been developed as well. All two- and three-dimensional atlases are mutually preregistered and may be interactively registered with an actual patient's data. An atlas-based neuroimaging system has been developed that provides support for reformatting, registration, visualization, navigation, image processing, and quantification of clinical data. The anatomical index contains about 1,000 structures and over 400 sulcal patterns. Several new applications of the brain atlas database also have been developed, supported by various technologies such as virtual reality, the Internet, and electronic publishing. Fusion of information from multiple atlases assists the user in comprehensively understanding brain structures and identifying and quantifying anatomical regions in clinical data. The multiple brain atlas database and atlas-based neuroimaging system have substantial potential impact in stereotactic neurosurgery and radiotherapy by assisting in visualization and real-time manipulation in three dimensions of anatomical structures, in quantitative neuroradiology by allowing interactive analysis of clinical data, in three-dimensional neuroeducation, and in brain function studies.
Global GIS database; digital atlas of South Pacific

USGS Publications Warehouse

Hearn, P.P.; Hare, T.M.; Schruben, P.; Sherrill, D.; LaMar, C.; Tsushima, P.

2001-01-01

This CD-ROM contains a digital atlas of the countries of the South Pacific. This atlas is part of a global database compiled from USGS and other data sources at a nominal scale of 1:1 million and is intended to be used as a regional-scale reference and analytical tool by government officials, researchers, the private sector, and the general public. The atlas includes free GIS software or may be used with ESRI's ArcView software. Customized ArcView tools, specifically designed to make the atlas easier to use, are also included.
Global GIS database; digital atlas of Africa

USGS Publications Warehouse

Hearn, P.P.; Hare, T.M.; Schruben, P.; Sherrill, D.; LaMar, C.; Tsushima, P.

2001-01-01

This CD-ROM contains a digital atlas of the countries of Africa. This atlas is part of a global database compiled from USGS and other data sources at a nominal scale of 1:1 million and is intended to be used as a regional-scale reference and analytical tool by government officials, researchers, the private sector, and the general public. The atlas includes free GIS software or may be used with ESRI's ArcView software. Customized ArcView tools, specifically designed to make this atlas easier to use, are also included.
Global GIS database; digital atlas of South Asia

USGS Publications Warehouse

Hearn, P.P.; Hare, T.M.; Schruben, P.; Sherrill, D.; LaMar, C.; Tsushima, P.

2001-01-01

This CD-ROM contains a digital atlas of the countries of South Asia. This atlas is part of a global database compiled from USGS and other data sources at a nominal scale of 1:1 million and is intended to be used as a regional-scale reference and analytical tool by government officials, researchers, the private sector, and the general public. The atlas includes free GIS software or may be used with ESRI's ArcView software. Customized ArcView tools, specifically designed to make the atlas easier to use, are also included.
Making proteomics data accessible and reusable: Current state of proteomics databases and repositories

PubMed Central

Perez-Riverol, Yasset; Alpi, Emanuele; Wang, Rui; Hermjakob, Henning; Vizcaíno, Juan Antonio

2015-01-01

Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has recently been developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently, along with some tools that enable the integration, mining and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data. PMID:25158685
Using PeptideAtlas, SRMAtlas and PASSEL – Comprehensive Resources for discovery and targeted proteomics

PubMed Central

Kusebauch, Ulrike; Deutsch, Eric W.; Campbell, David S.; Sun, Zhi; Farrah, Terry; Moritz, Robert L.

2014-01-01

PeptideAtlas, SRMAtlas and PASSEL are web-accessible resources that support discovery and targeted proteomics research. PeptideAtlas is a multi-species compendium of shotgun proteomic data provided by the scientific community; SRMAtlas is a resource of high-quality, complete-proteome SRM assays generated in a consistent manner for the targeted identification and quantification of proteins; and PASSEL is a repository that compiles and represents selected reaction monitoring data, all in an easy-to-use interface. The databases are generated from native mass spectrometry data files that are analyzed in a standardized manner, including statistical validation of the results. Each resource offers search functionality and can be queried by user-defined constraints; the query results are provided in tables or are displayed graphically. PeptideAtlas, SRMAtlas and PASSEL are freely available via the website http://www.peptideatlas.org. In this protocol, we describe the use of these resources and highlight how to submit, search, collate and download data. PMID:24939129

National Transportation Atlas Databases : 2014

DOT National Transportation Integrated Search

2014-01-01

The National Transportation Atlas Databases 2014 (NTAD2014) is a set of nationwide geographic datasets of transportation facilities, transportation networks, associated infrastructure, and other political and administrative entities. These da...
National Transportation Atlas Databases : 2015

DOT National Transportation Integrated Search

2015-01-01

The National Transportation Atlas Databases 2015 (NTAD2015) is a set of nationwide geographic datasets of transportation facilities, transportation networks, associated infrastructure, and other political and administrative entities. These da...
Advanced Technology Lifecycle Analysis System (ATLAS) Technology Tool Box (TTB)

NASA Technical Reports Server (NTRS)

Doyle, Monica; ONeil, Daniel A.; Christensen, Carissa B.

2005-01-01

The Advanced Technology Lifecycle Analysis System (ATLAS) is a decision support tool designed to aid program managers and strategic planners in determining how to invest technology research and development dollars. It is an Excel-based modeling package that allows a user to build complex space architectures and evaluate the impact of various technology choices. ATLAS contains system models, cost and operations models, a campaign timeline and a centralized technology database. Technology data for all system models is drawn from a common database, the ATLAS Technology Tool Box (TTB). The TTB provides a comprehensive, architecture-independent technology database that is keyed to current and future timeframes.
Histone Code Modulation by Oncogenic PWWP-Domain Protein in Breast Cancers

DTIC Science & Technology

2014-08-01

discs, the Drosophila melanogaster homologue of human retinoblastoma binding protein 2. Genetics 2000; 156: 645-663. [10] Zeng J, Ge Z, Wang L...in breast cancer patients (7-11). Earlier, we used genomic analysis of copy number and gene expression to perform a detailed analysis of the 8p11-12...from the 8p11-12 region (14). Very recently, we searched the Cancer Genome Atlas database, which contains 744 breast invasive carcinomas. We found DNA or
National Transportation Atlas Databases : 2013

DOT National Transportation Integrated Search

2013-01-01

The National Transportation Atlas Databases 2013 (NTAD2013) is a set of nationwide geographic datasets of transportation facilities, transportation networks, associated infrastructure, and other political and administrative entities. These datasets i...
RiceAtlas, a spatial database of global rice calendars and production.

PubMed

Laborte, Alice G; Gutierrez, Mary Anne; Balanza, Jane Girly; Saito, Kazuki; Zwart, Sander J; Boschetti, Mirco; Murty, M V R; Villano, Lorena; Aunario, Jorrel Khalil; Reinke, Russell; Koo, Jawoo; Hijmans, Robert J; Nelson, Andrew

2017-05-30

Knowing where, when, and how much rice is planted and harvested is crucial information for understanding the effects of policy, trade, and global and technological change on food security. We developed RiceAtlas, a spatial database on the seasonal distribution of the world's rice production. It consists of data on rice planting and harvesting dates by growing season and estimates of monthly production for all rice-producing countries. Sources used for planting and harvesting dates include global and regional databases, national publications, online reports, and expert knowledge. Monthly production data were estimated based on annual or seasonal production statistics, and planting and harvesting dates. RiceAtlas has 2,725 spatial units. Compared with available global crop calendars, RiceAtlas is nearly ten times more spatially detailed and has nearly seven times more spatial units, with at least two seasons of calendar data, making RiceAtlas the most comprehensive and detailed spatial database on rice calendar and production.
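The abstract says monthly production was estimated from annual or seasonal production statistics together with planting and harvesting dates, but it does not give the allocation rule. The sketch below is a minimal illustration under one loudly stated assumption: each season's production is spread evenly over the months of its harvest window, and windows may wrap past December. The function names and the `(production, harvest_start_month, harvest_end_month)` tuple layout are hypothetical, not RiceAtlas's actual schema, and planting dates are ignored here.

```python
def harvest_months(start, end):
    """List the calendar months (1-12) from start to end inclusive, wrapping past December."""
    if not (1 <= start <= 12 and 1 <= end <= 12):
        raise ValueError("months must be in 1..12")
    months = [start]
    while months[-1] != end:
        months.append(months[-1] % 12 + 1)
    return months


def monthly_production(seasons):
    """Spread each season's production evenly over its harvest months (an assumed rule).

    seasons: iterable of (production, harvest_start_month, harvest_end_month).
    Returns a 12-element list indexed January..December.
    """
    monthly = [0.0] * 12
    for production, start, end in seasons:
        months = harvest_months(start, end)
        share = production / len(months)
        for m in months:
            monthly[m - 1] += share
    return monthly


# A wet-season crop harvested Nov-Jan and a dry-season crop harvested May-Jun.
print(monthly_production([(300.0, 11, 1), (200.0, 5, 6)]))
# [100.0, 0.0, 0.0, 0.0, 100.0, 100.0, 0.0, 0.0, 0.0, 0.0, 100.0, 100.0]
```

A uniform split is the simplest choice; a real estimate would likely weight months by the share of area harvested in each, which requires data the abstract does not describe.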
Digital hand atlas and computer-aided bone age assessment via the Web

NASA Astrophysics Data System (ADS)

Cao, Fei; Huang, H. K.; Pietka, Ewa; Gilsanz, Vicente

1999-07-01

A frequently used method of bone age assessment is atlas matching, a radiological examination of a hand image against a reference set of atlas patterns of normal standards. We are in the process of developing a digital hand atlas with a large standard set of normal hand and wrist images that reflect skeletal maturity, race and sex differences, and current child development. The digital hand atlas will be used for computer-aided bone age assessment via the Web. We have designed and partially implemented a computer-aided diagnostic (CAD) system for Web-based bone age assessment. The system consists of a digital hand atlas, a relational image database, and a Web-based user interface. The digital atlas is based on a large standard set of normal hand and wrist images with extracted bone objects and quantitative features. The image database uses content-based indexing to organize the hand images and their attributes and to present them to users in a structured way. The Web-based user interface allows users to interact with the hand image database from browsers. Users can use a Web browser to push a clinical hand image to the CAD server for bone age assessment. Quantitative features of the examined image, which reflect skeletal maturity, will be extracted and compared with patterns from the atlas database to assess the bone age. The relevant reference images and the final assessment report will be sent back to the user's browser via the Web. The digital atlas will remove the disadvantages of the current, out-of-date atlas and allow bone age assessment to be computerized and done conveniently via the Web. In this paper, we present the system design and Web-based client-server model for computer-assisted bone age assessment and our initial implementation of the digital atlas database.
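The abstract describes extracting quantitative features from the examined image and comparing them with patterns in the atlas database, but it does not name the matching method. As a hedged sketch, the code below substitutes plain k-nearest-neighbour matching: the estimate is the mean age of the k same-sex atlas entries closest to the query in Euclidean feature space. The atlas record keys (`"features"`, `"sex"`, `"age_years"`) and the function name are hypothetical, and race is not used even though the abstract lists it as a factor.

```python
from math import dist


def assess_bone_age(features, atlas, sex, k=3):
    """Estimate bone age as the mean age of the k same-sex atlas entries closest in feature space.

    atlas: list of dicts with the hypothetical keys "features", "sex", and "age_years".
    """
    candidates = [entry for entry in atlas if entry["sex"] == sex]
    if not candidates:
        raise ValueError("no atlas entries for this sex")
    # Euclidean distance between the examined image's features and each reference pattern.
    nearest = sorted(candidates, key=lambda e: dist(features, e["features"]))[:k]
    return sum(e["age_years"] for e in nearest) / len(nearest)


atlas = [
    {"features": (1.0, 1.0), "sex": "F", "age_years": 5.0},
    {"features": (2.0, 2.0), "sex": "F", "age_years": 6.0},
    {"features": (10.0, 10.0), "sex": "F", "age_years": 12.0},
    {"features": (1.0, 1.0), "sex": "M", "age_years": 20.0},
]
print(assess_bone_age((1.5, 1.5), atlas, "F", k=2))  # 5.5
```

Averaging several neighbours rather than taking the single closest one makes the estimate less sensitive to any one reference image. `math.dist` requires Python 3.8 or later.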
Screening of missing proteins in the human liver proteome by improved MRM-approach-based targeted proteomics.

PubMed

Chen, Chen; Liu, Xiaohui; Zheng, Weimin; Zhang, Lei; Yao, Jun; Yang, Pengyuan

2014-04-04

To completely annotate the human genome, identifying and characterizing the proteins that currently lack mass spectrometry (MS) evidence is an inevitable and urgent task. In this study, as the first effort to screen for missing proteins on a large scale, we developed an approach based on SDS-PAGE followed by liquid chromatography-multiple reaction monitoring (LC-MRM) to screen for the missing proteins that had only a single peptide hit in the previous liver proteome data set. Proteins extracted from normal human liver were separated by SDS-PAGE and digested in separate gel slices, and the resulting digests were then subjected to LC-scheduled MRM analysis. The MRM assays were developed using synthesized crude peptides for the target peptides. In total, the expression of 57 target proteins was confirmed from 185 MRM assays in normal human liver tissues. Of the 57 confirmed one-hit wonders, 50 proteins belong to the minimally redundant set in the PeptideAtlas database, and 7 proteins previously had no MS-based information at all across various biological processes. We conclude that our SDS-PAGE-MRM workflow can be a powerful approach to screen for missing or poorly characterized proteins in different samples and to provide their quantity if detected. The MRM raw data have been uploaded to ISB/SRM Atlas/PASSEL (PXD000648).
Expression of the Long Intergenic Non-Protein Coding RNA 665 (LINC00665) Gene and the Cell Cycle in Hepatocellular Carcinoma Using The Cancer Genome Atlas, the Gene Expression Omnibus, and Quantitative Real-Time Polymerase Chain Reaction.

PubMed

Wen, Dong-Yue; Lin, Peng; Pang, Yu-Yan; Chen, Gang; He, Yun; Dang, Yi-Wu; Yang, Hong

2018-05-05

BACKGROUND Long non-coding RNAs (lncRNAs) have a role in physiological and pathological processes, including cancer. The aim of this study was to investigate the expression of the long intergenic non-protein coding RNA 665 (LINC00665) gene and the cell cycle in hepatocellular carcinoma (HCC) using database analysis, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO), and quantitative real-time polymerase chain reaction (qPCR). MATERIAL AND METHODS Expression levels of LINC00665 were compared between human tissue samples of HCC and adjacent normal liver, clinicopathological correlations were made using TCGA and the GEO, and qPCR was performed to validate the findings. Other public databases were searched for other genes associated with LINC00665 expression, including The Atlas of Noncoding RNAs in Cancer (TANRIC), the Multi Experiment Matrix (MEM), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and protein-protein interaction (PPI) networks. RESULTS Overexpression of LINC00665 in patients with HCC was significantly associated with gender, tumor grade, stage, and tumor cell type. Overexpression of LINC00665 in patients with HCC was significantly associated with overall survival (OS) (HR=1.477; 95% CI: 1.046-2.086). Bioinformatics analysis identified 469 related genes, and further analysis supported the hypothesis that LINC00665 regulates pathways in the cell cycle to facilitate the development and progression of HCC through ten identified core genes: CDK1, BUB1B, BUB1, PLK1, CCNB2, CCNB1, CDC20, ESPL1, MAD2L1, and CCNA2. CONCLUSIONS Overexpression of the lncRNA LINC00665 may be involved in the regulation of cell cycle pathways in HCC through ten identified hub genes.
Digital hand atlas for web-based bone age assessment: system design and implementation

NASA Astrophysics Data System (ADS)

Cao, Fei; Huang, H. K.; Pietka, Ewa; Gilsanz, Vicente

2000-04-01

A frequently used method of skeletal age assessment is atlas matching, a radiological examination of a hand image against a small set of Greulich-Pyle patterns of normal standards. However, the method can lead to significant deviation in age assessment because observers have different levels of training. The Greulich-Pyle atlas, based on upper-middle-class white populations in the 1950s, is also not fully applicable to children of today, especially regarding standard development in other racial groups. In this paper, we present our system design and initial implementation of a digital hand atlas and computer-aided diagnostic (CAD) system for Web-based bone age assessment. The digital atlas will remove the disadvantages of the current, out-of-date atlas and allow bone age assessment to be computerized and done conveniently via the Web. The system consists of a hand atlas database, a CAD module, and a Java-based Web user interface. The atlas database is based on a large set of clinically normal hand images of diverse ethnic groups. The Java-based Web user interface allows users to interact with the hand image database from browsers. Users can use a Web browser to push a clinical hand image to the CAD server for bone age assessment. Quantitative features of the examined image, which reflect skeletal maturity, are then extracted and compared with patterns from the atlas database to assess the bone age.
Multiatlas whole heart segmentation of CT data using conditional entropy for atlas ranking and selection.

PubMed

Zhuang, Xiahai; Bai, Wenjia; Song, Jingjing; Zhan, Songhua; Qian, Xiaohua; Shi, Wenzhe; Lian, Yanyun; Rueckert, Daniel

2015-07-01

Cardiac computed tomography (CT) is widely used in the clinical diagnosis of cardiovascular diseases. Whole heart segmentation (WHS) plays a vital role in developing new clinical applications of cardiac CT. However, the shape and appearance of the heart can vary greatly across different scans, making automatic segmentation particularly challenging. The objective of this work is to develop and evaluate a multiatlas segmentation (MAS) scheme using a new atlas ranking and selection algorithm for automatic WHS of CT data. Research on different MAS strategies and their influence on WHS performance is limited. This work provides a detailed comparison study evaluating the impacts of label fusion, atlas ranking, and sizes of the atlas database on segmentation performance. Atlases in a database were registered to the target image using a hierarchical registration scheme specifically designed for cardiac images. A subset of the atlases was selected for label fusion according to the authors' proposed atlas ranking criterion, which evaluated the performance of each atlas by computing the conditional entropy of the target image given the propagated atlas labeling. Joint label fusion was used to combine multiple label estimates to obtain the final segmentation. The authors used 30 clinical cardiac CT angiography (CTA) images to evaluate the proposed MAS scheme and to investigate different segmentation strategies. The mean WHS Dice score of the proposed MAS method was 0.918 ± 0.021, and the mean runtime for one case was 13.2 min on a workstation. This MAS scheme using joint label fusion generated significantly better Dice scores than the other label fusion strategies, including majority voting (0.901 ± 0.276, p < 0.01), locally weighted voting (0.905 ± 0.0247, p < 0.01), and probabilistic patch-based fusion (0.909 ± 0.0249, p < 0.01). 
In the atlas ranking study, the proposed criterion based on conditional entropy yielded a performance curve with higher WHS Dice scores compared to the conventional schemes (p < 0.03). In the atlas database study, the authors showed that the MAS using larger atlas databases generated better performance curves than the MAS using smaller ones, indicating larger atlas databases could produce more accurate segmentation. The authors have developed a new MAS framework for automatic WHS of CTA and investigated alternative implementations of MAS. With the proposed atlas ranking algorithm and joint label fusion, the MAS scheme is able to generate accurate segmentation within practically acceptable computation time. This method can be useful for the development of new clinical applications of cardiac CT.
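The ranking criterion above scores each registered atlas by the conditional entropy H(I | L) of the target image intensities I given that atlas's propagated labeling L: a labeling whose regions contain homogeneous target intensities has low entropy and ranks higher. The sketch below assumes intensities have already been binned into integers and that images are flat Python lists; the function names are hypothetical. Fusion is shown with majority voting, the baseline the authors compared against, not the joint label fusion they actually used, and the Dice score is the overlap measure reported above.

```python
from collections import Counter
from math import log2


def conditional_entropy(intensities, labels):
    """H(I | L) in bits for flat lists of binned target intensities and propagated labels."""
    n = len(intensities)
    joint = Counter(zip(labels, intensities))
    label_counts = Counter(labels)
    # H(I | L) = -sum over (l, i) of p(l, i) * log2 p(i | l)
    return -sum((c / n) * log2(c / label_counts[lab]) for (lab, _), c in joint.items())


def rank_atlases(target, propagated, k):
    """Indices of the k propagated labelings with the lowest H(I | L), best first."""
    scores = [conditional_entropy(target, labels) for labels in propagated]
    return sorted(range(len(propagated)), key=lambda j: scores[j])[:k]


def majority_vote(labelings):
    """Voxel-wise majority voting (the baseline fusion, not joint label fusion)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*labelings)]


def dice(a, b, label):
    """Dice similarity coefficient for one structure label in two flat label maps."""
    in_a = {i for i, v in enumerate(a) if v == label}
    in_b = {i for i, v in enumerate(b) if v == label}
    if not in_a and not in_b:
        return 1.0
    return 2 * len(in_a & in_b) / (len(in_a) + len(in_b))


target = [0, 0, 0, 5, 5, 5]   # binned intensities of the target image
atlas_a = [0, 0, 0, 1, 1, 1]  # labeling whose regions have uniform target intensity
atlas_b = [0, 1, 0, 1, 0, 1]  # labeling that mixes intensities within each region
print(rank_atlases(target, [atlas_b, atlas_a], k=1))  # [1]
```

Here `atlas_a` has H(I | L) = 0 because each label covers a single intensity, while `atlas_b` scores about 0.918 bits, so `atlas_a` (index 1) is selected first.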
Multi atlas based segmentation: Should we prefer the best atlas group over the group of best atlases?

PubMed

Zaffino, Paolo; Ciardo, Delia; Raudaschl, Patrik; Fritscher, Karl; Ricotti, Rosalinda; Alterio, Daniela; Marvaso, Giulia; Fodor, Cristiana; Baroni, Guido; Amato, Francesco; Orecchia, Roberto; Jereczek-Fossa, Barbara Alicja; Sharp, Gregory C; Spadea, Maria Francesca

2018-05-22

Multi Atlas Based Segmentation (MABS) uses a database of atlas images, and an atlas selection process is used to choose an atlas subset for registration and voting. In the current state of the art, atlases are chosen according to a similarity criterion between the target subject and each atlas in the database. In this paper, we propose a new concept for atlas selection that relies on selecting the best-performing group of atlases rather than the group of highest-scoring individual atlases. Experiments were performed using CT images of 50 patients, with contours of the brainstem and parotid glands. The dataset was randomly split into 2 groups: 20 volumes were used as an atlas database and 30 served as target subjects for testing. Classic oracle group selection, where atlases are chosen by the highest Dice Similarity Coefficient (DSC) with the target, was performed. This was compared to oracle group selection, where all the combinations of atlas subgroups were considered and scored by computing the DSC with the target subject. Subsequently, Convolutional Neural Networks (CNNs) were designed to predict the best group of atlases. The results were also compared with the selection strategy based on Normalized Mutual Information (NMI). Oracle group selection proved significantly better than classic oracle selection (p < 10^-5). Atlas group selection led to a median ± interquartile DSC of 0.740 ± 0.084, 0.718 ± 0.086 and 0.670 ± 0.097 for the brainstem and left/right parotid glands respectively, outperforming NMI selection (0.676 ± 0.113, 0.632 ± 0.104 and 0.606 ± 0.118; p < 0.001) as well as classic oracle selection. The implemented methodology is a proof of principle that selecting atlases by considering the performance of the entire group, instead of each single atlas, leads to higher segmentation accuracy, even better than the current oracle strategy. This finding opens a new discussion about the most appropriate atlas selection criterion for MABS. 
© 2018 Institute of Physics and Engineering in Medicine.
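The contrast between "the group of best atlases" and "the best atlas group" can be illustrated with a small sketch. Both oracles need the ground-truth target contour, which is why the authors trained CNNs to predict the best group in practice. Several details here are assumptions not stated in the abstract: binary masks stored as flat lists, a fixed group size `k` (the paper considered all subgroup combinations), and majority voting in which a tie counts as foreground. All function names are hypothetical.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity coefficient between two flat binary masks."""
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2 * overlap / total


def vote(masks):
    """Voxel-wise majority vote; a tie (possible with an even group size) counts as foreground."""
    return [1 if 2 * sum(column) >= len(masks) else 0 for column in zip(*masks)]


def classic_oracle(atlases, target, k):
    """Group of best atlases: the k atlases with the highest individual DSC against the target."""
    order = sorted(range(len(atlases)), key=lambda i: dice(atlases[i], target), reverse=True)
    return tuple(order[:k])


def oracle_group(atlases, target, k):
    """Best atlas group: the k-combination whose fused (voted) mask has the highest DSC."""
    return max(combinations(range(len(atlases)), k),
               key=lambda group: dice(vote([atlases[i] for i in group]), target))


target = [1, 1, 1, 1, 0, 0]
atlases = [[1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
for name, select in (("classic", classic_oracle), ("group", oracle_group)):
    group = select(atlases, target, 2)
    print(name, group, round(dice(vote([atlases[i] for i in group]), target), 3))
# classic (0, 1) 0.857
# group (0, 2) 1.0
```

The two individually best atlases make the same error, so fusing them cannot fix it. The weaker third atlas covers exactly the missed region, so the pair (0, 2) fuses to a perfect match. Exhaustive search over combinations grows combinatorially with the database size, which is one motivation for learning to predict the group instead.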
Making proteomics data accessible and reusable: current state of proteomics databases and repositories.

PubMed

Perez-Riverol, Yasset; Alpi, Emanuele; Wang, Rui; Hermjakob, Henning; Vizcaíno, Juan Antonio

2015-03-01

Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has recently been developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently, along with some tools that enable the integration, mining and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data. © 2014 The Authors. PROTEOMICS published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Experience with ATLAS MySQL PanDA database service

NASA Astrophysics Data System (ADS)

Smirnov, Y.; Wlodek, T.; De, K.; Hover, J.; Ozturk, N.; Smith, J.; Wenaus, T.; Yu, D.

2010-04-01

The PanDA distributed production and analysis system has been in production use for ATLAS data processing and analysis since late 2005 in the US, and globally throughout ATLAS since early 2008. Its core architecture is based on a set of stateless web services served by Apache and backed by a suite of MySQL databases that are the repository for all PanDA information: active and archival job queues, dataset and file catalogs, site configuration information, monitoring information, system control parameters, and so on. This database system is one of the most critical components of PanDA, and has successfully delivered the functional and scaling performance required by PanDA, currently operating at a scale of half a million jobs per week, with much growth still to come. In this paper we describe the design and implementation of the PanDA database system, its architecture of MySQL servers deployed at BNL and CERN, backup strategy and monitoring tools. The system has been developed, thoroughly tested, and brought to production to provide highly reliable, scalable, flexible and available database services for ATLAS Monte Carlo production, reconstruction and physics analysis.
Multiatlas whole heart segmentation of CT data using conditional entropy for atlas ranking and selection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhuang, Xiahai; Qian, Xiaohua; Bai, Wenjia

Purpose: Cardiac computed tomography (CT) is widely used in the clinical diagnosis of cardiovascular diseases. Whole heart segmentation (WHS) plays a vital role in developing new clinical applications of cardiac CT. However, the shape and appearance of the heart can vary greatly across scans, making automatic segmentation particularly challenging. The objective of this work is to develop and evaluate a multiatlas segmentation (MAS) scheme using a new atlas ranking and selection algorithm for automatic WHS of CT data. Research on different MAS strategies and their influence on WHS performance is limited. This work provides a detailed comparison study evaluating the impacts of label fusion, atlas ranking, and the size of the atlas database on segmentation performance. Methods: Atlases in a database were registered to the target image using a hierarchical registration scheme specifically designed for cardiac images. A subset of the atlases was selected for label fusion according to the authors' proposed atlas ranking criterion, which evaluated the performance of each atlas by computing the conditional entropy of the target image given the propagated atlas labeling. Joint label fusion was used to combine multiple label estimates to obtain the final segmentation. The authors used 30 clinical cardiac CT angiography (CTA) images to evaluate the proposed MAS scheme and to investigate different segmentation strategies. Results: The mean WHS Dice score of the proposed MAS method was 0.918 ± 0.021, and the mean runtime for one case was 13.2 min on a workstation. This MAS scheme using joint label fusion generated significantly better Dice scores than the other label fusion strategies, including majority voting (0.901 ± 0.276, p < 0.01), locally weighted voting (0.905 ± 0.0247, p < 0.01), and probabilistic patch-based fusion (0.909 ± 0.0249, p < 0.01).
In the atlas ranking study, the proposed criterion based on conditional entropy yielded a performance curve with higher WHS Dice scores than the conventional schemes (p < 0.03). In the atlas database study, the authors showed that MAS using larger atlas databases generated better performance curves than MAS using smaller ones, indicating that larger atlas databases could produce more accurate segmentation. Conclusions: The authors have developed a new MAS framework for automatic WHS of CTA and investigated alternative implementations of MAS. With the proposed atlas ranking algorithm and joint label fusion, the MAS scheme is able to generate accurate segmentation within a practically acceptable computation time. This method can be useful for the development of new clinical applications of cardiac CT.
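The atlas ranking step can be illustrated with a small Python sketch. It assumes each image has already been flattened to a list of voxels, that target intensities have been quantized into discrete bins, and that atlas labels have already been propagated to the target by registration; all function names are hypothetical. A lower H(I | L) is taken to mean that the propagated labeling explains the target intensities better. Majority voting is included only because it is the simplest baseline named above; the authors' method used joint label fusion, which this sketch does not implement.

```python
import math
from collections import Counter


def dice(a, b, label):
    """Dice overlap 2|A∩B| / (|A| + |B|) of one label in two flat label maps."""
    in_a = [x == label for x in a]
    in_b = [y == label for y in b]
    inter = sum(1 for x, y in zip(in_a, in_b) if x and y)
    total = sum(in_a) + sum(in_b)
    return 1.0 if total == 0 else 2.0 * inter / total


def conditional_entropy(intensities, labels):
    """H(I | L) in bits of binned target intensities I given a propagated labeling L."""
    n = len(labels)
    h = 0.0
    for lab, n_lab in Counter(labels).items():
        # Histogram of target intensities inside this label's region.
        bins = Counter(i for i, l in zip(intensities, labels) if l == lab)
        h_region = -sum((c / n_lab) * math.log2(c / n_lab) for c in bins.values())
        h += (n_lab / n) * h_region
    return h


def rank_atlases(target, propagated, k):
    """Indices of the k propagated labelings with the lowest H(I | L)."""
    scores = [conditional_entropy(target, lab) for lab in propagated]
    return sorted(range(len(propagated)), key=lambda i: scores[i])[:k]


def majority_vote(label_maps):
    """Voxel-wise majority vote (baseline fusion); ties go to the smallest label."""
    fused = []
    for votes in zip(*label_maps):
        counts = Counter(votes)
        best = max(counts.values())
        fused.append(min(lab for lab, c in counts.items() if c == best))
    return fused


if __name__ == "__main__":
    target = [0, 0, 0, 5, 5, 5]   # binned target intensities (toy data)
    atlas_a = [0, 0, 0, 1, 1, 1]  # labels matching the intensity boundary
    atlas_b = [0, 0, 1, 1, 1, 0]  # misregistered labels
    print(rank_atlases(target, [atlas_b, atlas_a], k=1))  # [1]
    print(majority_vote([atlas_a, atlas_a, atlas_b]))     # [0, 0, 0, 1, 1, 1]
    print(round(dice(atlas_a, atlas_b, 1), 3))            # 0.667
```

In this toy case the atlas whose labels coincide with the intensity boundary has H(I | L) = 0 and is ranked first, while the misregistered one scores about 0.918 bits.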
Conversion of environmental data to a digital-spatial database, Puget Sound area, Washington

USGS Publications Warehouse

Uhrich, M.A.; McGrath, T.S.

1997-01-01

Data and maps from the Puget Sound Environmental Atlas, compiled for the U.S. Environmental Protection Agency, the Puget Sound Water Quality Authority, and the U.S. Army Corps of Engineers, have been converted into a digital-spatial database using a geographic information system. Environmental data for the Puget Sound area, collected from sources other than the Puget Sound Environmental Atlas by different Federal, State, and local agencies, also have been converted into this digital-spatial database. Background on the geographic-information-system planning process, the design and implementation of the geographic-information-system database, and the reasons for conversion to this digital-spatial database are included in this report. The Puget Sound Environmental Atlas data layers include information about seabird nesting areas, eelgrass and kelp habitat, marine mammal and fish areas, and shellfish resources and bed certification. Data layers from sources other than the Puget Sound Environmental Atlas include the Puget Sound shoreline, the water-body system, shellfish growing areas, recreational shellfish beaches, sewage-treatment outfalls, upland hydrography, watershed and political boundaries, and geographic names. The sources of data, descriptions of the data layers, and the steps and errors of processing associated with conversion to a digital-spatial database used in development of the Puget Sound Geographic Information System also are included in this report. The appendixes contain data dictionaries for each of the resource layers and error values for the conversion of Puget Sound Environmental Atlas data.
RICD: a rice indica cDNA database resource for rice functional genomics.

PubMed

Lu, Tingting; Huang, Xuehui; Zhu, Chuanrang; Huang, Tao; Zhao, Qiang; Xie, Kabing; Xiong, Lizhong; Zhang, Qifa; Han, Bin

2008-11-26

The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties, Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. The Rice Indica cDNA Database (RICD) is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. For each cDNA, it also provides a range of information, including sequences, protein domain annotations, similarity search results, SNP and InDel information, hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB) and The TIGR Rice Genome Annotation Resource, the expression atlas in RiceGE, and the variation report in Gramene. The online rice indica cDNA database provides a cDNA resource with comprehensive information to researchers for functional analysis of the indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.
Global GIS database; digital atlas of Central and South America

USGS Publications Warehouse

Hearn, Paul P.; Hare, T.; Schruben, P.; Sherrill, D.; LaMar, C.; Tsushima, P.

2000-01-01

This CD-ROM contains a digital atlas of the countries of Central and South America. This atlas is part of a global database compiled from USGS and other data sources at the nominal scale of 1:1 million and is intended to be used as a regional-scale reference and analytical tool by government officials, researchers, the private sector, and the general public. The atlas includes free GIS software and can also be used with ESRI's ArcView software. Customized ArcView tools, specifically designed to make the atlas easier to use, are also included. The atlas contains the following datasets: country political boundaries, digital shaded relief map, elevation, slope, hydrology, locations of cities and towns, airfields, roads, railroads, utility lines, population density, geology, ecological regions, historical seismicity, volcanoes, ore deposits, oil and gas fields, climate data, landcover, vegetation index, and lights at night.
Automating the Generation of the Cassini Tour Atlas Database

NASA Technical Reports Server (NTRS)

Grazier, Kevin R.; Roumeliotis, Chris; Lange, Robert D.

2010-01-01

The Tour Atlas is a large database of geometrical tables, plots, and graphics used by Cassini science planning engineers and scientists primarily for science observation planning. Over time, as the contents of the Tour Atlas grew, the amount of time it took to recreate the Tour Atlas similarly grew--to the point that it took one person a week of effort. When Cassini tour designers estimated that they were going to create approximately 30 candidate Extended Mission trajectories--which needed to be analyzed for science return in a short amount of time--it became a necessity to automate. We report on the automation methodology that reduced the amount of time it took one person to (re)generate a Tour Atlas from a week to, literally, one UNIX command.
Sleep atlas and multimedia database.

PubMed

Penzel, T; Kesper, K; Mayer, G; Zulley, J; Peter, J H

2000-01-01

The ENN sleep atlas and database was set up on a dedicated server connected to the internet, thus providing all services such as WWW, ftp and telnet access. The database serves as a platform to promote the goals of the European Neurological Network, to exchange patient cases for second opinion between experts and to create a case-oriented multimedia sleep atlas with descriptive text, images and video-clips of all known sleep disorders. The sleep atlas consists of a small public part and a large private part for members of the consortium. Twenty patient cases were collected and presented with educational information similar to published case reports. Case reports are complemented with images, video-clips and biosignal recordings. A Java-based viewer for biosignals provided in EDF format was installed to allow users to move freely within the sleep recordings without having to download the full recording to the client.

Technical and Organizational Considerations for the Long-Term Maintenance and Development of Digital Brain Atlases and Web-Based Databases

PubMed Central

Ito, Kei

2010-01-01

A digital brain atlas is a kind of image database that specifically provides information about neurons and glial cells in the brain. It has various advantages that are unmatched by conventional paper-based atlases. Such advantages, however, may become disadvantages if appropriate care is not taken. Because digital atlases can provide an unlimited amount of data, they should be designed to minimize redundancy and keep consistent the records that may be added incrementally by different staff members. The fact that digital atlases can easily be revised necessitates a system that ensures users can access previous versions that might have been cited in papers at a particular time. To pass our knowledge on to future generations, such databases should be maintained for a very long period, well over 100 years, like printed books and papers. Technical and organizational measures to enable long-term archiving should be considered seriously. Compared to the initial development of the database, subsequent efforts to increase the quality and quantity of its contents are not highly regarded, because such tasks do not materialize in the form of publications. This fact strongly discourages continuous expansion of, and external contributions to, digital atlases after their initial launch. To solve these problems, the role of biocurators is vital. Appreciation of the scientific achievements of people who do not write papers, and the establishment of a secure academic career path for them, are indispensable for recruiting talent for this very important job. PMID:20661458
ActiveDriverDB: human disease mutations and genome variation in post-translational modification sites of proteins

PubMed Central

Krassowski, Michal; Paczkowska, Marta; Cullion, Kim; Huang, Tina; Dzneladze, Irakli; Ouellette, B F Francis; Yamada, Joseph T; Fradet-Turcotte, Amelie

2018-01-01

Interpretation of genetic variation is needed for deciphering genotype-phenotype associations, mechanisms of inherited disease, and cancer driver mutations. Millions of single nucleotide variants (SNVs) in human genomes are known and thousands are associated with disease. An estimated 21% of disease-associated amino acid substitutions corresponding to missense SNVs are located in protein sites of post-translational modifications (PTMs), chemical modifications of amino acids that extend protein function. ActiveDriverDB is a comprehensive human proteo-genomics database that annotates disease mutations and population variants through the lens of PTMs. We integrated >385,000 published PTM sites with ∼3.6 million substitutions from The Cancer Genome Atlas (TCGA), the ClinVar database of disease genes, and human genome sequencing projects. The database includes site-specific interaction networks of proteins, upstream enzymes such as kinases, and drugs targeting these enzymes. We also predicted network-rewiring impact of mutations by analyzing gains and losses of kinase-bound sequence motifs. ActiveDriverDB provides detailed visualization, filtering, browsing and searching options for studying PTM-associated mutations. Users can upload mutation datasets interactively and use our application programming interface in pipelines. Integrative analysis of mutations and PTMs may help decipher molecular mechanisms of phenotypes and disease, as exemplified by case studies of TP53, BRCA2 and VHL. The open-source database is available at https://www.ActiveDriverDB.org. PMID:29126202
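A minimal Python sketch of the kind of site-level annotation described above: it flags a substitution as hitting a PTM site directly or falling in the flanking sequence of one. The ±7-residue flank is an assumption (a window size often used in PTM-site analyses, exposed here as a parameter), not a value stated in this abstract, and the class, functions, protein identifiers and positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    protein: str   # protein identifier (hypothetical in the example below)
    position: int  # 1-based residue position of the substituted amino acid
    ref: str       # reference residue
    alt: str       # substituted residue


def classify(sub, ptm_sites, window=7):
    """Return 'direct', 'proximal', or None for one substitution.

    ptm_sites maps a protein id to the set of 1-based modified positions.
    window is the assumed flank size on each side of a PTM site.
    """
    sites = ptm_sites.get(sub.protein, set())
    if sub.position in sites:
        return "direct"
    if any(abs(sub.position - s) <= window for s in sites):
        return "proximal"
    return None


def ptm_fraction(subs, ptm_sites, window=7):
    """Fraction of substitutions that hit a PTM site or its flanking residues."""
    if not subs:
        return 0.0
    hits = sum(classify(s, ptm_sites, window) is not None for s in subs)
    return hits / len(subs)


if __name__ == "__main__":
    sites = {"PROT1": {15, 20}}  # hypothetical PTM positions
    subs = [
        Substitution("PROT1", 15, "S", "A"),  # on a site -> direct
        Substitution("PROT1", 25, "R", "W"),  # 5 residues from site 20 -> proximal
        Substitution("PROT1", 40, "R", "W"),  # far from any site -> None
    ]
    for s in subs:
        print(s.position, classify(s, sites))
    print(round(ptm_fraction(subs, sites), 3))  # 0.667
```

Keeping the flank size as a parameter makes it easy to compare strict on-site counts (`window=0`) with broader definitions of PTM-associated mutations.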
First use of LHC Run 3 Conditions Database infrastructure for auxiliary data files in ATLAS

NASA Astrophysics Data System (ADS)

Aperio Bella, L.; Barberis, D.; Buttinger, W.; Formica, A.; Gallas, E. J.; Rinaldi, L.; Rybkin, G.; ATLAS Collaboration

2017-10-01

Processing the large amount of data produced by the ATLAS experiment requires fast and reliable access to what we call Auxiliary Data Files (ADF). These files, produced by the Combined Performance, Trigger and Physics groups, contain conditions, calibrations, and other derived data used by the ATLAS software. For historical reasons, these data have so far been collected and accessed outside the ATLAS Conditions Database infrastructure and related software. For this reason, and because ADF are effectively read by the software as binary objects, this class of data appears ideal for testing the proposed Run 3 conditions data infrastructure now in development. This paper describes this implementation as well as the lessons learned in exploring and refining the new infrastructure, with the potential for deployment during Run 2.
Heart research advances using database search engines, Human Protein Atlas and the Sydney Heart Bank.

PubMed

Li, Amy; Estigoy, Colleen; Raftery, Mark; Cameron, Darryl; Odeberg, Jacob; Pontén, Fredrik; Lal, Sean; Dos Remedios, Cristobal G

2013-10-01

This Methodological Review is intended as a guide for research students who may have just discovered a human "novel" cardiac protein, but it may also help hard-pressed reviewers of journal submissions on a "novel" protein reported in an animal model of human heart failure. Whether you are an expert or not, you may know little or nothing about this particular protein of interest. In this review we provide a strategic guide on how to proceed. We ask: How do you discover what has been published (even in an abstract or research report) about this protein? Everyone knows how to undertake literature searches using PubMed and Medline, but these are usually encyclopaedic, often producing long lists of papers, most of which are either irrelevant or only vaguely relevant to your query. Relatively few will be aware of more advanced search engines such as Google Scholar, and even fewer will know about Quertle. Next, we provide a strategy for discovering if your "novel" protein is expressed in the normal, healthy human heart, and if it is, we show you how to investigate its subcellular location. This can usually be achieved by visiting the website "Human Protein Atlas" without doing a single experiment. Finally, we provide a pathway to discovering if your protein of interest changes its expression level with heart failure/disease or with ageing. Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.
Ukrainian Database and Atlas of Light Curves of Artificial Space Objects

NASA Astrophysics Data System (ADS)

Koshkin, N.; Savanevich, V.; Pohorelov, A.; Shakun, L.; Zhukov, V.; Korobeynikova, E.; Strakhova, S.; Moskalenko, S.; Kashuba, V.; Krasnoshchokov, A.

This paper describes the Ukrainian database of long-term photometric observations of resident space objects (RSO). To make this database usable for outer-space monitoring and space situational awareness (SSA), an open internet resource has been developed. The paper shows examples of using the Atlas of light curves of RSOs to analyze the rotation state around the center of mass of several active and non-functioning satellites in orbit.
Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy.

PubMed

Elguoshy, Amr; Hirao, Yoshitoshi; Xu, Bo; Saito, Suguru; Quadery, Ali F; Yamamoto, Keiko; Mitsui, Toshiaki; Yamamoto, Tadashi

2017-12-01

In an attempt to complete the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project (C-HPP) launched the investigation of missing proteins (MPs) in 2012. However, 2579 and 572 protein entries in neXtProt (2017-1) are still considered missing and uncertain proteins, respectively. In this study, we therefore proposed a pipeline to analyze, identify, and validate human missing and uncertain proteins in open-access transcriptomics and proteomics databases. Analysis of the RNA expression pattern of missing proteins in the Human Protein Atlas showed that 28% of them, such as Olfactory receptor 1I1 (O60431), had no RNA expression, suggesting the need to consider uncommon tissues in transcriptomic and proteomic studies. Interestingly, 21% had an elevated expression level in a particular tissue (tissue-enriched proteins), indicating the importance of targeting such proteins in the tissues where they are elevated. Additionally, analysis of RNA expression levels showed that 95% of missing proteins had no or low expression (0-10 transcripts per million), indicating that low abundance is one of the major obstacles to detecting missing proteins. Moreover, missing proteins are predicted to generate fewer unique tryptic peptides than identified proteins. Searching for these predicted unique tryptic peptides corresponding to missing and uncertain proteins in the experimental peptide lists of open-access MS-based databases (PA, GPM) resulted in the detection of 402 missing and 19 uncertain proteins with at least two unique peptides (≥9 aa) at <(5 × 10⁻⁴)% FDR. Finally, matching the native spectra of the experimentally detected peptides with their SRMAtlas synthetic counterparts from three transition sources (QQQ, QTOF, QTRAP) allowed us to validate 41 missing proteins by ≥2 proteotypic peptides.
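The unique-tryptic-peptide criterion above can be sketched in Python (3.7 or later, which allows `re.split` on zero-width matches). This is a simplified illustration, not the authors' pipeline: it assumes strict trypsin specificity (cleave after K or R unless the next residue is P), no missed cleavages, no I/L equivalence and no FDR control, and the function names and toy sequences are hypothetical. A protein counts as detected when at least two of its unique peptides (≥9 aa by default) appear in an observed peptide list.

```python
import re
from collections import defaultdict

# Simplified trypsin rule: cut after K or R unless the next residue is P.
_CLEAVAGE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(seq, min_len=9):
    """Fully tryptic peptides (no missed cleavages) of at least min_len residues."""
    return [p for p in _CLEAVAGE.split(seq) if len(p) >= min_len]


def unique_peptides(proteome, min_len=9):
    """Map each accession to its peptides that occur in no other protein."""
    owners = defaultdict(set)
    for acc, seq in proteome.items():
        for pep in tryptic_digest(seq, min_len):
            owners[pep].add(acc)
    unique = {acc: set() for acc in proteome}
    for pep, accs in owners.items():
        if len(accs) == 1:
            unique[next(iter(accs))].add(pep)
    return unique


def detected_proteins(proteome, observed, min_unique=2, min_len=9):
    """Accessions with at least min_unique unique peptides in the observed set."""
    unique = unique_peptides(proteome, min_len)
    return {acc for acc, peps in unique.items() if len(peps & observed) >= min_unique}


if __name__ == "__main__":
    proteome = {  # hypothetical toy sequences
        "A": "AAAAGGGGKLLLLMMMMRSSSSTTTTK",
        "B": "AAAAGGGGKWWWWYYYYRPDDK",
    }
    observed = {"AAAAGGGGK", "LLLLMMMMR", "SSSSTTTTK", "WWWWYYYYRPDDK"}
    print(detected_proteins(proteome, observed))  # {'A'}
```

Here the shared peptide AAAAGGGGK is discarded, so protein A keeps two unique observed peptides and passes, while B has only one and does not.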
Clinical value of miR-452-5p expression in lung adenocarcinoma: A retrospective quantitative real-time polymerase chain reaction study and verification based on The Cancer Genome Atlas and Gene Expression Omnibus databases.

PubMed

Gan, Xiao-Ning; Luo, Jie; Tang, Rui-Xue; Wang, Han-Lin; Zhou, Hong; Qin, Hui; Gan, Ting-Qing; Chen, Gang

2017-05-01

The role and mechanism of miR-452-5p in lung adenocarcinoma remain unclear. In this study, we systematically investigated the clinical value of miR-452-5p expression in lung adenocarcinoma. The expression of miR-452-5p in 101 lung adenocarcinoma patients was detected by quantitative real-time polymerase chain reaction. Data from The Cancer Genome Atlas and Gene Expression Omnibus databases were combined to verify the expression level of miR-452-5p in lung adenocarcinoma. Using several online prediction databases and bioinformatics software, pathway and network analyses of miR-452-5p target genes were performed to explore its potential molecular mechanism. The expression of miR-452-5p in the in-house lung adenocarcinoma samples was significantly lower than that in adjacent tissues (p < 0.001). Additionally, the expression level of miR-452-5p was negatively correlated with several clinicopathological parameters, including tumor size (p = 0.014), lymph node metastasis (p = 0.032), and tumor-node-metastasis stage (p = 0.036). Data from The Cancer Genome Atlas also confirmed the low expression of miR-452 in lung adenocarcinoma (p < 0.001). Furthermore, reduced expression of miR-452-5p in lung adenocarcinoma (standardized mean difference = -0.393, 95% confidence interval: -0.774 to -0.011, p = 0.044) was validated by a meta-analysis. Five hub genes targeted by miR-452-5p, namely SMAD family member 4, SMAD family member 2, cyclin-dependent kinase inhibitor 1B, tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon, and tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein beta, were significantly enriched in the cell-cycle pathway. In conclusion, low expression of miR-452-5p may play an essential role in lung adenocarcinoma. Bioinformatics analysis might be beneficial in revealing the potential mechanism of miR-452-5p in lung adenocarcinoma.
TransAtlasDB: an integrated database connecting expression data, metadata and variants

PubMed Central

Adetunji, Modupeore O; Lamont, Susan J; Schmidt, Carl J

2018-01-01

High-throughput transcriptome sequencing (RNAseq) is the universally applied method for target-free transcript identification and gene expression quantification, generating huge amounts of data. The difficulty of accessing such data and interpreting the results can be a major impediment to formulating suitable hypotheses, so an innovative storage solution that addresses these limitations, such as hard disk storage requirements, efficiency and reproducibility, is paramount. By offering a uniform data storage and retrieval mechanism, various data can be compared and easily investigated. We present a sophisticated system, TransAtlasDB, which incorporates a hybrid architecture of both relational and NoSQL databases for fast and efficient data storage, processing and querying of large datasets from transcript expression analysis with corresponding metadata, as well as gene-associated variants (such as SNPs) and their predicted gene effects. TransAtlasDB provides a data model for accurate storage of the large amount of data derived from RNAseq analysis, as well as methods of interacting with the database, either through command-line data management workflows written in Perl, whose functions simplify the storage and manipulation of the massive amounts of data generated from RNAseq analysis, or through the web interface. The database application is currently modeled to handle analysis data from agricultural species and will be expanded to include more species groups. Overall, TransAtlasDB aims to serve as an accessible repository for the large, complex results files derived from RNAseq gene expression profiling and variant analysis. Database URL: https://modupeore.github.io/TransAtlasDB/ PMID:29688361
Applying graph theory to protein structures: an atlas of coiled coils.

PubMed

Heal, Jack W; Bartlett, Gail J; Wood, Christopher W; Thomson, Andrew R; Woolfson, Derek N

2018-05-02

To understand protein structure, folding and function fully and to design proteins de novo reliably, we must learn from natural protein structures that have been characterised experimentally. The number of protein structures available is large and growing exponentially, which makes this task challenging. Indeed, computational resources are becoming increasingly important for classifying and analysing this resource. Here, we use tools from graph theory to define an atlas classification scheme for automatically categorising certain protein substructures. Focusing on the α-helical coiled coils, which are ubiquitous protein-structure and protein-protein interaction motifs, we present a suite of computational resources designed for analysing these assemblies. iSOCKET enables interactive analysis of side-chain packing within proteins to identify coiled coils automatically and with considerable user control. Applying a graph theory-based atlas classification scheme to structures identified by iSOCKET gives the Atlas of Coiled Coils, a fully automated, updated overview of extant coiled coils. The utility of this approach is illustrated with the first formal classification of an emerging subclass of coiled coils called α-helical barrels. Furthermore, in the Atlas, the known coiled-coil universe is presented alongside a partial enumeration of the 'dark matter' of coiled-coil structures; i.e., those coiled-coil architectures that are theoretically possible but have not been observed to date, and thus present defined targets for protein design. iSOCKET is available as part of the open-source GitHub repository associated with this work (https://github.com/woolfson-group/isocket). This repository also contains all the data generated when classifying the protein graphs. The Atlas of Coiled Coils is available at: http://coiledcoils.chm.bris.ac.uk/atlas/app.
CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.

PubMed

Hallin, Peter F; Ussery, David W

2004-12-12

Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, and DNA compositional measures like base skews, oligo skews and repeats at the local and global level, are just a few of the analyses presented on the CBS Genome Atlas Web page. Complex analyses, changing methods and the frequent addition of new models require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution is a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web server via PHP4 to present dynamic web content to users outside the center. This solution is tightly fitted to existing server infrastructure, and the solutions proposed here may serve as a template for other research groups facing similar database issues. A web-based user interface dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/.
This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
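Two of the basic measures listed above, AT content and GC skew, are simple enough to show directly. The Genome Atlas pipeline itself was written in Perl; the Python below is only a sketch using the common definitions AT content = (A + T) / (A + C + G + T) and GC skew = (G − C) / (G + C), computed over fixed, optionally overlapping windows. The function names, window sizes and the decision to ignore ambiguous bases such as N are assumptions.

```python
def at_content(seq):
    """(A + T) / (A + C + G + T); other symbols such as N are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("A") + s.count("T")) / acgt if acgt else 0.0


def gc_skew(seq):
    """(G - C) / (G + C); returns 0.0 when the window has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed(seq, func, window, step):
    """Apply func to each full window of the given size, advancing by step."""
    return [func(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    print(at_content("AATTGGCC"))               # 0.5
    print(windowed("GGGGCCCC", gc_skew, 4, 4))  # [1.0, -1.0]
    print(windowed("GGGGCCCC", gc_skew, 4, 2))  # [1.0, 0.0, -1.0]
```

A step smaller than the window gives overlapping windows and a smoother skew profile along the chromosome, at the cost of more computation.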
Pivotal role of the muscle-contraction pathway in cryptorchidism and evidence for genomic connections with cardiomyopathy pathways in RASopathies.

PubMed

Cannistraci, Carlo V; Ogorevc, Jernej; Zorc, Minja; Ravasi, Timothy; Dovc, Peter; Kunej, Tanja

2013-02-14

Cryptorchidism is the most frequent congenital disorder in male children; however, the genetic causes of cryptorchidism remain poorly investigated. Comparative integratomics combined with a systems biology approach was employed to elucidate genetic factors and molecular pathways underlying testis descent. Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information on the collected candidate genes was stored in a MySQL relational database. A genomic view of the loci was presented using the Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. The Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. The developed gene atlas represents an important resource for the scientific community researching the genetics of cryptorchidism.
The collected data will further facilitate the development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
Evolution of the use of relational and NoSQL databases in the ATLAS experiment

NASA Astrophysics Data System (ADS)

Barberis, D.

2016-09-01

For many years the ATLAS experiment used a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, and run and event metadata. The rapid development of "NoSQL" databases (structured storage services) in the last five years has allowed an extended and complementary use of traditional relational databases and new structured storage tools, improving the performance of existing applications and extending their functionality with the possibilities offered by modern storage systems. The trend is towards using the best tool for each kind of data, separating, for example, the intrinsically relational metadata from payload storage, and records that are frequently updated and benefit from transactions from archived information. Access to all components has to be orchestrated by specialised services that run on front-end machines and shield the user from the complexity of the data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.
Second NASA Technical Interchange Meeting (TIM): Advanced Technology Lifecycle Analysis System (ATLAS) Technology Tool Box (TTB)

NASA Technical Reports Server (NTRS)

ONeil, D. A.; Mankins, J. C.; Christensen, C. B.; Gresham, E. C.

2005-01-01

The Advanced Technology Lifecycle Analysis System (ATLAS), a spreadsheet analysis tool suite, applies parametric equations for sizing and lifecycle cost estimation. Performance, operation, and programmatic data used by the equations come from a Technology Tool Box (TTB) database. In this second TTB Technical Interchange Meeting (TIM), technologists, system model developers, and architecture analysts discussed methods for modeling technology decisions in spreadsheet models, identified specific technology parameters, and defined detailed development requirements. This Conference Publication captures the consensus of the discussions and provides narrative explanations of the tool suite, the database, and applications of ATLAS within NASA's changing environment.
Quantitative Proteomics Identifies Activation of Hallmark Pathways of Cancer in Patient Melanoma.

PubMed

Byrum, Stephanie D; Larson, Signe K; Avaritt, Nathan L; Moreland, Linley E; Mackintosh, Samuel G; Cheung, Wang L; Tackett, Alan J

2013-03-01

Molecular pathways regulating melanoma initiation and progression are potential targets of therapeutic development for this aggressive cancer. Identification and molecular analysis of these pathways in patients has been primarily restricted to targeted studies on individual proteins. Here, we report the most comprehensive analysis of formalin-fixed paraffin-embedded human melanoma tissues using quantitative proteomics. From 61 patient samples, we identified 171 proteins varying in abundance among benign nevi, primary melanoma, and metastatic melanoma. Seventy-three percent of these proteins were validated by immunohistochemistry staining of malignant melanoma tissues from the Human Protein Atlas database. Our results reveal that molecular pathways involved with tumor cell proliferation, motility, and apoptosis are mis-regulated in melanoma. These data provide the most comprehensive proteome resource on patient melanoma and reveal insight into the molecular mechanisms driving melanoma progression.
EMAP and EMAGE: a framework for understanding spatially organized data.

PubMed

Baldock, Richard A; Bard, Jonathan B L; Burger, Albert; Burton, Nicolas; Christiansen, Jeff; Feng, Guanjie; Hill, Bill; Houghton, Derek; Kaufman, Matthew; Rao, Jianguo; Sharpe, James; Ross, Allyson; Stevenson, Peter; Venkataraman, Shanmugasundaram; Waterhouse, Andrew; Yang, Yiya; Davidson, Duncan R

2003-01-01

The Edinburgh Mouse Atlas Project (EMAP) is a time-series of mouse-embryo volumetric models. The models provide a context-free spatial framework onto which structural interpretations and experimental data can be mapped. This enables collation, comparison, and query of complex spatial patterns with respect to each other and with respect to known or hypothesized structure. The atlas also includes a time-dependent anatomical ontology and mapping between the ontology and the spatial models in the form of delineated anatomical regions or tissues. The models provide a natural, graphical context for browsing and visualizing complex data. The Edinburgh Mouse Atlas Gene-Expression Database (EMAGE) is one of the first applications of the EMAP framework and provides a spatially mapped gene-expression database with associated tools for data mapping, submission, and query. In this article, we describe the underlying principles of the Atlas and the gene-expression database, and provide a practical introduction to the use of the EMAP and EMAGE tools, including use of new techniques for whole body gene-expression data capture and mapping.
Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

NASA Astrophysics Data System (ADS)

Dykstra, Dave

2012-12-01

One of the main attractions of non-relational “NoSQL” databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.
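The caching idea behind Frontier can be illustrated with a minimal read-through cache. This is a sketch that assumes, as a simplification, that the whitespace-normalised text of a read-only SQL query is the cache key and that cached results never go stale. `ReadThroughCache` and the `backend` callable are hypothetical names, not Frontier's API; the real system is distributed over a wide area, whereas this in-process version only shows why many identical reads reach the central database once.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]


class ReadThroughCache:
    """Hypothetical read-through cache in front of a read-only SQL backend."""

    def __init__(self, backend: Callable[[str], List[Row]]) -> None:
        self._backend = backend          # function that really queries the database
        self._store: Dict[str, List[Row]] = {}
        self.backend_hits = 0            # how many queries reached the database

    @staticmethod
    def _key(sql: str) -> str:
        # Collapse whitespace so trivially different spellings share one entry.
        return " ".join(sql.split())

    def query(self, sql: str) -> List[Row]:
        key = self._key(sql)
        if key not in self._store:       # miss: ask the database once
            self.backend_hits += 1
            self._store[key] = self._backend(sql)
        return self._store[key]          # hit: served from the cache
```

Because conditions data are read far more often than they are written, most requests are hits; a production design would also need an expiry or invalidation policy, which this sketch omits.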
Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dykstra, Dave

One of the main attractions of non-relational NoSQL databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.
The European Bioinformatics Institute's data resources 2014.

PubMed

Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

2014-01-01

Molecular Biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue.
National Transportation Atlas Databases : 1998

DOT National Transportation Integrated Search

1998-01-01

The North American Transportation Atlas Data - 1998 (NORTAD) is a set of geographic data sets for transportation facilities in Canada, Mexico, and the United States. These data sets include geospatial information for transportation modal networks...
Genomic atlas of the human plasma proteome.

PubMed

Sun, Benjamin B; Maranville, Joseph C; Peters, James E; Stacey, David; Staley, James R; Blackshaw, James; Burgess, Stephen; Jiang, Tao; Paige, Ellie; Surendran, Praveen; Oliver-Williams, Clare; Kamat, Mihir A; Prins, Bram P; Wilcox, Sheri K; Zimmerman, Erik S; Chi, An; Bansal, Narinder; Spain, Sarah L; Wood, Angela M; Morrell, Nicholas W; Bradley, John R; Janjic, Nebojsa; Roberts, David J; Ouwehand, Willem H; Todd, John A; Soranzo, Nicole; Suhre, Karsten; Paul, Dirk S; Fox, Caroline S; Plenge, Robert M; Danesh, John; Runz, Heiko; Butterworth, Adam S

2018-06-01

Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
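Mendelian randomization, mentioned above, uses a genetic variant as an instrument for the exposure (here a plasma protein level). The abstract does not specify the estimator, so the sketch below uses two textbook choices as an assumption: the Wald ratio for a single variant and the inverse-variance-weighted (IVW) combination across several variants, both with first-order standard errors. The `Instrument` fields are hypothetical summary statistics, not data from the study.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Instrument:
    """Hypothetical summary statistics for one genetic variant."""
    beta_exposure: float  # effect of the variant on the protein level
    beta_outcome: float   # effect of the variant on the disease outcome
    se_outcome: float     # standard error of beta_outcome


def wald_ratio(iv: Instrument) -> Tuple[float, float]:
    """Causal estimate beta_outcome / beta_exposure with a first-order SE."""
    estimate = iv.beta_outcome / iv.beta_exposure
    se = iv.se_outcome / abs(iv.beta_exposure)
    return estimate, se


def ivw(instruments: Sequence[Instrument]) -> Tuple[float, float]:
    """Inverse-variance-weighted combination of per-variant Wald ratios."""
    num = sum(i.beta_exposure * i.beta_outcome / i.se_outcome ** 2 for i in instruments)
    den = sum(i.beta_exposure ** 2 / i.se_outcome ** 2 for i in instruments)
    return num / den, math.sqrt(1.0 / den)
```

The IVW estimate is a weighted mean of the Wald ratios with weights `beta_exposure**2 / se_outcome**2`, so variants with strong effects on the protein dominate.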

Evaluation of the Urine Protein/Creatinine Ratio Measured with the Dipsticks Clinitek Atlas PRO 12.

PubMed

Hermida, Fernando J; Soto, Sonia; Benitez, Alfonso J

2016-01-01

Screening for urine proteins is recommended for the detection of albuminuria in high risk groups. The aim of this study was to compare the Clinitek Atlas PRO12 reagent urine strip with quantitative methods for the determination of protein/creatinine ratio and to evaluate the usefulness of the semi-quantitative Clinitek Atlas PRO12 reagent urine strip as a tool in the early detection of albuminuria among the general population. Six hundred first morning urine specimens were collected from outpatients with various clinical conditions. The results showed that the test data for the urine dipstick Clinitek Atlas PRO12 show good agreement with the quantitative measurement of protein, creatinine and protein/creatinine ratio. In addition, this study shows that 97.2% of the samples which gave "normal" protein/creatinine ratios by the semi-quantitative method, showed albumin/creatinine ratio < 30 mg/g by the quantitative methods. Our results show that Clinitek Atlas PRO12 reagent strips can be used for the purposes of albuminuria screening in the general population.
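The ratios in this study are simple quotients of two concentrations. As a sketch, assuming both analytes are reported in mg/dL (the abstract does not state the units of the raw measurements), the ratio in mg/g is obtained by multiplying by 1000, and an albumin/creatinine ratio can then be binned with the commonly used KDIGO albuminuria categories (A1 below 30 mg/g, A2 from 30 to 300 mg/g, A3 above 300 mg/g); only the 30 mg/g cut-off comes from the abstract. The function names are hypothetical.

```python
from typing import Iterable


def ratio_mg_per_g(analyte_mg_dl: float, creatinine_mg_dl: float) -> float:
    """Analyte/creatinine ratio in mg per g creatinine (inputs assumed in mg/dL)."""
    if creatinine_mg_dl <= 0:
        raise ValueError("creatinine concentration must be positive")
    # mg/dL divided by mg/dL is mg/mg; 1 g = 1000 mg, so scale by 1000.
    return analyte_mg_dl * 1000.0 / creatinine_mg_dl


def albuminuria_category(acr_mg_g: float) -> str:
    """Bin an albumin/creatinine ratio into KDIGO-style categories."""
    if acr_mg_g < 30:
        return "A1"
    if acr_mg_g <= 300:
        return "A2"
    return "A3"


def percent_below(acrs: Iterable[float], cutoff: float = 30.0) -> float:
    """Share (in %) of samples whose ratio is under the cut-off."""
    values = list(acrs)
    if not values:
        raise ValueError("no samples")
    return 100.0 * sum(v < cutoff for v in values) / len(values)
```

`percent_below` computes the same kind of summary as the 97.2% figure reported above, but the study's data are not reproduced here.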
WatAA: Atlas of Protein Hydration. Exploring synergies between data mining and ab initio calculations.

PubMed

Černý, Jiří; Schneider, Bohdan; Biedermannová, Lada

2017-07-14

Water molecules represent an integral part of proteins and a key determinant of protein structure, dynamics and function. WatAA is a newly developed, web-based atlas of amino-acid hydration in proteins. The atlas provides information about the ordered first hydration shell of the most populated amino-acid conformers in proteins. The data presented in the atlas are drawn from two sources: experimental data and ab initio quantum-mechanics calculations. The experimental part is based on a data-mining study of a large set of high-resolution protein crystal structures. The crystal-derived data include 3D maps of water distribution around amino-acids and probability of occurrence of each of the identified hydration sites. The quantum mechanics calculations validate and extend this primary description by optimizing the water position for each hydration site, by providing hydrogen atom positions and by quantifying the interaction energy that stabilizes the water molecule at the particular hydration site position. The calculations show that the majority of experimentally derived hydration sites are positioned near local energy minima for water, and the calculated interaction energies help to assess the preference of water for the individual hydration sites. We propose that the atlas can be used to validate water placement in electron density maps in crystallographic refinement, to locate water molecules mediating protein-ligand interactions in drug design, and to prepare and evaluate molecular dynamics simulations. WatAA: Atlas of Protein Hydration is freely available without login at .
The ATLAS TAGS database distribution and management - Operational challenges of a multi-terabyte distributed database

NASA Astrophysics Data System (ADS)

Viegas, F.; Malon, D.; Cranshaw, J.; Dimitrov, G.; Nowak, M.; Nairz, A.; Goossens, L.; Gallas, E.; Gamboa, C.; Wong, A.; Vinek, E.

2010-04-01

The TAG files store summary event quantities that allow a quick selection of interesting events. These data will be produced at a nominal rate of 200 Hz and uploaded into a relational database for access from websites and other tools. The estimated database volume is 6 TB per year, making it the largest application running on the ATLAS relational databases, at CERN and at other voluntary sites. The sheer volume and high rate of production make this application a challenge to data and resource management in many respects. This paper focuses on the operational challenges of this system. These include: uploading the data from files to the databases at CERN and at remote sites; distributing the TAG metadata that is essential to guide the user through event selection; and controlling resource usage of the database, from the user query load to the strategy for cleaning and archiving old TAG data.
Cassini Tour Atlas Automated Generation

NASA Technical Reports Server (NTRS)

Grazier, Kevin R.; Roumeliotis, Chris; Lange, Robert D.

2011-01-01

During the Cassini spacecraft's cruise phase and nominal mission, the Cassini Science Planning Team developed and maintained an online database of geometric and timing information called the Cassini Tour Atlas. The Tour Atlas consisted of several hundred megabytes of EVENTS mission planning software outputs, tables, plots, and images used by mission scientists for observation planning. Each time the nominal mission trajectory was altered or tweaked, a new Tour Atlas had to be regenerated manually. In the early phases of Cassini's Equinox Mission planning, an a priori estimate suggested that mission tour designers would develop approximately 30 candidate tours within a short period of time. For Cassini scientists to analyze the science opportunities in each candidate tour quickly and thoroughly, and so select the series of orbits with the best science return, a separate Tour Atlas was required for each trajectory. Manually generating that many trajectory analyses in the allotted time would have been impossible, so the entire task was automated using code written in five different programming languages. This software automates the generation of the Cassini Tour Atlas database. It performs with one UNIX command what previously took a day or two of human labor.
On-the-fly selection of cell-specific enhancers, genes, miRNAs and proteins across the human body using SlideBase

PubMed Central

Ienasescu, Hans; Li, Kang; Andersson, Robin; Vitezic, Morana; Rennie, Sarah; Chen, Yun; Vitting-Seerup, Kristoffer; Lagoni, Emil; Boyd, Mette; Bornholdt, Jette; de Hoon, Michiel J. L.; Kawaji, Hideya; Lassmann, Timo; Hayashizaki, Yoshihide; Forrest, Alistair R. R.; Carninci, Piero; Sandelin, Albin

2016-01-01

Genomics consortia have produced large datasets profiling the expression of genes, micro-RNAs, enhancers and more across human tissues or cells. There is a need for intuitive tools to select the subsets of such data that are most relevant for specific studies. To this end, we present SlideBase, a web tool which offers a new way of selecting genes, promoters, enhancers and microRNAs that are preferentially expressed/used in a specified set of cells/tissues, based on the use of interactive sliders. With the help of sliders, SlideBase enables users to define custom expression thresholds for individual cell types/tissues, producing sets of genes, enhancers etc. which satisfy these constraints. Changes in slider settings result in simultaneous changes in the selected sets, updated in real time. SlideBase is linked to major databases from genomics consortia, including FANTOM, GTEx, The Human Protein Atlas and BioGPS. Database URL: http://slidebase.binf.ku.dk PMID:28025337
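SlideBase's slider mechanism amounts to filtering features against per-tissue bounds and re-running the filter whenever a slider moves. The sketch below assumes each slider sets an inclusive lower and upper bound on a single expression value per feature and tissue; the abstract does not describe the tool's actual units or scoring, and the gene and tissue names are hypothetical.

```python
from typing import List, Mapping, Tuple

# feature (gene, enhancer, miRNA, ...) -> tissue -> expression value
Expression = Mapping[str, Mapping[str, float]]
# tissue -> (lower, upper) bound set by a hypothetical slider
Sliders = Mapping[str, Tuple[float, float]]


def select_features(expr: Expression, sliders: Sliders) -> List[str]:
    """Return features whose value lies within every slider's bounds.

    Tissues without a slider are unconstrained; a tissue missing from a
    feature's profile is treated as 0.0 (an assumption of this sketch).
    """
    selected = []
    for feature, profile in expr.items():
        if all(lo <= profile.get(tissue, 0.0) <= hi
               for tissue, (lo, hi) in sliders.items()):
            selected.append(feature)
    return sorted(selected)
```

Re-evaluating `select_features` after each slider change reproduces the "updated in real time" behaviour for small tables; a large dataset would call for indexed or precomputed filtering.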
TIGER 2010 Boundaries

EPA Pesticide Factsheets

This EnviroAtlas web service supports research and online mapping activities related to EnviroAtlas (https://www.epa.gov/enviroatlas). This web service includes the State and County boundaries from the TIGER shapefiles compiled into a single national coverage for each layer. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB).
A High-Resolution In Vivo Atlas of the Human Brain's Serotonin System.

PubMed

Beliveau, Vincent; Ganz, Melanie; Feng, Ling; Ozenne, Brice; Højgaard, Liselotte; Fisher, Patrick M; Svarer, Claus; Greve, Douglas N; Knudsen, Gitte M

2017-01-04

The serotonin (5-hydroxytryptamine, 5-HT) system modulates many important brain functions and is critically involved in many neuropsychiatric disorders. Here, we present a high-resolution, multidimensional, in vivo atlas of four of the human brain's 5-HT receptors (5-HT1A, 5-HT1B, 5-HT2A, and 5-HT4) and the 5-HT transporter (5-HTT). The atlas is created from molecular and structural high-resolution neuroimaging data consisting of positron emission tomography (PET) and magnetic resonance imaging (MRI) scans acquired in a total of 210 healthy individuals. Comparison of the regional PET binding measures with postmortem human brain autoradiography outcomes showed a high correlation for the five 5-HT targets and this enabled us to transform the atlas to represent protein densities (in picomoles per milliliter). We also assessed the regional association between protein concentration and mRNA expression in the human brain by comparing the 5-HT density across the atlas with data from the Allen Human Brain atlas and identified receptor- and transporter-specific associations that show the regional relation between the two measures. Together, these data provide unparalleled insight into the serotonin system of the human brain. We present a high-resolution positron emission tomography (PET)- and magnetic resonance imaging-based human brain atlas of important serotonin receptors and the transporter. The regional PET-derived binding measures correlate strongly with the corresponding autoradiography protein levels. The strong correlation enables the transformation of the PET-derived human brain atlas into a protein density map of the serotonin (5-hydroxytryptamine, 5-HT) system. Next, we compared the regional receptor/transporter protein densities with mRNA levels and uncovered unique associations between protein expression and density at high detail. 
This new in vivo neuroimaging atlas of the 5-HT system not only provides insight into the human brain's regional protein synthesis, transport, and density, but also represents a valuable source of information for the neuroscience community as a comparative instrument to assess brain disorders. Copyright © 2017 the authors 0270-6474/17/370120-09$15.00/0.
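The step from PET binding measures to protein densities relies on the strong regional correlation with autoradiography. A minimal sketch, assuming a simple least-squares linear calibration (the abstract does not say which model was fitted), computes Pearson's r plus a slope and intercept from paired regional values and then applies the fit to every atlas region. The function names and the region name in the usage test are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def to_density(binding: Dict[str, float], slope: float, intercept: float) -> Dict[str, float]:
    """Map regional PET binding values to protein densities (pmol/mL)."""
    return {region: slope * bp + intercept for region, bp in binding.items()}
```

A high r is what justifies applying the same linear map to regions without autoradiography data; a fit forced through the origin would be an equally plausible alternative.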
RNA Bricks—a database of RNA 3D motifs and their interactions

PubMed Central

Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.

2014-01-01

The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility supports searching ‘RNA bricks’ by sequence similarity and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091
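Comparing "spatial positions of backbone atoms" without using sequence is commonly done by optimally superimposing two sets of corresponding atoms and measuring the root-mean-square deviation (RMSD). The abstract does not name RNA Bricks' scoring function, so the sketch below substitutes the standard Kabsch superposition followed by RMSD; it assumes the atom correspondence between query and motif is already known and uses NumPy.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(pc.T @ qc)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate every centred point of p and compare with q.
    diff = pc @ rot.T - qc
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Because translation and rotation are factored out, two copies of the same motif in different crystal frames score close to zero, which is what makes a sequence-independent search possible.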
Poster — Thur Eve — 59: Atlas Selection for Automated Segmentation of Pelvic CT for Prostate Radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mallawi, A; Farrell, T; Diamond, K

2014-08-15

Automated atlas-based segmentation has recently been evaluated for use in planning prostate cancer radiotherapy. In the typical approach, the essential step is the selection of an atlas from a database that best matches the target image. This work proposes an atlas selection strategy and evaluates its impact on the final segmentation accuracy. Prostate length (PL), right femoral head diameter (RFHD), and left femoral head diameter (LFHD) were measured in CT images of 20 patients. Each subject was then taken as the target image to which all remaining 19 images were affinely registered. For each pair of registered images, the overlap between prostate and femoral head contours was quantified using the Dice Similarity Coefficient (DSC). Finally, we designed an atlas selection strategy that computed the ratio of PL (prostate segmentation), RFHD (right femur segmentation), and LFHD (left femur segmentation) between the target subject and each subject in the atlas database. Five atlas subjects yielding ratios nearest to one were then selected for further analysis. RFHD and LFHD were excellent parameters for atlas selection, achieving a mean femoral head DSC of 0.82 ± 0.06. PL had a moderate ability to select the most similar prostate, with a mean DSC of 0.63 ± 0.18. The DSC obtained with the proposed selection method were slightly lower than the maximums established using brute force, but this does not include potential improvements expected with deformable registration. Atlas selection based on PL for prostate and femoral diameter for femoral heads provides reasonable segmentation accuracy.
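Both quantities in this atlas-selection study can be written down directly. The Dice Similarity Coefficient of two segmentations A and B is DSC = 2|A ∩ B| / (|A| + |B|), and the selection rule ranks atlas subjects by how close the target/atlas ratio of one measurement (PL, RFHD or LFHD) is to one. The sketch represents segmentations as sets of voxel indices and measures closeness as |ratio - 1|; both are assumptions, since the abstract gives no implementation details, and the subject IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice Similarity Coefficient: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def select_atlases(target: float, atlas: Dict[str, float], k: int = 5) -> List[str]:
    """Pick the k atlas subjects whose target/atlas ratio is nearest to one.

    Ties are broken by subject ID so the result is deterministic.
    """
    ranked = sorted(atlas, key=lambda sid: (abs(target / atlas[sid] - 1.0), sid))
    return ranked[:k]
```

Note that |ratio - 1| penalises a ratio of 0.5 less than a ratio of 2; ranking by |log(ratio)| would treat the two symmetrically, if that is preferred.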
The European Bioinformatics Institute’s data resources 2014

PubMed Central

Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

2014-01-01

Molecular Biology has been at the heart of the ‘big data’ revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff’s ‘Atlas of Protein Sequence and Structure’ through the Human Genome Project in the late 1990s and early 2000s to today’s population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI’s database collection to complement the reviews of individual databases provided elsewhere in this issue. PMID:24271396
The Cardiac Atlas Project--an imaging database for computational modeling and statistical atlases of the heart.

PubMed

Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A

2011-08-15

Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models is dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web-interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). http://www.cardiacatlas.org a.young@auckland.ac.nz Supplementary data are available at Bioinformatics online.
TIGER 2010 Boundaries

EPA Pesticide Factsheets

This EnviroAtlas web service supports research and online mapping activities related to EnviroAtlas (https://www.epa.gov/enviroatlas). This web service includes the State, County, and Census Block Groups boundaries from the TIGER shapefiles compiled into a single national coverage for each layer. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB).
The ATLAS conditions database architecture for the Muon spectrometer

NASA Astrophysics Data System (ADS)

Verducci, Monica; ATLAS Muon Collaboration

2010-04-01

Facing demanding requirements for conditions data storage, the Muon System has, as decided by the ATLAS Collaboration, adopted the conditions database project 'COOL' as the basis for all its conditions data storage, both at CERN and throughout the worldwide collaboration. Managing the Muon COOL conditions database will be one of the most challenging applications for the Muon System, not only in terms of data volumes and rates but also in terms of the variety of data stored. The Muon conditions database is responsible for storing almost all of the 'non-event' data and detector quality flags needed for debugging detector operations and for performing reconstruction and analysis. COOL allows database applications to be written independently of the underlying database technology and ensures long-term compatibility with the entire ATLAS software. COOL implements an interval-of-validity database: objects stored or referenced in COOL have an associated start and end time between which they are valid. The data are stored in folders, which are themselves arranged in a hierarchical structure of folder sets. The structure is simple and mainly optimized to store and retrieve the object(s) associated with a particular time. In this work, an overview of the entire Muon conditions database architecture is given, including the different sources of the data and the storage model used. In addition, the software interfaces used to access the conditions data are described, with particular emphasis on the offline reconstruction framework ATHENA and the services developed to provide the conditions data to the reconstruction.
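COOL's interval-of-validity model, described above, can be sketched with a sorted list of start times and a binary search. This sketch assumes integer time stamps, half-open intervals [since, until), a single version per folder (no tags or channels), and hypothetical folder paths; it is not the COOL API.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class IOVFolder:
    """One folder of payloads, each valid on a half-open interval [since, until)."""

    def __init__(self) -> None:
        self._since: List[int] = []                   # sorted start times
        self._entries: List[Tuple[int, int, Any]] = []

    def store(self, since: int, until: int, payload: Any) -> None:
        if until <= since:
            raise ValueError("empty interval of validity")
        i = bisect.bisect_left(self._since, since)
        # Reject overlaps with neighbouring intervals (single-version sketch).
        if i > 0 and self._entries[i - 1][1] > since:
            raise ValueError("overlaps the previous interval")
        if i < len(self._since) and self._since[i] < until:
            raise ValueError("overlaps the next interval")
        self._since.insert(i, since)
        self._entries.insert(i, (since, until, payload))

    def find(self, t: int) -> Optional[Any]:
        """Return the payload valid at time t, or None if no interval covers it."""
        i = bisect.bisect_right(self._since, t) - 1   # last interval starting <= t
        if i >= 0 and t < self._entries[i][1]:
            return self._entries[i][2]
        return None


# Folders addressed by hierarchical, hypothetical paths (folder sets as prefixes).
conditions: Dict[str, IOVFolder] = {"/MUON/MDT/RT": IOVFolder()}
```

Lookup by time is O(log n) per folder, which matches the abstract's point that the structure is optimized for retrieving the object valid at a particular time.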
The PeptideAtlas Project.

PubMed

Deutsch, Eric W

2010-01-01

PeptideAtlas is a multi-species compendium of peptides observed with tandem mass spectrometry methods. Raw mass spectrometer output files are collected from the community and reprocessed through a uniform analysis and validation pipeline that continues to advance. The results are loaded into a database and the information derived from the raw data is returned to the community via several web-based data exploration tools. The PeptideAtlas resource is useful for experiment planning, improving genome annotation, and other data mining projects. PeptideAtlas has become especially useful for planning targeted proteomics experiments.
Atlas of Vega: 3850-6860 Å

NASA Astrophysics Data System (ADS)

Kim, Hyun-Sook; Han, Inwoo; Valyavin, G.; Lee, Byeong-Cheol; Shimansky, V.; Galazutdinov, G. A.

2009-10-01

We present a high resolving power (λ/Δλ = 90,000) and high signal-to-noise ratio (~700) spectral atlas of Vega covering the 3850-6860 Å wavelength range. The atlas is a result of averaging of spectra recorded with the aid of the echelle spectrograph BOES fed by the 1.8 m telescope at Bohyunsan Observatory (Korea). The atlas is provided only in machine-readable form (electronic data file) and will be available in the SIMBAD database upon publication. Based on data collected with the 1.8 m telescope operated at BOAO Observatory, Korea.
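The quoted resolving power R = λ/Δλ translates directly into the width of a resolution element, Δλ = λ/R, and into a velocity resolution, Δv = c/R. The sketch also shows the usual square-root gain in signal-to-noise from averaging N spectra, assuming equal-quality exposures with uncorrelated noise; the abstract does not say how many spectra were averaged, so that number is left as an input.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def resolution_element(wavelength_angstrom: float, resolving_power: float) -> float:
    """Smallest resolvable wavelength interval, in Angstrom: lambda / R."""
    return wavelength_angstrom / resolving_power


def velocity_resolution(resolving_power: float) -> float:
    """Velocity resolution in km/s: c / R."""
    return C_KM_S / resolving_power


def stacked_snr(single_snr: float, n_spectra: int) -> float:
    """S/N after averaging n spectra of equal quality with uncorrelated noise."""
    return single_snr * math.sqrt(n_spectra)
```

At R = 90,000 this gives roughly 0.043 Å at 3850 Å, 0.076 Å at 6860 Å, and about 3.3 km/s in velocity.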
Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites.

PubMed

Ribeiro, António J M; Holliday, Gemma L; Furnham, Nicholas; Tyzack, Jonathan D; Ferris, Katherine; Thornton, Janet M

2018-01-04

M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
PhosphoregDB: The tissue and sub-cellular distribution of mammalian protein kinases and phosphatases

PubMed Central

Forrest, Alistair RR; Taylor, Darrin F; Fink, J Lynn; Gongora, M Milena; Flegg, Cameron; Teasdale, Rohan D; Suzuki, Harukazu; Kanamori, Mutsumi; Kai, Chikatoshi; Hayashizaki, Yoshihide; Grimmond, Sean M

2006-01-01

Background: Protein kinases and protein phosphatases are the fundamental components of phosphorylation dependent protein regulatory systems. We have created a database for the protein kinase-like and phosphatase-like loci of mouse that integrates protein sequence, interaction, classification and pathway information with the results of a systematic screen of their sub-cellular localization and tissue specific expression data mined from the GNF tissue atlas of mouse. Results: The database lets users query where a specific kinase or phosphatase is expressed at both the tissue and sub-cellular levels. Similarly the interface allows the user to query by tissue, pathway or sub-cellular localization, to reveal which components are co-expressed or co-localized. A review of their expression reveals 30% of these components are detected in all tissues tested while 70% show some level of tissue restriction. Hierarchical clustering of the expression data reveals that expression of these genes can be used to separate the samples into tissues of related lineage, including 3 larger clusters of nervous tissue, developing embryo and cells of the immune system. By overlaying the expression, sub-cellular localization and classification data we examine correlations between class, specificity and tissue restriction and show that tyrosine kinases are more generally expressed in fewer tissues than serine/threonine kinases. Conclusion: Together, these data demonstrate that cell type specific systems exist to regulate protein phosphorylation and that for accurate modelling and for determination of enzyme substrate relationships the co-location of components needs to be considered. PMID:16504016
Fun Databases: My Top Ten.

ERIC Educational Resources Information Center

O'Leary, Mick

1992-01-01

Provides reviews of 10 online databases: Consumer Reports; Public Opinion Online; Encyclopedia of Associations; Official Airline Guide Adventure Atlas and Events Calendar; CENDATA; Hollywood Hotline; Fearless Taster; Soap Opera Summaries; and Human Sexuality. (LRW)
Plant Reactome: a resource for plant pathways and comparative analysis

PubMed Central

Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D.; Wu, Guanming; Fabregat, Antonio; Elser, Justin L.; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D.; Ware, Doreen; Jaiswal, Pankaj

2017-01-01

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. PMID:27799469
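The projection of curated rice pathways onto other species by gene homology can be illustrated with a minimal Python sketch. It assumes that orthology has already been computed and is available as a simple mapping from rice gene IDs to target-species gene IDs; the gene identifiers, the `project_pathway` function and its `min_fraction` coverage cut-off are hypothetical and are not Plant Reactome's actual inference procedure or API.

```python
from typing import Dict, List, Set


def project_pathway(
    pathway_genes: Set[str],
    orthologs: Dict[str, List[str]],
    min_fraction: float = 0.5,
) -> Set[str]:
    """Project a reference pathway onto a target species via orthologs.

    Returns the target-species genes mapped from the reference pathway, or an
    empty set if fewer than `min_fraction` of the reference genes have at
    least one ortholog (a hypothetical coverage cut-off).
    """
    if not pathway_genes:
        return set()
    projected: Set[str] = set()
    covered = 0
    for gene in pathway_genes:
        hits = orthologs.get(gene, [])
        if hits:
            covered += 1
            projected.update(hits)  # one-to-many orthologs are all kept
    if covered / len(pathway_genes) < min_fraction:
        return set()
    return projected


# Hypothetical rice pathway and rice -> maize ortholog table.
rice_pathway = {"OsGeneA", "OsGeneB", "OsGeneC"}
rice_to_maize = {"OsGeneA": ["ZmGene1"], "OsGeneB": ["ZmGene2", "ZmGene3"]}
print(sorted(project_pathway(rice_pathway, rice_to_maize)))
```

Requiring a minimum fraction of reference genes to have orthologs is one simple way to avoid projecting a whole pathway from a single matching gene; the criteria used in practice may differ.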
A Tool for Conditions Tag Management in ATLAS

NASA Astrophysics Data System (ADS)

Sharmazanashvili, A.; Batiashvili, G.; Gvaberidze, G.; Shekriladze, L.; Formica, A.; Atlas Collaboration

2014-06-01

ATLAS Conditions data comprise about 2 TB in a relational database and 400 GB of files referenced from the database. Conditions data are entered and retrieved using COOL, the API for accessing data in the LCG Conditions Database infrastructure, and are managed using an ATLAS-customized, Python-based tool set. Conditions data are required for every reconstruction and simulation job, so access to them is crucial for all aspects of ATLAS data taking and analysis, as well as for preceding tasks that derive optimal corrections to reconstruction. Optimized sets of conditions for processing are achieved through strict version control of those conditions: a process which assigns COOL Tags to sets of conditions and then unifies those conditions over data-taking intervals into a COOL Global Tag. This Global Tag identifies the set of conditions used to process data, so that the underlying conditions can be uniquely identified with 100% reproducibility should the processing be executed again. Understanding shifts in the underlying conditions from one tag to another, and ensuring interval completeness for all detectors for a set of runs to be processed, is a complex task requiring tools beyond the above-mentioned Python utilities. Therefore, a JavaScript/PHP-based utility called the Conditions Tag Browser (CTB) has been developed. CTB gives detector and conditions experts the possibility to navigate through the different databases and COOL folders; explore the content of given tags, the differences between them, and their extent in time; and visualize the content of channels associated with leaf tags. This report describes the structure and the PHP/JavaScript function classes of the CTB.
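The relationship between leaf tags, a Global Tag and intervals of validity can be sketched in Python. This is a toy in-memory model under stated assumptions, not the COOL API: `LeafTag`, `resolve` and `changed_folders`, as well as the folder names, tag names and payloads, are all hypothetical. Here a global tag maps each folder to one leaf tag, and each leaf tag holds payloads that are valid from a start time until the next start time.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class LeafTag:
    """Payloads of one folder under one tag. Each payload is valid from its
    start time until the next start time (open-ended intervals of validity)."""

    def __init__(self) -> None:
        self.starts: List[int] = []    # sorted interval start times
        self.payloads: List[str] = []  # payloads[i] is valid from starts[i]

    def add(self, since: int, payload: str) -> None:
        i = bisect.bisect_left(self.starts, since)
        self.starts.insert(i, since)
        self.payloads.insert(i, payload)

    def lookup(self, t: int) -> Optional[str]:
        # Payload of the last interval starting at or before t; None if t
        # precedes every interval (i.e. the tag does not cover time t).
        i = bisect.bisect_right(self.starts, t) - 1
        return self.payloads[i] if i >= 0 else None


def resolve(global_tag: Dict[str, str],
            leaf_tags: Dict[Tuple[str, str], LeafTag],
            t: int) -> Dict[str, Optional[str]]:
    """Payload each folder supplies at time t; the global tag maps
    folder name -> leaf tag name."""
    return {folder: leaf_tags[(folder, tag)].lookup(t)
            for folder, tag in global_tag.items()}


def changed_folders(gt_a: Dict[str, str], gt_b: Dict[str, str],
                    leaf_tags: Dict[Tuple[str, str], LeafTag],
                    t: int) -> List[str]:
    """Folders whose payload at time t differs between two global tags."""
    a = resolve(gt_a, leaf_tags, t)
    b = resolve(gt_b, leaf_tags, t)
    return sorted(f for f in a.keys() | b.keys() if a.get(f) != b.get(f))


# Hypothetical folders, tags and payloads, for illustration only.
ped_v1, ped_v2, align_v7 = LeafTag(), LeafTag(), LeafTag()
ped_v1.add(0, "ped-A")
ped_v2.add(0, "ped-A")
ped_v2.add(100, "ped-B")
align_v7.add(50, "align-X")
leaf = {("/Calo/Pedestals", "v1"): ped_v1,
        ("/Calo/Pedestals", "v2"): ped_v2,
        ("/Pixel/Align", "v7"): align_v7}
gt_old = {"/Calo/Pedestals": "v1", "/Pixel/Align": "v7"}
gt_new = {"/Calo/Pedestals": "v2", "/Pixel/Align": "v7"}

print(resolve(gt_new, leaf, 10))                   # Pixel alignment not yet valid -> None
print(changed_folders(gt_old, gt_new, leaf, 120))  # ['/Calo/Pedestals']
```

Resolving a global tag at a given time shows which payload every folder supplies, a `None` result exposes a gap in interval coverage, and comparing two global tags at the same time lists the folders whose conditions shift between them.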

AtlasCBS: a web server to map and explore chemico-biological space

NASA Astrophysics Data System (ADS)

Cortés-Cabrera, Álvaro; Morreale, Antonio; Gago, Federico; Abad-Zapatero, Celerino

2012-09-01

New approaches are needed that can help reduce the unsustainable failure rate in small-molecule drug discovery. Ligand Efficiency Indices (LEI) are making a great impact on early-stage compound selection and prioritization. Given a target-ligand database with chemical structures and associated biological affinities/activities for a target, the AtlasCBS server generates two-dimensional, dynamical representations of its contents in terms of LEI. These variables allow an effective decoupling of the chemical (angular) and biological (radial) components. BindingDB, PDBBind and ChEMBL databases are currently implemented. Proprietary datasets can also be uploaded and compared. The utility of this atlas-like representation in the future of drug design is highlighted with some examples. The web server can be accessed at http://ub.cbm.uam.es/atlascbs and https://www.ebi.ac.uk/chembl/atlascbs.
AtlasCBS: a web server to map and explore chemico-biological space.

PubMed

Cortés-Cabrera, Alvaro; Morreale, Antonio; Gago, Federico; Abad-Zapatero, Celerino

2012-09-01

New approaches are needed that can help reduce the unsustainable failure rate in small-molecule drug discovery. Ligand Efficiency Indices (LEI) are making a great impact on early-stage compound selection and prioritization. Given a target-ligand database with chemical structures and associated biological affinities/activities for a target, the AtlasCBS server generates two-dimensional, dynamical representations of its contents in terms of LEI. These variables allow an effective decoupling of the chemical (angular) and biological (radial) components. BindingDB, PDBBind and ChEMBL databases are currently implemented. Proprietary datasets can also be uploaded and compared. The utility of this atlas-like representation in the future of drug design is highlighted with some examples. The web server can be accessed at http://ub.cbm.uam.es/atlascbs and https://www.ebi.ac.uk/chembl/atlascbs.
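The decoupling of chemical (angular) and biological (radial) components can be seen with a short Python sketch. It assumes the commonly used pair of indices SEI = pActivity / (PSA / 100 Å²) and BEI = pActivity / (MW in kDa); AtlasCBS also offers other (normalized) indices, and the function names and example compound are hypothetical. Because BEI / SEI = 10 · PSA / MW does not depend on affinity, the angle of the point (SEI, BEI) reflects only chemistry, while its distance from the origin scales linearly with pActivity.

```python
import math
from typing import Tuple


def efficiency_indices(p_activity: float, mw_da: float,
                       psa_a2: float) -> Tuple[float, float]:
    """Return (SEI, BEI) for one compound.

    p_activity: -log10 of the affinity in molar units (e.g. pKi)
    mw_da:      molecular weight in daltons
    psa_a2:     polar surface area in square angstroms
    """
    sei = p_activity / (psa_a2 / 100.0)  # surface efficiency index
    bei = p_activity / (mw_da / 1000.0)  # binding efficiency index (MW in kDa)
    return sei, bei


def to_polar(sei: float, bei: float) -> Tuple[float, float]:
    """Angle in degrees and distance from the origin of the point (SEI, BEI)."""
    return math.degrees(math.atan2(bei, sei)), math.hypot(sei, bei)


# One hypothetical chemotype (MW 400 Da, PSA 80 A^2) at two affinities:
# the angle stays the same (it depends only on PSA/MW), the radius scales with pKi.
for pki in (6.0, 9.0):
    angle, radius = to_polar(*efficiency_indices(pki, 400.0, 80.0))
    print(f"pKi={pki}: angle={angle:.2f} deg, radius={radius:.2f}")
```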
Automatic Structural Parcellation of Mouse Brain MRI Using Multi-Atlas Label Fusion

PubMed Central

Ma, Da; Cardoso, Manuel J.; Modat, Marc; Powell, Nick; Wells, Jack; Holmes, Holly; Wiseman, Frances; Tybulewicz, Victor; Fisher, Elizabeth; Lythgoe, Mark F.; Ourselin, Sébastien

2014-01-01

Multi-atlas segmentation propagation has evolved quickly in recent years, becoming a state-of-the-art methodology for automatic parcellation of structural images. However, few studies have applied these methods to preclinical research. In this study, we present a fully automatic framework for mouse brain MRI structural parcellation using multi-atlas segmentation propagation. The framework adopts the similarity and truth estimation for propagated segmentations (STEPS) algorithm, which utilises a locally normalised cross correlation similarity metric for atlas selection and an extended simultaneous truth and performance level estimation (STAPLE) framework for multi-label fusion. The segmentation accuracy of the multi-atlas framework was evaluated using publicly available mouse brain atlas databases with pre-segmented manually labelled anatomical structures as the gold standard, and optimised parameters were obtained for the STEPS algorithm in the label fusion to achieve the best segmentation accuracy. We showed that our multi-atlas framework resulted in significantly higher segmentation accuracy compared to single-atlas based segmentation, as well as to the original STAPLE framework. PMID:24475148
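The label-fusion step can be illustrated with the simplest fusion rule, per-voxel majority voting, in a Python sketch. This is deliberately not STEPS or STAPLE, which weight each atlas by locally estimated similarity and performance; it only shows how several propagated segmentations are combined into one. It assumes the atlas label maps have already been registered to the target image and flattened to lists of integer labels; `majority_vote` and the example data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(propagated: Sequence[Sequence[int]]) -> List[int]:
    """Fuse label maps already propagated (registered) into target space.

    propagated: one flattened label map per atlas, all the same length.
    Ties are broken in favour of the smallest label value.
    """
    n_voxels = len(propagated[0])
    if any(len(m) != n_voxels for m in propagated):
        raise ValueError("all label maps must have the same number of voxels")
    fused = []
    for v in range(n_voxels):
        counts = Counter(m[v] for m in propagated)
        best = max(counts.values())
        fused.append(min(label for label, c in counts.items() if c == best))
    return fused


# Three hypothetical atlases, five voxels; 0 = background, 1 and 2 = structures.
atlases = [
    [0, 1, 1, 2, 2],
    [0, 1, 2, 2, 0],
    [1, 1, 1, 2, 1],
]
print(majority_vote(atlases))  # [0, 1, 1, 2, 0]
```

Replacing the equal votes with weights derived from local image similarity between each registered atlas and the target is the direction that methods such as STEPS take.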
Regulatory interactions between long noncoding RNA LINC00968 and miR-9-3p in non-small cell lung cancer: A bioinformatic analysis based on miRNA microarray, GEO and TCGA.

PubMed

Li, Dong-Yao; Chen, Wen-Jie; Shang, Jun; Chen, Gang; Li, Shi-Kang

2018-06-01

Long non-coding RNAs (lncRNAs) have been demonstrated to mediate carcinogenesis in various types of cancer. However, the regulatory role of lncRNA LINC00968 in lung adenocarcinoma remains unclear. MicroRNA (miRNA) expression in LINC00968-overexpressing human lung adenocarcinoma A549 cells was detected using miRNA microarray analysis. miR-9-3p was selected for further analysis, and its expression was verified in the Gene Expression Omnibus (GEO) database. In addition, the regulatory axis of LINC00968 was validated using The Cancer Genome Atlas (TCGA) database. Data from the GEO database indicated that miR-9-3p expression was significantly higher in lung adenocarcinoma than in normal tissues. Functional enrichment analyses of the target genes of miR-9-3p indicated that protein binding and the AMP-activated protein kinase pathway were the most enriched Gene Ontology and KEGG terms, respectively. Combining the target genes of miR-9-3p with the genes correlated with LINC00968 and miR-9-3p yielded 120 candidate genes, which were used to construct a protein-protein interaction (PPI) network. Cyclin A2 (CCNA2) was identified as having a vital role in the PPI network. Significant correlations were detected between LINC00968, miR-9-3p and CCNA2 in lung adenocarcinoma. The LINC00968/miR-9-3p/CCNA2 regulatory axis provides a new foundation for further evaluating the regulatory mechanisms of LINC00968 in lung adenocarcinoma.
Processing and Quality Monitoring for the ATLAS Tile Hadronic Calorimeter Data

NASA Astrophysics Data System (ADS)

Burghgrave, Blake; ATLAS Collaboration

2017-10-01

An overview is presented of Data Processing and Data Quality (DQ) Monitoring for the ATLAS Tile Hadronic Calorimeter. Calibration runs are monitored from a data quality perspective and used as a cross-check for physics runs. Data quality in physics runs is monitored extensively and continuously. Any problems are reported and immediately investigated. The DQ efficiency achieved was 99.6% in 2012 and 100% in 2015, after the detector maintenance in 2013-2014. Changes to detector status or calibrations are entered into the conditions database (DB) during a brief calibration loop between the end of a run and the beginning of bulk processing of data collected in it. Bulk processed data are reviewed and certified for the ATLAS Good Run List if no problem is detected. Experts maintain the tools used by DQ shifters and the calibration teams during normal operation, and prepare new conditions for data reprocessing and Monte Carlo (MC) production campaigns. Conditions data are stored in 3 databases: Online DB, Offline DB for data and a special DB for Monte Carlo. Database updates can be performed through a custom-made web interface.
FlyAtlas: database of gene expression in the tissues of Drosophila melanogaster

PubMed Central

Robinson, Scott W.; Herzyk, Pawel; Dow, Julian A. T.; Leader, David P.

2013-01-01

The FlyAtlas resource contains data on the expression of the genes of Drosophila melanogaster in different tissues (currently 25: 17 adult and 8 larval) obtained by hybridization of messenger RNA to Affymetrix Drosophila Genome 2 microarrays. The microarray probe sets cover 13 250 Drosophila genes, detecting 12 533 in an unambiguous manner. The data underlying the original web application (http://flyatlas.org) have been restructured into a relational database, and a Java servlet has been written to provide a new web interface, FlyAtlas 2 (http://flyatlas.gla.ac.uk/), which allows several additional queries. Users can retrieve data for individual genes or for groups of genes belonging to the same or related ontological categories. Assistance in selecting valid search terms is provided by an Ajax ‘autosuggest’ facility that polls the database as the user types. Searches can also focus on particular tissues, and data can be retrieved for the most highly expressed genes, for genes of a particular category with above-average expression or for genes with the greatest difference in expression between the larval and adult stages. A novel facility allows the database to be queried with a specific gene to find other genes with a similar pattern of expression across the different tissues. PMID:23203866
FlyAtlas: database of gene expression in the tissues of Drosophila melanogaster.

PubMed

Robinson, Scott W; Herzyk, Pawel; Dow, Julian A T; Leader, David P

2013-01-01

The FlyAtlas resource contains data on the expression of the genes of Drosophila melanogaster in different tissues (currently 25: 17 adult and 8 larval) obtained by hybridization of messenger RNA to Affymetrix Drosophila Genome 2 microarrays. The microarray probe sets cover 13,250 Drosophila genes, detecting 12,533 in an unambiguous manner. The data underlying the original web application (http://flyatlas.org) have been restructured into a relational database, and a Java servlet has been written to provide a new web interface, FlyAtlas 2 (http://flyatlas.gla.ac.uk/), which allows several additional queries. Users can retrieve data for individual genes or for groups of genes belonging to the same or related ontological categories. Assistance in selecting valid search terms is provided by an Ajax 'autosuggest' facility that polls the database as the user types. Searches can also focus on particular tissues, and data can be retrieved for the most highly expressed genes, for genes of a particular category with above-average expression or for genes with the greatest difference in expression between the larval and adult stages. A novel facility allows the database to be queried with a specific gene to find other genes with a similar pattern of expression across the different tissues.
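A query that finds genes with a similar expression pattern across tissues can be sketched by ranking genes on the Pearson correlation of their tissue profiles. The abstract does not state which similarity measure FlyAtlas 2 uses, so Pearson correlation is an assumption here, as are the function names and the toy expression values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; score it as 0
    return sxy / math.sqrt(sxx * syy)


def similar_genes(query: str, profiles: Dict[str, List[float]],
                  top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank all other genes by correlation with the query gene's tissue profile."""
    ref = profiles[query]
    scores = [(gene, pearson(ref, prof))
              for gene, prof in profiles.items() if gene != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Hypothetical expression values across four tissues.
profiles = {
    "geneA": [10.0, 200.0, 15.0, 5.0],
    "geneB": [20.0, 380.0, 30.0, 12.0],
    "geneC": [300.0, 10.0, 250.0, 280.0],
    "geneD": [50.0, 50.0, 50.0, 50.0],
}
for gene, r in similar_genes("geneA", profiles, top_n=3):
    print(gene, round(r, 3))
```

With real microarray signals, log-transforming the values before correlating them is a common choice, so that one or two very highly expressed tissues do not dominate the score.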
Evolution of the architecture of the ATLAS Metadata Interface (AMI)

NASA Astrophysics Data System (ADS)

Odier, J.; Aidel, O.; Albrand, S.; Fulachier, J.; Lambert, F.

2015-12-01

The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions have increased dramatically. The hardware infrastructure must be adapted seamlessly so that the quality of service remains high. We describe the evolution of AMI from its beginnings, when it was served by a single MySQL back-end database server, to its current state, with a cluster of virtual machines at the French Tier1, an Oracle database at Lyon with complementary replication to the Oracle DB at CERN, and an AMI back-up server.
Atlas of Iberian water beetles (ESACIB database).

PubMed

Sánchez-Fernández, David; Millán, Andrés; Abellán, Pedro; Picazo, Félix; Carbonell, José A; Ribera, Ignacio

2015-01-01

The ESACIB ('EScarabajos ACuáticos IBéricos', Iberian aquatic beetles) database is provided, including all available distributional data of Iberian and Balearic water beetles from the literature up to 2013, as well as from museum and private collections, PhD theses, and other unpublished sources. The database contains 62,015 records with associated geographic data (10×10 km UTM squares) for 488 species and subspecies of water beetles, 120 of them endemic to the Iberian Peninsula and eight to the Balearic Islands. This database was used for the elaboration of the "Atlas de los Coleópteros Acuáticos de España Peninsular" (Atlas of the aquatic beetles of peninsular Spain). Data for 15 additional species have been added to this dataset: 11 that occur in the Balearic Islands or mainland Portugal but not in peninsular Spain, and another four with mainly terrestrial habits within the genus Helophorus (added for taxonomic coherence). The complete dataset is provided in Darwin Core Archive format.
Atlas of Iberian water beetles (ESACIB database)

PubMed Central

Sánchez-Fernández, David; Millán, Andrés; Abellán, Pedro; Picazo, Félix; Carbonell, José A.; Ribera, Ignacio

2015-01-01

The ESACIB (‘EScarabajos ACuáticos IBéricos’, Iberian aquatic beetles) database is provided, including all available distributional data of Iberian and Balearic water beetles from the literature up to 2013, as well as from museum and private collections, PhD theses, and other unpublished sources. The database contains 62,015 records with associated geographic data (10×10 km UTM squares) for 488 species and subspecies of water beetles, 120 of them endemic to the Iberian Peninsula and eight to the Balearic Islands. This database was used for the elaboration of the “Atlas de los Coleópteros Acuáticos de España Peninsular” (Atlas of the aquatic beetles of peninsular Spain). Data for 15 additional species have been added to this dataset: 11 that occur in the Balearic Islands or mainland Portugal but not in peninsular Spain, and another four with mainly terrestrial habits within the genus Helophorus (added for taxonomic coherence). The complete dataset is provided in Darwin Core Archive format. PMID:26448717
CORAL Server and CORAL Server Proxy: Scalable Access to Relational Databases from CORAL Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Valassi, A. (CERN); Bartoldus, R.

The CORAL software is widely used at CERN by the LHC experiments to access the data they store on relational databases, such as Oracle. Two new components have recently been added to implement a model involving a middle tier 'CORAL server' deployed close to the database and a tree of 'CORAL server proxies', providing data caching and multiplexing, deployed close to the client. A first implementation of the two new components, released in summer 2009, is now deployed in the ATLAS online system to read the data needed by the High Level Trigger, allowing the configuration of a farm of several thousand processes. This paper reviews the architecture of the software, its development status and its usage in ATLAS.
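The caching role of the proxy tree can be illustrated with a minimal Python sketch in which each node answers repeated queries from its own cache and forwards misses to its parent, so that many clients issuing the same query cause a single database access. This is a hypothetical model, not the CORAL API: `CachingProxy`, the query string and the backend function are invented for illustration, the root node stands in for the CORAL server, and connection multiplexing is not modelled.

```python
from typing import Callable, Dict, List, Optional


class CachingProxy:
    """A node in a hypothetical tree of read-only caching proxies.

    Each node answers from its own cache when it can; otherwise it forwards
    the query to its parent, or to the database backend if it is the root.
    For simplicity every node caches, including the root.
    """

    def __init__(self, parent: Optional["CachingProxy"] = None,
                 backend: Optional[Callable[[str], str]] = None) -> None:
        if (parent is None) == (backend is None):
            raise ValueError("give exactly one of parent or backend")
        self.parent = parent
        self.backend = backend
        self.cache: Dict[str, str] = {}

    def query(self, sql: str) -> str:
        if sql not in self.cache:
            if self.parent is not None:
                self.cache[sql] = self.parent.query(sql)
            else:
                self.cache[sql] = self.backend(sql)
        return self.cache[sql]


backend_calls: List[str] = []


def database(sql: str) -> str:
    """Stand-in for the real database; records every access."""
    backend_calls.append(sql)
    return f"rows for {sql!r}"


# Root (server), one intermediate proxy per leaf, three leaf proxies,
# and 1000 client processes spread over the leaves.
server = CachingProxy(backend=database)
leaves = [CachingProxy(parent=CachingProxy(parent=server)) for _ in range(3)]
for i in range(1000):
    leaves[i % 3].query("SELECT * FROM trigger_config")
print(len(backend_calls))  # 1
```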
Identification and prognostic value of anterior gradient protein 2 expression in breast cancer based on tissue microarray.

PubMed

Guo, Jilong; Gong, Guohua; Zhang, Bin

2017-07-01

Breast cancer has attracted substantial attention as one of the major causes of cancer death in women, and it is crucial to find potential biomarkers of prognostic value. In this study, the expression pattern of anterior gradient protein 2 in breast cancer was identified based on the main molecular subgroups. Through analysis of 69 samples from the Gene Expression Omnibus database, we found that anterior gradient protein 2 expression was significantly higher in non-triple-negative breast cancer tissues than in normal tissues and triple-negative breast cancer tissues (p < 0.05). Data from a total of 622 patients in The Cancer Genome Atlas were also analysed; these data, together with results from quantitative reverse transcription polymerase chain reaction, verified the anterior gradient protein 2 expression pattern. Furthermore, we performed immunohistochemical analysis. The quantification results revealed that anterior gradient protein 2 is highly expressed in non-triple-negative breast cancer (grade 3 excluded) and grade 1 + 2 (triple-negative breast cancer excluded) tumours compared with normal tissues. Anterior gradient protein 2 expression was significantly higher in non-triple-negative breast cancer (grade 3 excluded) and non-triple-negative breast cancer tissues than in triple-negative breast cancer tissues (p < 0.01). In addition, anterior gradient protein 2 expression was significantly higher in grade 1 + 2 (triple-negative breast cancer excluded) and grade 1 + 2 tissues than in grade 3 tissues (p < 0.05). Analysis by Fisher's exact test revealed that anterior gradient protein 2 expression was significantly associated with histologic type, histological grade, oestrogen status and progesterone status.
Univariate analysis of clinicopathological variables showed that anterior gradient protein 2 expression, tumour size and lymph node status were significantly correlated with overall survival in patients with grade 1 and 2 tumours. Cox multivariate analysis identified anterior gradient protein 2 as a putative independent indicator of unfavourable outcomes (p = 0.031). Together, these data show that anterior gradient protein 2 is highly expressed in breast cancer and may be regarded as a putative biomarker for breast cancer prognosis.
Plant Reactome: a resource for plant pathways and comparative analysis.

PubMed

Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D; Wu, Guanming; Fabregat, Antonio; Elser, Justin L; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D; Ware, Doreen; Jaiswal, Pankaj

2017-01-04

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
TP Atlas: integration and dissemination of advances in Targeted Proteins Research Program (TPRP)-structural biology project phase II in Japan.

PubMed

Iwayanagi, Takao; Miyamoto, Sei; Konno, Takeshi; Mizutani, Hisashi; Hirai, Tomohiro; Shigemoto, Yasumasa; Gojobori, Takashi; Sugawara, Hideaki

2012-09-01

The Targeted Proteins Research Program (TPRP), promoted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, is phase II of the structural biology project (2007-2011), following the Protein 3000 Project (2002-2006) in Japan. While the phase I Protein 3000 Project placed partial emphasis on the construction and maintenance of pipelines for structural analyses, the TPRP is dedicated to revealing the structures and functions of targeted proteins that are of great importance in both basic research and industrial applications. To pursue this objective, 35 Targeted Proteins (TP) Projects, selected in the three areas of fundamental biology, medicine and pharmacology, and food and environment, collaborate closely with 10 Advanced Technology (AT) Projects in the four fields of protein production, structural analyses, chemical library and screening, and information platform. Here, the outlines and achievements of the 35 TP Projects are summarized in a system named TP Atlas. Progress in these diverse areas is described in a standard, unified format in the TP Atlas modules Graphical Summary, General Summary, Tabular Summary, and Structure Gallery. Advances in TP Projects owing to novel technologies stemming from AT Projects, and to collaborative research among TP Projects, are illustrated as a hallmark of the Program. The TP Atlas can be accessed at http://net.genes.nig.ac.jp/tpatlas/index_e.html.
Verification of ICESat-2/ATLAS Science Receiver Algorithm Onboard Databases

NASA Astrophysics Data System (ADS)

Carabajal, C. C.; Saba, J. L.; Leigh, H. W.; Magruder, L. A.; Urban, T. J.; Mcgarry, J.; Schutz, B. E.

2013-12-01

NASA's ICESat-2 mission will fly the Advanced Topographic Laser Altimeter System (ATLAS) instrument on a 3-year mission scheduled to launch in 2016. ATLAS is a single-photon detection system transmitting at 532 nm with a laser repetition rate of 10 kHz and a 6-spot pattern on the Earth's surface. A set of onboard Receiver Algorithms will perform signal processing to reduce the data rate and data volume to acceptable levels. These algorithms distinguish surface echoes from the background noise, limit the daily data volume, and allow the instrument to telemeter only a small vertical region about the signal. For this purpose, three onboard databases are used: a Surface Reference Map (SRM), a Digital Elevation Model (DEM), and Digital Relief Maps (DRMs). The DEM provides minimum and maximum heights that limit the signal search region of the onboard algorithms, including a margin for errors in the source databases and in onboard geolocation. Since the surface echoes will be correlated while noise will be randomly distributed, the signal location is found by histogramming the received event times and identifying the histogram bins with statistically significant counts. Once the signal location has been established, the onboard DRMs will be used to determine the vertical width of the telemetry band about the signal. The University of Texas Center for Space Research (UT-CSR) is developing the ICESat-2 onboard databases, which are currently being tested using preliminary versions and equivalent representations of elevation ranges and relief more recently developed at Goddard Space Flight Center (GSFC). Global and regional elevation models have been assessed for accuracy using ICESat geodetic control, and have been used to develop equivalent representations of the onboard databases for testing against the UT-CSR databases, with special emphasis on the ice sheet regions.
A series of verification checks has been implemented, including comparisons against ICESat altimetry for selected regions with tall vegetation and high relief. The extensive verification effort by the Receiver Algorithm team at GSFC is aimed at ensuring that the onboard databases are sufficiently accurate. We will present the results of those assessments and verification tests, along with the measures taken to modify the databases to optimize their use by the receiver algorithms. Companion presentations by McGarry et al. and Leigh et al. describe in detail the ATLAS onboard receiver algorithms and the development of the databases, respectively.
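The signal-finding step (histogramming received event times and flagging bins with statistically significant counts) can be sketched in Python. The median-based background estimate and the Poisson-style threshold `median + k * sqrt(median)` are assumptions chosen for illustration, not the flight software's actual test, and the time window, bin width and toy data are hypothetical.

```python
import math
import statistics
from typing import List


def find_signal_bins(event_times: List[float], window: float,
                     bin_width: float, k: float = 5.0) -> List[int]:
    """Return indices of histogram bins whose counts are statistically significant.

    event_times: photon arrival times; only those inside [0, window) are used,
    standing in for a search region already limited by DEM min/max heights.
    Background is estimated as the median bin count, and a bin is flagged
    when its count exceeds median + k * sqrt(median).
    """
    n_bins = math.ceil(window / bin_width)
    counts = [0] * n_bins
    for t in event_times:
        if 0.0 <= t < window:
            counts[min(int(t / bin_width), n_bins - 1)] += 1
    background = statistics.median(counts)
    threshold = background + k * math.sqrt(background)
    return [i for i, c in enumerate(counts) if c > threshold]


# Deterministic toy data: evenly spaced background events (20 per bin)
# plus a tight cluster of 100 surface returns near t = 435.
noise = [0.25 + 0.5 * i for i in range(2000)]
surface = [435.0 + 0.01 * j for j in range(100)]
print(find_signal_bins(noise + surface, window=1000.0, bin_width=10.0))  # [43]
```

Using the median rather than the mean keeps the few signal bins from inflating the background estimate.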
DISTRIBUTED CONTROL AND DA FOR ATLAS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Scudder, D.; et al.

1999-05-01

The control system for the Atlas pulsed power generator being built at Los Alamos National Laboratory will utilize a significant level of distributed control. Other principal design characteristics include noise immunity, modularity and use of commercial products wherever possible. The data acquisition system is tightly coordinated with the control system. Both share a common database server and a fiber-optic ethernet communications backbone.
A Four-Dimensional Probabilistic Atlas of the Human Brain

PubMed Central

Mazziotta, John; Toga, Arthur; Evans, Alan; Fox, Peter; Lancaster, Jack; Zilles, Karl; Woods, Roger; Paus, Tomas; Simpson, Gregory; Pike, Bruce; Holmes, Colin; Collins, Louis; Thompson, Paul; MacDonald, David; Iacoboni, Marco; Schormann, Thorsten; Amunts, Katrin; Palomero-Gallagher, Nicola; Geyer, Stefan; Parsons, Larry; Narr, Katherine; Kabani, Noor; Le Goualher, Georges; Feidler, Jordan; Smith, Kenneth; Boomsma, Dorret; Pol, Hilleke Hulshoff; Cannon, Tyrone; Kawashima, Ryuta; Mazoyer, Bernard

2001-01-01

The authors describe the development of a four-dimensional atlas and reference system that includes both macroscopic and microscopic information on structure and function of the human brain in persons between the ages of 18 and 90 years. Given the presumed large but previously unquantified degree of structural and functional variance among normal persons in the human population, the basis for this atlas and reference system is probabilistic. Through the efforts of the International Consortium for Brain Mapping (ICBM), 7,000 subjects will be included in the initial phase of database and atlas development. For each subject, detailed demographic, clinical, behavioral, and imaging information is being collected. In addition, 5,800 subjects will contribute DNA for the purpose of determining genotype–phenotype–behavioral correlations. The process of developing the strategies, algorithms, data collection methods, validation approaches, database structures, and distribution of results is described in this report. Examples of applications of the approach are described for the normal brain in both adults and children as well as in patients with schizophrenia. This project should provide new insights into the relationship between microscopic and macroscopic structure and function in the human brain and should have important implications in basic neuroscience, clinical diagnostics, and cerebral disorders. PMID:11522763
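The basic idea behind a probabilistic atlas, the fraction of subjects in which a structure occupies each location, can be sketched in Python. It assumes binary masks that have already been registered into a common reference space and flattened to equal-length lists; `probability_map` and the example masks are hypothetical, and the atlas described above involves considerably more, including registration and validation steps.

```python
from typing import List, Sequence


def probability_map(masks: Sequence[Sequence[int]]) -> List[float]:
    """Per-voxel fraction of subjects in which a structure is present.

    masks: one binary mask (0/1) per subject, all in a common registered
    space and flattened to the same number of voxels.
    """
    n_subjects = len(masks)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("masks must have the same number of voxels")
    return [sum(m[v] for m in masks) / n_subjects for v in range(n_voxels)]


# Four hypothetical subjects, five voxels each.
subjects = [
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
print(probability_map(subjects))  # [0.0, 0.5, 1.0, 0.75, 0.25]
```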
TissueWikiMobile: an Integrative Protein Expression Image Browser for Pathological Knowledge Sharing and Annotation on a Mobile Device

PubMed Central

Cheng, Chihwen; Stokes, Todd H.; Hang, Sovandy; Wang, May D.

2016-01-01

Doctors need fast and convenient access to medical data, which motivates the use of mobile devices for knowledge retrieval and sharing. We have developed TissueWikiMobile on the Apple iPhone and iPad to provide seamless access to TissueWiki, a very large repository of medical histology images. TissueWiki is a three-terabyte database of antibody information and histology images from the Human Protein Atlas (HPA). Using TissueWikiMobile, users can extract knowledge from protein expression data, add annotations to highlight regions of interest on images, and share their professional insight. Through an intuitive human-computer interface, users can efficiently operate TissueWikiMobile to access important biomedical data without losing mobility. TissueWikiMobile gives the health community a ubiquitous way to collaborate and share expert opinions, not only on the performance of various antibody stains but also on histology image annotation. PMID:27532057
Development of the Health Atlas of Jalisco: A New Web-Based Service for the Ministry of Health and the Community in Mexico

PubMed Central

Robles, Juan; Fonseca León, Joel

2016-01-01

Background Maps have been widely used to provide a visual representation of information about a geographic area. Health atlases are collections of maps related to conditions, infrastructure or services provided. Various countries have put resources towards producing health atlases that help health decision makers enhance their services to the communities. Latin America, as well as Spain, has produced several important atlases, such as the interactive mortality atlas of Andalucía, which is very similar to the one presented in this paper. In Mexico, the only relevant health atlas we found was produced by the National Institute of Public Health; it was published online in 2003 and is still active. Objective The objective of this work is to describe the methods used to develop the Health Atlas of Jalisco (HAJ), and to show its characteristics and how it works interactively with the user as a Web-based service. Methods This work has an ecological design in which the analysis units are the 125 municipalities (counties) of the state of Jalisco, Mexico. We created and published online a geographic health atlas based on input from the official health database of the Health Ministry of Jalisco (HMJ) and some databases from the National Institute of Statistics and Geography (NISGI). The atlas displays 256 different variables as health-direct or health-related indicators. Instant Atlas software was used to generate the online application. The atlas was developed using these procedures: (1) datasheet processing and base map generation, (2) software arrangements, and (3) website creation. Results The HAJ is a Web-based service that allows users to interact with health and general data, regions, and categories according to their information needs, and it generates thematic maps (e.g., the total population of the state or of a single municipality grouped by age or sex).
The atlas is capable of displaying more than 32,000 different maps by combining categories, indicators, municipalities, and regions. Users can select the entire province, one or several municipalities, and the indicator they require. The atlas then generates and displays the requested map. Conclusions This atlas is a Web-based service that interactively allows users to review health indicators such as structure, supplies, processes, and the impact on public health and related sectors in Jalisco, Mexico. One of the main interests is to reduce the number of information requests that the Ministry of Health receives every week from the general public, media reporters, and other government sectors. The atlas will support transparency, information diffusion, health decision-making, and the formulation of new public policies. Furthermore, the research team intends to promote research and education in public health. PMID:27227146
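The abstract does not say how the thematic maps are shaded or how the 32,000 figure is counted. A minimal sketch, assuming a generic quantile (rank-based) classification of one indicator across municipalities, and assuming that "more than 32,000 maps" starts from one map per indicator per municipality (256 × 125 = 32,000, with regional views adding more). The function name and sample values are hypothetical; this is not the Instant Atlas implementation.

```python
from typing import Dict, List

# Counts taken from the abstract.
N_INDICATORS = 256
N_MUNICIPALITIES = 125

# Assumed reading: each indicator mapped for each single municipality.
single_municipality_views = N_INDICATORS * N_MUNICIPALITIES


def quantile_classes(values: Dict[str, float], n_classes: int = 4) -> Dict[str, int]:
    """Assign each municipality a class 0..n_classes-1 by its rank (quantile binning).

    Generic choropleth classification, assumed for illustration only.
    """
    ranked: List[str] = sorted(values, key=lambda name: values[name])
    n = len(ranked)
    # Rank i falls into class floor(i * n_classes / n).
    return {name: (i * n_classes) // n for i, name in enumerate(ranked)}
```

For example, `quantile_classes({"A": 12.0, "B": 3.5, "C": 40.1, "D": 7.8}, 2)` puts the two lowest-valued (hypothetical) municipalities, B and D, in class 0 and A and C in class 1. Quantile classes keep map colours evenly used even when indicator values are skewed.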
Development of the Health Atlas of Jalisco: A New Web-Based Service for the Ministry of Health and the Community in Mexico.

PubMed

Ramos Herrera, Igor Martin; Gonzalez Castañeda, Miguel; Robles, Juan; Fonseca León, Joel

2016-01-01

Maps have been widely used to provide a visual representation of information about a geographic area. Health atlases are collections of maps related to health conditions, infrastructure, or services provided. Various countries have invested resources in producing health atlases that help health decision makers improve their services to communities. Latin America, as well as Spain, has produced several important atlases, such as the interactive mortality atlas of Andalucía, which is very similar to the one presented in this paper. In Mexico, the only relevant health atlas we found was produced by the National Institute of Public Health; it was published online in 2003 and is still active. The objective of this work is to describe the methods used to develop the Health Atlas of Jalisco (HAJ) and to show its characteristics and how it interacts with the user as a Web-based service. This work has an ecological design in which the analysis units are the 125 municipalities (counties) of the state of Jalisco, Mexico. We created and published online a geographic health atlas based on input from the official health database of the Health Ministry of Jalisco (HMJ) and several databases from the National Institute of Statistics and Geography (NISGI). The atlas displays 256 different variables as health-direct or health-related indicators. Instant Atlas software was used to generate the online application. The atlas was developed using these procedures: (1) datasheet processing and base map generation, (2) software arrangements, and (3) website creation. The HAJ is a Web-based service that allows users to interact with health and general data, regions, and categories according to their information needs, and it generates thematic maps (eg, the total population of the state or of a single municipality grouped by age or sex). 
The atlas is capable of displaying more than 32,000 different maps by combining categories, indicators, municipalities, and regions. Users can select the entire state, one or several municipalities, and the indicator they require; the atlas then generates and displays the requested map. This atlas is a Web-based service that allows users to interactively review health indicators such as structure, supplies, processes, and the impact on public health and related sectors in Jalisco, Mexico. One of its main aims is to reduce the number of information requests that the Ministry of Health receives every week from the general public, media reporters, and other government sectors. The atlas will support transparency, information diffusion, health decision-making, and the formulation of new public policies. Furthermore, the research team intends to promote research and education in public health.

User’s guide and metadata for the PICES Nonindigenous Species Information System

USGS Publications Warehouse

Lee; Reusser, Deborah A.; Marko; Ranelletti

2012-01-01

The overall goal of both the database and the Atlas was to simplify and standardize the dissemination of distributional, habitat, and life history characteristics of near-coastal and estuarine nonindigenous species. This database provides a means of querying these data and displaying the information in a consistent format. The specific classes of information the database captures include: regional and global ranges of native and nonindigenous near-coastal and estuarine species at different hierarchical spatial scales; habitat and physiological requirements of near-coastal and estuarine species; life history characteristics of near-coastal and estuarine species; and invasion history and vectors for nonindigenous species. These standardized and synthesized data in the database and the Atlas provide the basic information needed to address a number of managerial and scientific needs. Thus, users will be able to: create a baseline of the extent of invasion by region in order to assess new invasions; use existing geographical patterns of invasion to gain insights into potential new invaders; use existing geographical patterns of invasion to gain insights into mechanisms affecting the relative invasibility of different areas; use life history attributes and environmental requirements of the reported nonindigenous species to evaluate the traits of invaders; understand the potential spread of invaders based on their habitat and environmental requirements; and understand the importance of different vectors of introduction of nonindigenous species by region. The data in the Atlas of Nonindigenous Marine and Estuarine Species in the North Pacific (Lee and Reusser, 2012) are up to date as of June 2012. Updates to the PICES database were made in September 2012.
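The first use listed above, a per-region baseline of invasion used to spot new invasions, can be sketched as a simple aggregation. The record layout (`OccurrenceRecord` with `species`, `region`, `nonindigenous`) is a hypothetical stand-in; the actual PICES database schema and query interface are not described here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class OccurrenceRecord:
    """Hypothetical occurrence record; not the real PICES schema."""
    species: str
    region: str
    nonindigenous: bool  # True if the species is nonindigenous in this region


def invasion_baseline(records: Iterable[OccurrenceRecord]) -> Dict[str, int]:
    """Count distinct nonindigenous species per region."""
    per_region: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        if rec.nonindigenous:
            per_region[rec.region].add(rec.species)  # sets ignore repeat reports
    return {region: len(spp) for region, spp in per_region.items()}


def new_invasions(baseline_species: Set[str], current_species: Set[str]) -> Set[str]:
    """Nonindigenous species reported now that were absent from the baseline."""
    return current_species - baseline_species
```

Counting distinct species (rather than records) keeps repeated reports of the same invader from inflating a region's baseline.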
An anatomic transcriptional atlas of human glioblastoma.

PubMed

Puchalski, Ralph B; Shah, Nameeta; Miller, Jeremy; Dalley, Rachel; Nomura, Steve R; Yoon, Jae-Guen; Smith, Kimberly A; Lankerovich, Michael; Bertagnolli, Darren; Bickley, Kris; Boe, Andrew F; Brouner, Krissy; Butler, Stephanie; Caldejon, Shiella; Chapin, Mike; Datta, Suvro; Dee, Nick; Desta, Tsega; Dolbeare, Tim; Dotson, Nadezhda; Ebbert, Amanda; Feng, David; Feng, Xu; Fisher, Michael; Gee, Garrett; Goldy, Jeff; Gourley, Lindsey; Gregor, Benjamin W; Gu, Guangyu; Hejazinia, Nika; Hohmann, John; Hothi, Parvinder; Howard, Robert; Joines, Kevin; Kriedberg, Ali; Kuan, Leonard; Lau, Chris; Lee, Felix; Lee, Hwahyung; Lemon, Tracy; Long, Fuhui; Mastan, Naveed; Mott, Erika; Murthy, Chantal; Ngo, Kiet; Olson, Eric; Reding, Melissa; Riley, Zack; Rosen, David; Sandman, David; Shapovalova, Nadiya; Slaughterbeck, Clifford R; Sodt, Andrew; Stockdale, Graham; Szafer, Aaron; Wakeman, Wayne; Wohnoutka, Paul E; White, Steven J; Marsh, Don; Rostomily, Robert C; Ng, Lydia; Dang, Chinh; Jones, Allan; Keogh, Bart; Gittleman, Haley R; Barnholtz-Sloan, Jill S; Cimino, Patrick J; Uppin, Megha S; Keene, C Dirk; Farrokhi, Farrokh R; Lathia, Justin D; Berens, Michael E; Iavarone, Antonio; Bernard, Amy; Lein, Ed; Phillips, John W; Rostad, Steven W; Cobbs, Charles; Hawrylycz, Michael J; Foltz, Greg D

2018-05-11

Glioblastoma is an aggressive brain tumor that carries a poor prognosis. The tumor's molecular and cellular landscapes are complex, and their relationships to histologic features routinely used for diagnosis are unclear. We present the Ivy Glioblastoma Atlas, an anatomically based transcriptional atlas of human glioblastoma that aligns individual histologic features with genomic alterations and gene expression patterns, thus assigning molecular information to the most important morphologic hallmarks of the tumor. The atlas and its clinical and genomic database are freely accessible online data resources that will serve as a valuable platform for future investigations of glioblastoma pathogenesis, diagnosis, and treatment. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
NASA Technical Interchange Meeting (TIM): Advanced Technology Lifecycle Analysis System (ATLAS) Technology Tool Box

NASA Technical Reports Server (NTRS)

ONeil, D. A.; Craig, D. A.; Christensen, C. B.; Gresham, E. C.

2005-01-01

The objective of this Technical Interchange Meeting was to increase the quantity and quality of technical, cost, and programmatic data used to model the impact of investing in different technologies. The focus of this meeting was the Technology Tool Box (TTB), a database of performance, operations, and programmatic parameters provided by technologists and used by systems engineers. The TTB is the data repository used by a system of models known as the Advanced Technology Lifecycle Analysis System (ATLAS). This report describes the results of the November meeting and also provides background information on ATLAS and the TTB.
Integration of the EventIndex with other ATLAS systems

NASA Astrophysics Data System (ADS)

Barberis, D.; Cárdenas Zárate, S. E.; Gallas, E. J.; Prokoshin, F.

2015-12-01

The ATLAS EventIndex System, developed for use in LHC Run 2, is designed to index every processed event in ATLAS, replacing the TAG System used in Run 1. Its storage infrastructure, based on the Hadoop open-source software framework, necessitates revamping how information in this system relates to other ATLAS systems. It will store more indexes, since the fundamental mechanisms for retrieving these indexes will be better integrated into all stages of data processing, allowing more events from later stages of processing to be indexed than was possible with the previous system. Connections with other systems (conditions database, monitoring) are critical for assessing dataset completeness, identifying data duplication, and checking data integrity; they also enhance access to EventIndex information through user and system interfaces. This paper gives an overview of the ATLAS systems involved and the relevant metadata, and describes the technologies we are deploying to complete these connections.
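The completeness and duplication checks mentioned above can be illustrated in a few lines. This sketch assumes each indexed event is keyed by a `(run_number, event_number)` pair and that an expected event count is available from some other bookkeeping system; both are assumptions, and the function is hypothetical rather than part of the EventIndex software.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

EventKey = Tuple[int, int]  # assumed identifier: (run_number, event_number)


def check_dataset(indexed: Iterable[EventKey], expected_count: int) -> Dict[str, object]:
    """Report unique events, duplicated keys, and completeness for one dataset.

    `expected_count` stands in for a number supplied by another system
    (e.g. bookkeeping metadata); how EventIndex obtains it is not described here.
    """
    counts = Counter(indexed)
    duplicates: List[EventKey] = sorted(k for k, c in counts.items() if c > 1)
    unique = len(counts)
    return {
        "unique_events": unique,
        "duplicates": duplicates,
        "complete": unique == expected_count,
    }
```

Counting keys once and comparing the number of distinct keys with the expected count separates the two failure modes: duplicates inflate the raw count, while missing events lower the unique count.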
Data mining and visualization of average images in a digital hand atlas

NASA Astrophysics Data System (ADS)

Zhang, Aifeng; Gertych, Arkadiusz; Liu, Brent J.; Huang, H. K.

2005-04-01

We have collected a digital hand atlas containing digitized left hand radiographs of normally developed children grouped by age, sex, and race. A set of features reflecting each patient's stage of skeletal development has been calculated by automatic image processing procedures and stored in a database. This paper addresses a new concept, the "average" image in the digital hand atlas. For each group of normally developed children, the "average" reference image is selected as the one whose bony features best represent the group's skeletal maturity. A data mining procedure was designed and applied to find the average image through average feature vector matching. It also provides a temporary solution to the missing feature problem through polynomial regression. As more cases are added to the digital hand atlas, it can grow to provide clinicians with accurate reference images to aid the bone age assessment process.
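Average feature vector matching, together with polynomial imputation of missing features, might look like the sketch below. Two assumptions are not in the abstract: missing values of a feature are regressed on age, and the "average" image is the one with the smallest Euclidean distance to the group's mean feature vector. Function names are hypothetical.

```python
import numpy as np


def impute_by_polynomial(age: np.ndarray, feature: np.ndarray, degree: int = 2) -> np.ndarray:
    """Fill NaN entries of one feature from a polynomial fit of that feature against age.

    Regressing on age is an assumption made for this sketch.
    """
    filled = feature.astype(float).copy()
    known = ~np.isnan(filled)
    coeffs = np.polyfit(age[known], filled[known], degree)
    filled[~known] = np.polyval(coeffs, age[~known])
    return filled


def average_image_index(features: np.ndarray) -> int:
    """Index of the image (row) whose feature vector is closest to the group mean."""
    mean_vec = features.mean(axis=0)
    distances = np.linalg.norm(features - mean_vec, axis=1)
    return int(np.argmin(distances))
```

In practice, features measured on different scales would normally be standardized before computing distances, so that no single feature dominates the match.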
The MPI-Mainz UV/VIS Spectral Atlas of Gaseous Molecules of Atmospheric Interest

NASA Astrophysics Data System (ADS)

Keller-Rudek, H.; Moortgat, G. K.; Sander, R.; Sörensen, R.

2013-08-01

We present the MPI-Mainz UV/VIS Spectral Atlas, which is a large collection of absorption cross sections and quantum yields in the ultraviolet and visible (UV/VIS) wavelength region for gaseous molecules and radicals primarily of atmospheric interest. The data files contain results of individual measurements, covering research of almost a whole century. To compare and visualize the data sets, multicoloured graphical representations have been created. The Spectral Atlas is available on the internet at http://www.uv-vis-spectral-atlas-mainz.org. It now appears with improved browse and search options, based on new database software. In addition to the web pages, which are continuously updated, a frozen version of the data is available under the doi:10.5281/zenodo.6951.
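Absorption cross sections and quantum yields like those collected in the Spectral Atlas are commonly combined with an actinic flux spectrum to estimate a photolysis frequency, $J = \int \sigma(\lambda)\,\phi(\lambda)\,F(\lambda)\,d\lambda$. The sketch below evaluates that integral with the trapezoidal rule. It assumes all three quantities are already on a common wavelength grid; the Atlas file format and any interpolation step are not covered, and the inputs shown in the usage note are placeholders, not Atlas data.

```python
import numpy as np


def photolysis_frequency(wavelength_nm, cross_section_cm2, quantum_yield, actinic_flux):
    """Approximate J = integral of sigma * phi * F over wavelength (trapezoidal rule).

    Assumed units: sigma in cm^2 molecule^-1, phi dimensionless,
    F in photons cm^-2 s^-1 nm^-1, wavelength in nm, giving J in s^-1.
    All arrays must share the same wavelength grid.
    """
    lam = np.asarray(wavelength_nm, dtype=float)
    integrand = (np.asarray(cross_section_cm2, dtype=float)
                 * np.asarray(quantum_yield, dtype=float)
                 * np.asarray(actinic_flux, dtype=float))
    # Trapezoidal rule written out explicitly over non-uniform spacing.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lam)))
```

With a constant integrand, for example `photolysis_frequency([300, 310, 320], [1e-19] * 3, [1] * 3, [1e14] * 3)`, the result is simply 1e-5 × 20 nm = 2e-4 s^-1.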
The Human Brainnetome Atlas: A New Brain Atlas Based on Connectional Architecture.

PubMed

Fan, Lingzhong; Li, Hai; Zhuo, Junjie; Zhang, Yu; Wang, Jiaojian; Chen, Liangfu; Yang, Zhengyi; Chu, Congying; Xie, Sangma; Laird, Angela R; Fox, Peter T; Eickhoff, Simon B; Yu, Chunshui; Jiang, Tianzi

2016-08-01

Human brain atlases, which allow brain anatomy to be correlated with psychological and cognitive functions, are in transition from ex vivo histology-based printed atlases to digital brain maps providing multimodal in vivo information. Many current human brain atlases cover only specific structures, lack fine-grained parcellations, and fail to provide functionally important connectivity information. Using noninvasive multimodal neuroimaging techniques, we designed a connectivity-based parcellation framework that identifies the subdivisions of the entire human brain, revealing its in vivo connectivity architecture. The resulting human Brainnetome Atlas, with 210 cortical and 36 subcortical subregions, provides a fine-grained, cross-validated atlas and contains information on both anatomical and functional connections. We further mapped the delineated structures to mental processes by reference to the BrainMap database. The atlas thus provides an objective and stable starting point from which to explore the complex relationships between structure, connectivity, and function, and should eventually improve understanding of how the human brain works. The human Brainnetome Atlas will be made freely available for download at http://atlas.brainnetome.org, so that whole brain parcellations, connections, and functional data will be readily available for researchers to use in their investigations into healthy and pathological states. © The Author 2016. Published by Oxford University Press.
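The core idea of connectivity-based parcellation is that voxels with similar connectivity profiles belong to the same subregion. The sketch below illustrates that idea with plain k-means on normalized connectivity profiles and a deterministic farthest-point initialization. It is a generic example under stated assumptions (a precomputed voxel-by-target connectivity matrix, a chosen `k`), not the authors' exact clustering pipeline.

```python
import numpy as np


def kmeans_parcellation(profiles: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Cluster voxels into k subregions by their connectivity profiles.

    profiles: (n_voxels, n_targets) matrix, one connectivity fingerprint per voxel.
    Returns one integer label per voxel.
    """
    # Centre and unit-normalise each profile so Euclidean distance tracks correlation.
    x = profiles - profiles.mean(axis=1, keepdims=True)
    x /= np.linalg.norm(x, axis=1, keepdims=True) + 1e-12
    # Farthest-point initialisation: deterministic and avoids duplicate seeds.
    centroids = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[int(np.argmax(d))])
    centroids = np.array(centroids)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each voxel to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned voxels.
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels
```

Choosing `k` is the hard part in practice; cross-validation across subjects, as the abstract's "cross-validated" suggests, is one way to pick a stable number of subregions.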
Matching spatial with ontological brain regions using Java tools for visualization, database access, and integrated data analysis.

PubMed

Bezgin, Gleb; Reid, Andrew T; Schubert, Dirk; Kötter, Rolf

2009-01-01

Brain atlases are widely used in experimental neuroscience as tools for locating and targeting specific brain structures. Delineated structures in a given atlas, however, are often difficult to interpret and to interface with database systems that supply additional information using hierarchically organized vocabularies (ontologies). Here we discuss the concept of volume-to-ontology mapping in the context of macroscopic brain structures. We present Java tools with which we have implemented this concept for retrieval of mapping and connectivity data on the macaque brain from the CoCoMac database in connection with an electronic version of "The Rhesus Monkey Brain in Stereotaxic Coordinates" authored by George Paxinos and colleagues. The software, including our manually drawn monkey brain template, can be downloaded freely under the GNU General Public License. It adds value to the printed atlas and has a wider (neuro-)informatics application, since it can read appropriately annotated data from delineated sections of other species and organs and turn them into 3D registered stacks. The tools provide additional features, including visualization and analysis of connectivity data, volume and centre-of-mass estimates, and graphical manipulation of entire structures, which are potentially useful for a range of research and teaching applications.
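Volume and centre-of-mass estimates for a delineated structure, plus rolling volumes up an ontology hierarchy, can be sketched on a 3D label array. The original tools are written in Java; this Python sketch only illustrates the computations. The `parent` mapping is a hypothetical ontology fragment, not CoCoMac terms, and the centre of mass is reported in voxel index coordinates.

```python
from typing import Dict, Optional, Tuple

import numpy as np


def label_volume_and_com(labels: np.ndarray, label: int,
                         voxel_size: Tuple[float, float, float] = (1.0, 1.0, 1.0)):
    """Volume (voxel_size units cubed) and centre of mass (voxel indices) of one label."""
    idx = np.argwhere(labels == label)  # (n, 3) coordinates of the label's voxels
    if idx.size == 0:
        return 0.0, None
    volume = len(idx) * float(np.prod(voxel_size))
    centre = idx.mean(axis=0)  # unweighted mean of voxel indices
    return volume, centre


def rollup_volumes(leaf_volumes: Dict[str, float], parent: Dict[str, str]) -> Dict[str, float]:
    """Add each delineated structure's volume to every ancestor term in the ontology."""
    totals: Dict[str, float] = {}
    for term, vol in leaf_volumes.items():
        node: Optional[str] = term
        while node is not None:
            totals[node] = totals.get(node, 0.0) + vol
            node = parent.get(node)  # None once the root is reached
    return totals
```

Rolling volumes upward is one simple way a volume-to-ontology mapping lets a query about a coarse term (say, a lobe) be answered from finely delineated atlas structures.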
Many local pattern texture features: which is better for image-based multilabel human protein subcellular localization classification?

PubMed

Yang, Fan; Xu, Ying-Ying; Shen, Hong-Bin

2014-01-01

Human protein subcellular location prediction can provide critical knowledge for understanding a protein's function. Since significant progress has been made in digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be effectively used for dealing with multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including the completed local binary pattern, the local tetra pattern, and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern and the local tetra pattern are more discriminative for describing IHC images than the traditional local binary pattern descriptor. The combination of these two novel local pattern features with the conventional global texture features is also studied. The enhanced performance of the final binary relevance classification model trained on the combined feature space demonstrates that the different features are complementary to each other and thus capable of improving classification accuracy.
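For reference, the standard local binary pattern used as the baseline above can be computed as follows: each interior pixel gets an 8-bit code, one bit per neighbour, set when that neighbour is at least as bright as the centre, and the image is described by the normalized histogram of codes. This sketch covers only the standard 8-neighbour, radius-1 variant; the completed LBP and local tetra pattern extensions are not shown. In a binary relevance setup, such histograms would feed one binary classifier per location label.

```python
import numpy as np


def lbp_histogram(image: np.ndarray) -> np.ndarray:
    """Normalised 256-bin histogram of standard 8-neighbour, radius-1 LBP codes.

    Bit i of a pixel's code is 1 when neighbour i is >= the centre pixel.
    Border pixels are skipped, so the image must be at least 3x3.
    """
    img = image.astype(float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, col), clockwise from top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

A perfectly flat image makes every neighbour equal to its centre, so every code is 255 and the whole histogram mass lands in the last bin.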
A Community Standard Format for the Representation of Protein Affinity Reagents*

PubMed Central

Gloriam, David E.; Orchard, Sandra; Bertinetti, Daniela; Björling, Erik; Bongcam-Rudloff, Erik; Borrebaeck, Carl A. K.; Bourbeillon, Julie; Bradbury, Andrew R. M.; de Daruvar, Antoine; Dübel, Stefan; Frank, Ronald; Gibson, Toby J.; Gold, Larry; Haslam, Niall; Herberg, Friedrich W.; Hiltke, Tara; Hoheisel, Jörg D.; Kerrien, Samuel; Koegl, Manfred; Konthur, Zoltán; Korn, Bernhard; Landegren, Ulf; Montecchi-Palazzi, Luisa; Palcy, Sandrine; Rodriguez, Henry; Schweinsberg, Sonja; Sievert, Volker; Stoevesandt, Oda; Taussig, Michael J.; Ueffing, Marius; Uhlén, Mathias; van der Maarel, Silvère; Wingren, Christer; Woollard, Peter; Sherman, David J.; Hermjakob, Henning

2010-01-01

Protein affinity reagents (PARs), most commonly antibodies, are essential reagents for protein characterization in basic research, biotechnology, and diagnostics as well as the fastest growing class of therapeutics. Large numbers of PARs are available commercially; however, their quality is often uncertain. In addition, currently available PARs cover only a fraction of the human proteome, and their cost is prohibitive for proteome scale applications. This situation has triggered several initiatives involving large scale generation and validation of antibodies, for example the Swedish Human Protein Atlas and the German Antibody Factory. Antibodies targeting specific subproteomes are being pursued by members of Human Proteome Organisation (plasma and liver proteome projects) and the United States National Cancer Institute (cancer-associated antigens). ProteomeBinders, a European consortium, aims to set up a resource of consistently quality-controlled protein-binding reagents for the whole human proteome. An ultimate PAR database resource would allow consumers to visit one on-line warehouse and find all available affinity reagents from different providers together with documentation that facilitates easy comparison of their cost and quality. However, in contrast to, for example, nucleotide databases among which data are synchronized between the major data providers, current PAR producers, quality control centers, and commercial companies all use incompatible formats, hindering data exchange. Here we propose Proteomics Standards Initiative (PSI)-PAR as a global community standard format for the representation and exchange of protein affinity reagent data. The PSI-PAR format is maintained by the Human Proteome Organisation PSI and was developed within the context of ProteomeBinders by building on a mature proteomics standard format, PSI-molecular interaction, which is a widely accepted and established community standard for molecular interaction data. 
Further information and documentation are available on the PSI-PAR web site. PMID:19674966
Automated analysis and reannotation of subcellular locations in confocal images from the Human Protein Atlas.

PubMed

Li, Jieyue; Newberg, Justin Y; Uhlén, Mathias; Lundberg, Emma; Murphy, Robert F

2012-01-01

The Human Protein Atlas contains immunofluorescence images showing subcellular locations for thousands of proteins. These are currently annotated by visual inspection. In this paper, we describe automated approaches to analyze the images and their use to improve annotation. We began by training classifiers to recognize the annotated patterns. By ranking proteins according to the confidence of the classifier, we generated a list of proteins that were strong candidates for reexamination. In parallel, we applied hierarchical clustering to group proteins and identified proteins whose annotations were inconsistent with the remainder of the proteins in their cluster. These proteins were reexamined by the original annotators, and a significant fraction had their annotations changed. The results demonstrate that automated approaches can provide an important complement to visual annotation.
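The cluster-consistency step (group proteins, then flag those whose annotation disagrees with the rest of their cluster) can be sketched as below. The abstract does not give the features, linkage, or cut height. This sketch assumes single-linkage clustering cut at a distance threshold, implemented as connected components with a union-find, and uses majority vote within each cluster. The functions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

import numpy as np


def single_linkage_clusters(features: np.ndarray, threshold: float) -> List[int]:
    """Cut a single-linkage hierarchy at `threshold`.

    Proteins linked by a chain of pairwise distances <= threshold share a cluster.
    """
    n = len(features)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(features[i] - features[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def flag_inconsistent(cluster_ids: List[int], annotations: List[str]) -> List[int]:
    """Indices of proteins whose annotation differs from their cluster's majority label."""
    by_cluster: Dict[int, List[Tuple[int, str]]] = {}
    for idx, (c, a) in enumerate(zip(cluster_ids, annotations)):
        by_cluster.setdefault(c, []).append((idx, a))
    flagged: List[int] = []
    for members in by_cluster.values():
        majority, _ = Counter(a for _, a in members).most_common(1)[0]
        flagged.extend(idx for idx, a in members if a != majority)
    return sorted(flagged)
```

The flagged indices are candidates for re-examination by a human annotator, mirroring the workflow described above; nothing is relabelled automatically.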
Evaluation of artificial diets for Attacus atlas (Lepidoptera: Saturniidae) in Yogyakarta Special Region, Indonesia.

PubMed

Sukirno, Sukirno; Situmorang, J; Sumarmi, S; Soesilohadi, R C Hidayat; Pratiwi, R

2013-12-01

The objective of this research was to evaluate artificial diets that can be used to successfully culture the atlas silk moth, Attacus atlas L. (Lepidoptera: Saturniidae), indoors. Four plant species were evaluated as the basic component of each diet: barringtonia (Barringtonia asiatica), cheesewood (Nauclea orientalis), soursop (Annona muricata), and mahogany (Swietenia mahagoni). The nutritional value of each diet was evaluated by analyzing the hemolymph proteins of sixth instars using the Folin-Ciocalteu assay. Survivorship, cocoon quality, and hemolymph protein content of larvae fed the barringtonia-based diet were higher than those of larvae fed mahogany-, cheesewood-, and soursop-based artificial diets. The average adult emergence of those fed the barringtonia-based diet was 74.5%. In this treatment, the weights of the cocoons with the pupa and of the empty cocoons were 7.0 and 1.1 g, respectively. Hemolymph of the larvae fed the barringtonia-based artificial diet had the highest protein concentration, with an average of 28.06 mg/ml. Atlas moths reared on the barringtonia-based artificial diet were comparable to those reared only on barringtonia leaves. However, the weight of empty cocoons, adult wingspan, and amount of hemolymph protein were lower than in those reared on barringtonia leaves only. This may suggest that the artificial barringtonia-based diet requires additional protein for maximum efficiency.
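Colorimetric protein assays such as the Folin-Ciocalteu method are usually quantified against a standard curve. The abstract does not describe the calibration, so this is a minimal sketch assuming a linear fit of absorbance against protein standards (for example bovine serum albumin) over the assay's approximately linear range, with a dilution correction. The standard concentrations, absorbances, and dilution factor in the test are hypothetical.

```python
import numpy as np


def fit_standard_curve(std_conc_mg_ml, std_absorbance):
    """Least-squares line: absorbance = slope * concentration + intercept."""
    slope, intercept = np.polyfit(np.asarray(std_conc_mg_ml, dtype=float),
                                  np.asarray(std_absorbance, dtype=float), 1)
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve, then undo the sample dilution."""
    return (np.asarray(absorbance, dtype=float) - intercept) / slope * dilution_factor
```

Samples whose absorbance falls outside the range of the standards would normally be re-diluted rather than extrapolated, since the assay response is only approximately linear.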
Target identification in Fusobacterium nucleatum by subtractive genomics approach and enrichment analysis of host-pathogen protein-protein interactions.

PubMed

Kumar, Amit; Thotakura, Pragna Lakshmi; Tiwary, Basant Kumar; Krishna, Ramadas

2016-05-12

Fusobacterium nucleatum, a well-studied bacterium in periodontal diseases, appendicitis, gingivitis, osteomyelitis, and pregnancy complications, has recently gained attention due to its association with colorectal cancer (CRC) progression. Treatment with berberine was shown to reverse F. nucleatum-induced CRC progression in mice by balancing the growth of opportunistic pathogens in the tumor microenvironment. Intestinal microbiota imbalance and the infections caused by F. nucleatum might be regulated by therapeutic intervention. Hence, we aimed to predict drug target proteins in F. nucleatum through a subtractive genomics approach and host-pathogen protein-protein interactions (HP-PPIs). We also carried out enrichment analysis of host interacting partners to hypothesize the possible mechanisms involved in CRC progression due to F. nucleatum. In the subtractive genomics approach, the essential, virulence, and resistance related proteins were retrieved from the RefSeq proteome of F. nucleatum by searching against the Database of Essential Genes (DEG), the Virulence Factor Database (VFDB), and the Antibiotic Resistance Gene-ANNOTation (ARG-ANNOT) tool, respectively. A subsequent hierarchical screening to identify non-human homologous, metabolic pathway-independent/pathway-specific, and druggable proteins resulted in eight pathway-independent and 27 pathway-specific druggable targets. Co-aggregation of F. nucleatum with the host induces proinflammatory gene expression, thereby potentiating tumorigenesis. Hence, proteins from IBDsite, a database for inflammatory bowel disease (IBD) research, and those involved in colorectal adenocarcinoma as interpreted from The Cancer Genome Atlas (TCGA) were retrieved to predict drug targets based on HP-PPIs with the F. nucleatum proteome. Prediction of HP-PPIs yielded 186 interactions contributed by 103 host and 76 bacterial proteins. Bacterial interacting partners were considered putative targets. 
Enrichment analysis of host interacting partners showed statistically enriched terms that were positively correlated with CRC, atherosclerosis, cardiovascular disease, osteoporosis, Alzheimer's disease, and other diseases. Subtractive genomics analysis provided a set of target proteins suggested to be indispensable for the survival and pathogenicity of F. nucleatum. These target proteins might be considered for designing potent inhibitors to abrogate F. nucleatum infections. From the enrichment analysis, it was hypothesized that F. nucleatum infection might enhance CRC progression by simultaneously regulating multiple signaling cascades, which could lead to up-regulation of proinflammatory responses and oncogenes, modulation of host immune defense mechanisms, and suppression of the DNA repair system.
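Both analytical steps above reduce to simple operations. The hierarchical screen is a chain of set operations, and term enrichment is commonly scored with a one-sided hypergeometric test. The abstract does not name its enrichment statistic, so the hypergeometric test is an assumption. The screen also omits the pathway-independent/pathway-specific split, and all set contents are placeholders.

```python
from math import comb
from typing import Set


def subtractive_screen(proteome: Set[str], essential: Set[str], virulence: Set[str],
                       resistance: Set[str], human_homologs: Set[str],
                       druggable: Set[str]) -> Set[str]:
    """Keep essential/virulence/resistance proteins with no human homolog that are druggable."""
    candidates = proteome & (essential | virulence | resistance)
    return (candidates - human_homologs) & druggable


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    k: host partners carrying a term; n: host partners tested;
    K: background genes carrying the term; N: background size.
    """
    # math.comb returns 0 when the lower argument exceeds the upper, so
    # impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

In a real analysis, p-values across many terms would also be corrected for multiple testing before calling a term "statistically enriched".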
ESCAPE: database for integrating high-content published data collected from human and mouse embryonic stem cells.

PubMed

Xu, Huilei; Baroukh, Caroline; Dannenfelser, Ruth; Chen, Edward Y; Tan, Christopher M; Kou, Yan; Kim, Yujin E; Lemischka, Ihor R; Ma'ayan, Avi

2013-01-01

High-content studies that profile mouse and human embryonic stem cells (m/hESCs) using various genome-wide technologies such as transcriptomics and proteomics are constantly being published. However, efforts to integrate such data to obtain a global view of the molecular circuitry in m/hESCs are lagging behind. Here, we present an m/hESC-centered database called Embryonic Stem Cell Atlas from Pluripotency Evidence (ESCAPE), which integrates data from many recent, diverse high-throughput studies, including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, and immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web interface also includes tools to predict the effects of combinatorial KDs, either by combining additive effects controlled by sliders or through simulation software implemented in MATLAB. Overall, ESCAPE is a comprehensive resource for the stem cell systems biology community. Database URL: http://www.maayanlab.net/ESCAPE
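The abstract says only that combinatorial KD effects can be predicted by "additive effects controlled by sliders". One plausible reading, sketched here under that assumption, is a weighted sum of single-knockdown effects per gene, with each weight acting as a slider. The data layout and names are hypothetical and do not describe ESCAPE's actual implementation.

```python
from typing import Dict


def additive_combined_kd(single_kd_effects: Dict[str, Dict[str, float]],
                         slider_weights: Dict[str, float]) -> Dict[str, float]:
    """Predict a combinatorial knockdown as a weighted sum of single-knockdown effects.

    single_kd_effects[factor][gene]: e.g. a log fold change of `gene` after knocking
    down `factor`. slider_weights[factor]: slider position, e.g. in [0, 1].
    """
    combined: Dict[str, float] = {}
    for factor, weight in slider_weights.items():
        for gene, effect in single_kd_effects.get(factor, {}).items():
            combined[gene] = combined.get(gene, 0.0) + weight * effect
    return combined
```

An additive model ignores interactions between knocked-down factors; that limitation is one reason a separate simulation route (the MATLAB software mentioned above) can be useful.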
Generating patient specific pseudo-CT of the head from MR using atlas-based regression

NASA Astrophysics Data System (ADS)

Sjölund, J.; Forsberg, D.; Andersson, M.; Knutsson, H.

2015-01-01

Radiotherapy planning and attenuation correction of PET images require simulation of radiation transport. The necessary physical properties are typically derived from computed tomography (CT) images, but in some cases, including stereotactic neurosurgery and combined PET/MR imaging, only magnetic resonance (MR) images are available. With these applications in mind, we describe how a realistic, patient-specific pseudo-CT of the head can be derived from anatomical MR images. We refer to the method as atlas-based regression because of its similarity to atlas-based segmentation. Given a target MR and an atlas database comprising MR and CT pairs, atlas-based regression works by registering each atlas MR to the target MR, applying the resulting displacement fields to the corresponding atlas CTs, and finally fusing the deformed atlas CTs into a single pseudo-CT. We use a deformable registration algorithm known as the Morphon and augment it with a certainty mask that tailors the influence individual regions have on the registration. Moreover, we propose a novel method of fusion, wherein the collection of deformed CTs is iteratively registered to their joint mean, and we find that the resulting mean CT becomes more similar to the target CT. However, the voxelwise median provided even better results, at least as good as earlier work that required special MR imaging techniques. This makes atlas-based regression a good candidate for clinical use.
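The final fusion step can be illustrated in isolation. This sketch assumes the atlas CTs have already been deformed into the target MR space; the Morphon registration and the iterative registration to the joint mean are not reproduced. It contrasts voxelwise mean and median fusion.

```python
import numpy as np


def fuse_deformed_cts(deformed_cts: np.ndarray, method: str = "median") -> np.ndarray:
    """Fuse atlas CTs already warped into the target MR space.

    deformed_cts: array of shape (n_atlases, *volume_shape), e.g. in Hounsfield units.
    """
    if method == "median":
        return np.median(deformed_cts, axis=0)
    if method == "mean":
        return deformed_cts.mean(axis=0)
    raise ValueError(f"unknown fusion method: {method}")
```

The median's advantage shows up when one atlas is badly misregistered. With three atlases giving 0, 10, and 1000 HU at a voxel, the median stays at 10 HU while the mean jumps to about 337 HU.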
Parcellation of the Healthy Neonatal Brain into 107 Regions Using Atlas Propagation through Intermediate Time Points in Childhood.

PubMed

Blesa, Manuel; Serag, Ahmed; Wilkinson, Alastair G; Anblagan, Devasuda; Telford, Emma J; Pataky, Rozalia; Sparrow, Sarah A; Macnaught, Gillian; Semple, Scott I; Bastin, Mark E; Boardman, James P

2016-01-01

Neuroimage analysis pipelines rely on parcellated atlases generated from healthy individuals to provide anatomic context to structural and diffusion MRI data. Atlases constructed using adult data introduce bias into studies of early brain development. We aimed to create a neonatal brain atlas of healthy subjects that can be applied to multi-modal MRI data. Structural and diffusion 3T MRI scans were acquired soon after birth from 33 typically developing neonates born at term (mean postmenstrual age at birth 39(+5) weeks, range 37(+2)-41(+6)). An adult brain atlas (SRI24/TZO) was propagated to the neonatal data using temporal registration via childhood templates with dense temporal samples (NIH Pediatric Database), and the final atlas (Edinburgh Neonatal Atlas, ENA33) was constructed using the Symmetric Group Normalization (SyGN) method. The computed final transformations were then applied to the T2-weighted data, the fractional anisotropy and mean diffusivity maps, and the tissue segmentations to provide a multi-modal atlas with 107 anatomical regions; a symmetric version was also created to facilitate studies of laterality. The volume of each region of interest was measured to provide reference data from normal subjects. Because this atlas is generated by step-wise propagation of adult labels through intermediate time points in childhood, it may serve as a useful starting point for modeling brain growth during development.
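Propagating labels through intermediate time points amounts to composing a chain of registrations, adult to older child to younger child to neonate, into one mapping. The study used nonlinear registrations; this simplified sketch substitutes 4x4 affine matrices so the composition order can be shown explicitly. The helper names are hypothetical.

```python
from functools import reduce
from typing import List

import numpy as np


def translation(tx: float, ty: float, tz: float) -> np.ndarray:
    """4x4 homogeneous matrix for a pure translation (a toy transform)."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m


def compose_chain(step_transforms: List[np.ndarray]) -> np.ndarray:
    """Compose transforms ordered adult -> ... -> neonate into one matrix.

    A point is mapped by the first transform, then the second, and so on,
    so the result is T_n @ ... @ T_2 @ T_1.
    """
    return reduce(lambda acc, t: t @ acc, step_transforms, np.eye(4))


def map_point(transform: np.ndarray, point) -> np.ndarray:
    """Apply a homogeneous 4x4 transform to a 3D point."""
    p = np.append(np.asarray(point, dtype=float), 1.0)
    return (transform @ p)[:3]
```

Order matters: translating by one unit and then scaling by two sends the origin to (2, 0, 0), whereas the reverse order gives (1, 0, 0). For nonlinear warps the same principle holds, with displacement fields composed instead of matrices multiplied.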
A quantitative magnetic resonance histology atlas of postnatal rat brain development with regional estimates of growth and variability.

PubMed

Calabrese, Evan; Badea, Alexandra; Watson, Charles; Johnson, G Allan

2013-05-01

There has been growing interest in the role of postnatal brain development in the etiology of several neurologic diseases. The rat has long been recognized as a powerful model system for studying neuropathology and the safety of pharmacologic treatments. However, the complex spatiotemporal changes that occur during rat neurodevelopment remain to be elucidated. This work establishes the first magnetic resonance histology (MRH) atlas of the developing rat brain, with an emphasis on quantitation. The atlas comprises five specimens at each of nine time points, imaged with eight distinct MR contrasts and segmented into 26 developmentally defined brain regions. The atlas was used to establish a timeline of morphometric changes and variability throughout neurodevelopment and represents a quantitative database of rat neurodevelopment for characterizing rat models of human neurologic disease. Published by Elsevier Inc.
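Regional estimates of growth and variability, as described above, can be computed from a volume array laid out as time points × specimens × regions (here 9 × 5 × 26). The abstract does not define the exact metrics. This sketch assumes variability is the coefficient of variation across specimens and growth is the fractional change in mean regional volume between consecutive time points.

```python
import numpy as np


def regional_cv(volumes: np.ndarray) -> np.ndarray:
    """Coefficient of variation (sample SD / mean) across specimens.

    volumes: shape (n_timepoints, n_specimens, n_regions).
    Returns shape (n_timepoints, n_regions).
    """
    return volumes.std(axis=1, ddof=1) / volumes.mean(axis=1)


def relative_growth(volumes: np.ndarray) -> np.ndarray:
    """Fractional change in mean regional volume between consecutive time points.

    Returns shape (n_timepoints - 1, n_regions).
    """
    means = volumes.mean(axis=1)  # (n_timepoints, n_regions)
    return np.diff(means, axis=0) / means[:-1]
```

Using the coefficient of variation rather than the raw standard deviation makes variability comparable between small and large regions, which matters when regional volumes differ by orders of magnitude.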
The MPI-Mainz UV/VIS Spectral Atlas of Gaseous Molecules of Atmospheric Interest

NASA Astrophysics Data System (ADS)

Keller-Rudek, H.; Moortgat, G. K.; Sander, R.; Sörensen, R.

2013-12-01

We present the MPI-Mainz UV/VIS Spectral Atlas of Gaseous Molecules, which is a large collection of absorption cross sections and quantum yields in the ultraviolet and visible (UV/VIS) wavelength region for gaseous molecules and radicals primarily of atmospheric interest. The data files contain results of individual measurements, covering research of almost a whole century. To compare and visualize the data sets, multicoloured graphical representations have been created. The MPI-Mainz UV/VIS Spectral Atlas is available on the Internet at http://www.uv-vis-spectral-atlas-mainz.org. It now appears with improved browse and search options, based on new database software. In addition to the Web pages, which are continuously updated, a frozen version of the data is available under the doi:10.5281/zenodo.6951.
Stroke Atlas: A 3D Interactive Tool Correlating Cerebrovascular Pathology with Underlying Neuroanatomy and Resulting Neurological Deficits

PubMed Central

Nowinski, W.L.; Chua, B.C.

2013-01-01

Understanding stroke-related pathology with underlying neuroanatomy and resulting neurological deficits is critical in education and clinical practice. Moreover, communicating a stroke situation to a patient/family is difficult because of complicated neuroanatomy and pathology. For this purpose, we created a stroke atlas. The atlas correlates localized cerebrovascular pathology with both the resulting disorder and surrounding neuroanatomy. It also provides 3D display of both labeled pathology and freely composed neuroanatomy. Disorders are described in terms of resulting signs, symptoms and syndromes, and they have been compiled for ischemic stroke, hemorrhagic stroke, and cerebral aneurysms. Neuroanatomy, subdivided into 2,000 components including 1,300 vessels, contains cerebrum, cerebellum, brainstem, spinal cord, white matter, deep grey nuclei, arteries, veins, dural sinuses, cranial nerves and tracts. A computer application was developed comprising: 1) anatomy browser with the normal brain atlas (created earlier); 2) simulator of infarcts/hematomas/aneurysms/stenoses; 3) tools to label pathology; 4) cerebrovascular pathology database with lesions and disorders, and resulting signs, symptoms and/or syndromes. The pathology database is populated with 70 lesions compiled from textbooks. The initial view of each pathological site is preset in terms of lesion location, size, surrounding surface and sectional neuroanatomy, and lesion and neuroanatomy labeling. The atlas is useful for medical students, residents, nurses, general practitioners, stroke clinicians, neuroradiologists and neurologists. It may serve as an aid in patient-doctor communication, helping a stroke clinician explain the situation to a patient/family. It also enables a layman to become familiar with normal brain anatomy and understand what happens in stroke. PMID:23859169

Stroke atlas: a 3D interactive tool correlating cerebrovascular pathology with underlying neuroanatomy and resulting neurological deficits.

PubMed

Nowinski, W L; Chua, B C

2013-02-01

Understanding stroke-related pathology with underlying neuroanatomy and resulting neurological deficits is critical in education and clinical practice. Moreover, communicating a stroke situation to a patient/family is difficult because of complicated neuroanatomy and pathology. For this purpose, we created a stroke atlas. The atlas correlates localized cerebrovascular pathology with both the resulting disorder and surrounding neuroanatomy. It also provides 3D display of both labeled pathology and freely composed neuroanatomy. Disorders are described in terms of resulting signs, symptoms and syndromes, and they have been compiled for ischemic stroke, hemorrhagic stroke, and cerebral aneurysms. Neuroanatomy, subdivided into 2,000 components including 1,300 vessels, contains cerebrum, cerebellum, brainstem, spinal cord, white matter, deep grey nuclei, arteries, veins, dural sinuses, cranial nerves and tracts. A computer application was developed comprising: 1) anatomy browser with the normal brain atlas (created earlier); 2) simulator of infarcts/hematomas/aneurysms/stenoses; 3) tools to label pathology; 4) cerebrovascular pathology database with lesions and disorders, and resulting signs, symptoms and/or syndromes. The pathology database is populated with 70 lesions compiled from textbooks. The initial view of each pathological site is preset in terms of lesion location, size, surrounding surface and sectional neuroanatomy, and lesion and neuroanatomy labeling. The atlas is useful for medical students, residents, nurses, general practitioners, stroke clinicians, neuroradiologists and neurologists. It may serve as an aid in patient-doctor communication, helping a stroke clinician explain the situation to a patient/family. It also enables a layman to become familiar with normal brain anatomy and understand what happens in stroke.
Bridging Neuroanatomy, Neuroradiology and Neurology: Three-Dimensional Interactive Atlas of Neurological Disorders

PubMed Central

Nowinski, W.L.; Chua, B.C.

2013-01-01

Understanding brain pathology along with the underlying neuroanatomy and the resulting neurological deficits is of vital importance in medical education and clinical practice. To facilitate and expedite this understanding, we created a three-dimensional (3D) interactive atlas of neurological disorders providing the correspondence between a brain lesion and the resulting disorder(s). The atlas contains a 3D highly parcellated atlas of normal neuroanatomy along with a brain pathology database. Normal neuroanatomy is divided into about 2,300 components, including the cerebrum, cerebellum, brainstem, spinal cord, arteries, veins, dural sinuses, tracts, cranial nerves (CN), white matter, deep gray nuclei, ventricles, visual system, muscles, glands and cervical vertebrae (C1-C5). The brain pathology database contains 144 focal and distributed synthesized lesions (70 vascular, 36 CN-related, and 38 regional anatomy-related), each lesion labeled with the resulting disorder and associated signs, symptoms, and/or syndromes compiled from materials reported in the literature. The initial view of each lesion was preset in terms of its location and size, surrounding surface and sectional (magnetic resonance) neuroanatomy, and labeling of lesion and neuroanatomy. In addition, a glossary of neurological disorders was compiled, and for each disorder, material from textbooks was included to provide a neurological description. This atlas of neurological disorders is potentially useful to a wide variety of users ranging from medical students, residents and nurses to general practitioners, neuroanatomists, neuroradiologists and neurologists, as it contains both normal (surface and sectional) brain anatomy and pathology correlated with neurological disorders presented in a visual and interactive way. PMID:23859280
The SysteMHC Atlas project

PubMed Central

Shao, Wenguang; Pedrioli, Patrick G A; Wolski, Witold; Scurtescu, Cristian; Schmid, Emanuel; Courcelles, Mathieu; Schuster, Heiko; Kowalewski, Daniel; Marino, Fabio; Arlehamn, Cecilia S L; Vaughan, Kerrie; Peters, Bjoern; Sette, Alessandro; Ottenhoff, Tom H M; Meijgaarden, Krista E; Nieuwenhuizen, Natalie; Kaufmann, Stefan H E; Schlapbach, Ralph; Castle, John C; Nesvizhskii, Alexey I; Nielsen, Morten; Deutsch, Eric W; Campbell, David S; Moritz, Robert L; Zubarev, Roman A; Ytterberg, Anders Jimmy; Purcell, Anthony W; Marcilla, Miguel; Paradela, Alberto; Wang, Qi; Costello, Catherine E; Ternette, Nicola; van Veelen, Peter A; van Els, Cécile A C M; de Souza, Gustavo A; Sollid, Ludvig M; Admon, Arie; Stevanovic, Stefan; Rammensee, Hans-Georg; Thibault, Pierre; Perreault, Claude; Bassani-Sternberg, Michal

2018-01-01

Mass spectrometry (MS)-based immunopeptidomics investigates the repertoire of peptides presented at the cell surface by major histocompatibility complex (MHC) molecules. The broad clinical relevance of MHC-associated peptides, e.g. in precision medicine, provides a strong rationale for the large-scale generation of immunopeptidomic datasets, and recent developments in MS-based peptide analysis technologies now support the generation of the required data. Importantly, the availability of diverse immunopeptidomic datasets has resulted in an increasing need to standardize, store and exchange this type of data to enable better collaborations among researchers, to advance the field more efficiently and to establish quality measures required for the meaningful comparison of datasets. Here we present the SysteMHC Atlas (https://systemhcatlas.org), a public database that aims at collecting, organizing, sharing, visualizing and exploring immunopeptidomic data generated by MS. The Atlas includes raw mass spectrometer output files collected from several laboratories around the globe, a catalog of context-specific datasets of MHC class I and class II peptides, standardized MHC allele-specific peptide spectral libraries consisting of consensus spectra calculated from repeat measurements of the same peptide sequence, and links to other proteomics and immunology databases. The SysteMHC Atlas project was created and will be further expanded using a uniform and open computational pipeline that controls the quality of peptide identifications and peptide annotations. Thus, the SysteMHC Atlas disseminates quality controlled immunopeptidomic information to the public domain and serves as a community resource toward the generation of a high-quality comprehensive map of the human immunopeptidome and the support of consistent measurement of immunopeptidomic sample cohorts. PMID:28985418
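The spectral libraries described above consist of consensus spectra computed from repeat measurements of the same peptide. The sketch below is a heavily simplified illustration of that idea. It assumes each replicate spectrum is given as NumPy arrays of m/z values and intensities, and its fixed-width binning and unit-maximum normalisation are illustrative choices, not the SysteMHC pipeline's actual algorithm.

```python
import numpy as np


def consensus_spectrum(spectra: list[tuple[np.ndarray, np.ndarray]],
                       bin_width: float = 1.0, max_mz: float = 2000.0) -> np.ndarray:
    """Average replicate spectra of one peptide after binning m/z values and
    scaling each spectrum to a maximum intensity of 1."""
    if not spectra:
        raise ValueError("need at least one replicate spectrum")
    n_bins = int(np.ceil(max_mz / bin_width))
    binned = np.zeros((len(spectra), n_bins))
    for row, (mz, intensity) in enumerate(spectra):
        idx = np.clip((mz / bin_width).astype(int), 0, n_bins - 1)
        np.add.at(binned[row], idx, intensity)  # sum peaks that fall into the same bin
        peak = binned[row].max()
        if peak > 0:
            binned[row] /= peak                 # normalise so replicates are comparable
    return binned.mean(axis=0)                  # consensus = mean normalised intensity per bin


# Two hypothetical replicate spectra of the same peptide.
replicates = [
    (np.array([100.2, 200.5]), np.array([50.0, 100.0])),
    (np.array([100.4, 200.1]), np.array([30.0, 60.0])),
]
consensus = consensus_spectrum(replicates)
print(consensus[100], consensus[200])  # 0.5 1.0
```

Production spectral-library tools also filter noisy replicates and peaks seen in only a few replicates; this sketch does neither.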
Bridging neuroanatomy, neuroradiology and neurology: three-dimensional interactive atlas of neurological disorders.

PubMed

Nowinski, W L; Chua, B C

2013-06-01

Understanding brain pathology along with the underlying neuroanatomy and the resulting neurological deficits is of vital importance in medical education and clinical practice. To facilitate and expedite this understanding, we created a three-dimensional (3D) interactive atlas of neurological disorders providing the correspondence between a brain lesion and the resulting disorder(s). The atlas contains a 3D highly parcellated atlas of normal neuroanatomy along with a brain pathology database. Normal neuroanatomy is divided into about 2,300 components, including the cerebrum, cerebellum, brainstem, spinal cord, arteries, veins, dural sinuses, tracts, cranial nerves (CN), white matter, deep gray nuclei, ventricles, visual system, muscles, glands and cervical vertebrae (C1-C5). The brain pathology database contains 144 focal and distributed synthesized lesions (70 vascular, 36 CN-related, and 38 regional anatomy-related), each lesion labeled with the resulting disorder and associated signs, symptoms, and/or syndromes compiled from materials reported in the literature. The initial view of each lesion was preset in terms of its location and size, surrounding surface and sectional (magnetic resonance) neuroanatomy, and labeling of lesion and neuroanatomy. In addition, a glossary of neurological disorders was compiled, and for each disorder, material from textbooks was included to provide a neurological description. This atlas of neurological disorders is potentially useful to a wide variety of users ranging from medical students, residents and nurses to general practitioners, neuroanatomists, neuroradiologists and neurologists, as it contains both normal (surface and sectional) brain anatomy and pathology correlated with neurological disorders presented in a visual and interactive way.
Atlas selection for hippocampus segmentation: Relevance evaluation of three meta-information parameters.

PubMed

Dill, Vanderson; Klein, Pedro Costa; Franco, Alexandre Rosa; Pinho, Márcio Sarroglia

2018-04-01

Current state-of-the-art methods for whole and subfield hippocampus segmentation use pre-segmented templates, also known as atlases, in the pre-processing stages. Typically, the input image is registered to the template, which provides prior information for the segmentation process. Using a single standard atlas makes it harder to handle individuals whose brain anatomy differs morphologically from the atlas, especially in older brains. To increase segmentation precision in these cases without any manual intervention, multiple atlases can be used. However, registration to many templates leads to a high computational cost. Researchers have proposed an atlas pre-selection technique based on meta-information followed by the selection of an atlas based on image similarity. Unfortunately, this method also has a high computational cost due to the image-similarity process. Thus, it is desirable to pre-select a smaller number of atlases, as long as this does not affect segmentation quality. To pick the atlas that provides the best registration, we evaluate the use of three meta-information parameters (medical condition, age range, and gender) to choose the atlas. In this work, 24 atlases were defined, each based on a combination of the three meta-information parameters. These atlases were used to segment 352 volumes from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Hippocampus segmentation with each of these atlases was evaluated and compared with reference segmentations of the hippocampus, which are available from ADNI. Atlas selection by meta-information led to a significant gain in the Dice similarity coefficient, which reached 0.68 ± 0.11, compared with 0.62 ± 0.12 when using only the standard MNI152 atlas. Statistical analysis showed that the three meta-information parameters provided a significant improvement in segmentation accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
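The Dice similarity coefficient used to compare automatic and reference hippocampus segmentations is defined as 2|A∩B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch for two binary masks follows; the function name and toy masks are illustrative, and treating two empty masks as a perfect match is a convention chosen here, not something the study specifies.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(a, b).sum() / denom)


auto = np.array([1, 1, 1, 0, 0])  # hypothetical automatic segmentation
ref = np.array([0, 1, 1, 1, 0])   # hypothetical reference segmentation
print(dice(auto, ref))  # overlap 2, sizes 3 + 3 -> 4/6 ≈ 0.667
```

The same function applies unchanged to 3D masks, since NumPy's sums run over all axes.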
Metrics for the Human Proteome Project 2016: Progress on Identifying and Characterizing the Human Proteome, Including Post-Translational Modifications.

PubMed

Omenn, Gilbert S; Lane, Lydie; Lundberg, Emma K; Beavis, Ronald C; Overall, Christopher M; Deutsch, Eric W

2016-11-04

The HUPO Human Proteome Project (HPP) has two overall goals: (1) stepwise completion of the protein parts list (the draft human proteome), including confidently identifying and characterizing at least one protein product from each protein-coding gene, with increasing emphasis on sequence variants, post-translational modifications (PTMs), and splice isoforms of those proteins; and (2) making proteomics an integrated counterpart to genomics throughout the biomedical and life sciences community. PeptideAtlas and GPMDB reanalyze all major human mass spectrometry data sets available through ProteomeXchange with standardized protocols and stringent quality filters; neXtProt curates and integrates mass spectrometry and other findings to present the most up-to-date, authoritative compendium of the human proteome. The HPP Guidelines for Mass Spectrometry Data Interpretation version 2.1 were applied to manuscripts submitted for this 2016 C-HPP-led special issue [www.thehpp.org/guidelines]. The Human Proteome presented as neXtProt version 2016-02 has 16,518 confident protein identifications (Protein Existence [PE] Level 1), up from 13,664 at 2012-12, 15,646 at 2013-09, and 16,491 at 2014-10. There are 485 proteins that would have been PE1 under the Guidelines v1.0 from 2012 but now have insufficient evidence under the more stringent Guidelines v2.0, which were agreed upon to reduce false positives. neXtProt and PeptideAtlas now both require two non-nested, uniquely mapping (proteotypic) peptides of at least 9 aa in length. There are 2,949 missing proteins (PE2+3+4) as the baseline for submissions for this fourth annual C-HPP special issue of Journal of Proteome Research. PeptideAtlas has 14,629 canonical (plus 1187 uncertain and 1755 redundant) entries. GPMDB has 16,190 EC4 entries, and the Human Protein Atlas has 10,475 entries with supportive evidence.
neXtProt, PeptideAtlas, and GPMDB are rich resources of information about post-translational modifications (PTMs), single amino acid variants (SAAVs), and splice isoforms. Meanwhile, the Biology- and Disease-driven (B/D)-HPP has created comprehensive SRM resources, generated popular protein lists to guide targeted proteomics assays for specific diseases, and launched an Early Career Researchers initiative.
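The two-peptide requirement shared by neXtProt and PeptideAtlas (two non-nested, uniquely mapping peptides of at least 9 aa) can be expressed as a small predicate. This is a simplified sketch: it assumes the input list already contains only uniquely mapping peptides for one protein, interprets "non-nested" as "neither sequence is contained within the other", and ignores the other requirements of the HPP Guidelines. The function name and example sequences are illustrative.

```python
def meets_two_peptide_rule(peptides: list[str], min_length: int = 9) -> bool:
    """Simplified HPP-style check: at least two distinct peptides of
    >= min_length residues where neither is contained in the other.

    `peptides` is assumed to hold only uniquely mapping (proteotypic)
    sequences for one protein; uniqueness is NOT verified here.
    """
    # Keep distinct peptides that satisfy the minimum length.
    candidates = sorted({p for p in peptides if len(p) >= min_length})
    for i, p in enumerate(candidates):
        for q in candidates[i + 1:]:
            # Non-nested: neither sequence is a substring of the other.
            if p not in q and q not in p:
                return True
    return False


print(meets_two_peptide_rule(["LVNEVTEFAK", "QTALVELVKHK"]))              # True
print(meets_two_peptide_rule(["LVNEVTEFAK", "LVNEVTEFAKTCVADESAENCDK"]))  # False (nested)
print(meets_two_peptide_rule(["LVNEVTEFAK", "AEFAEVSK"]))                 # False (8 aa peptide)
```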
Clinical value of miR-182-5p in lung squamous cell carcinoma: a study combining data from TCGA, GEO, and RT-qPCR validation.

PubMed

Luo, Jie; Shi, Ke; Yin, Shu-Ya; Tang, Rui-Xue; Chen, Wen-Jie; Huang, Lin-Zhen; Gan, Ting-Qing; Cai, Zheng-Wen; Chen, Gang

2018-04-10

MiR-182-5p, a member of the miRNA family, can be detected in lung cancer and plays an important role in the disease. This study aimed to explore the clinical value of miR-182-5p in lung squamous cell carcinoma (LUSC) and to unveil the molecular mechanism of LUSC. The clinical value of miR-182-5p in LUSC was investigated using data from The Cancer Genome Atlas (TCGA) database, the Gene Expression Omnibus (GEO) database, and real-time quantitative polymerase chain reaction (RT-qPCR). Twelve prediction platforms were used to predict the target genes of miR-182-5p. Protein-protein interaction (PPI) network, Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were used to explore the molecular mechanism of LUSC. MiR-182-5p was significantly overexpressed in LUSC compared with non-cancerous tissues, as evidenced by various approaches, including the TCGA database, GEO microarrays, RT-qPCR, and a comprehensive meta-analysis of 501 LUSC cases and 148 non-cancerous cases. Furthermore, a total of 81 potential target genes were chosen from the union of predicted genes and the TCGA database. GO and KEGG analyses demonstrated that the target genes are involved in pathways related to biological processes. PPI analysis revealed the relationships between these genes, with EPAS1, PRKCE, NR3C1, and RHOB located at the center of the PPI network. MiR-182-5p upregulation greatly contributes to LUSC, and miR-182-5p may serve as a biomarker in LUSC.
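The abstract does not say which measure was used to place genes "at the center" of the PPI network. Degree, meaning the number of distinct interaction partners, is one common choice, and the sketch below ranks hub candidates on that assumption. The gene names in the toy network are hypothetical placeholders, not interactions reported in the study.

```python
from collections import Counter


def rank_hubs(edges: list[tuple[str, str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank nodes of an undirected PPI network by degree (number of distinct partners)."""
    partners: dict[str, set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions do not add partners
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    degree = Counter({node: len(p) for node, p in partners.items()})
    return degree.most_common(top)


# Toy network with hypothetical gene names; the repeated A-B edge is counted once.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("B", "A")]
print(rank_hubs(edges, top=1))  # [('A', 3)]
```

Other centrality measures, such as betweenness or closeness, can rank the same network differently.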
Explore, Visualize, and Analyze Functional Cancer Proteomic Data Using the Cancer Proteome Atlas. | Office of Cancer Genomics

Cancer.gov

Reverse-phase protein arrays (RPPA) represent a powerful functional proteomic approach to elucidate cancer-related molecular mechanisms and to develop novel cancer therapies. To facilitate community-based investigation of the large-scale protein expression data generated by this platform, we have developed a user-friendly, open-access bioinformatic resource, The Cancer Proteome Atlas (TCPA, http://tcpaportal.org), which contains two separate web applications.
Significance of aquaporins’ expression in the prognosis of gastric cancer

PubMed Central

Thapa, Saroj; Chetry, Mandika; Huang, Kaiyu; Peng, Yangpei; Wang, Jinsheng; Wang, Jiaoni; Zhou, Yingying; Shen, Yigen; Xue, Yangjing; Ji, Kangting

2018-01-01

Gastric carcinoma is currently one of the most lethal malignancies and a leading cause of cancer-related death worldwide. Aquaporins (AQPs) are a family of small, integral membrane proteins that have been shown to play a crucial role in the migration and proliferation of different cancer cells, including gastric cancer cells. However, the aberrant expression of specific AQPs and its predictive and prognostic significance in gastric cancer remain elusive. In the present study, we comprehensively explored the immunohistochemistry-based map of protein expression profiles in normal tissues, cancers and cell lines from the publicly available Human Protein Atlas (HPA) database. Moreover, to improve our understanding of general gastric biology and to guide the search for novel predictive and prognostic gastric cancer biomarkers, we also queried the Kaplan–Meier plotter (KM plotter) online database to relate the mRNA expression of specific AQPs to overall survival (OS) across different clinicopathological features. We found that the ubiquitously expressed AQP proteins can be effective tools for developing gastric cancer biomarkers. Furthermore, high levels of AQP3, AQP9, and AQP11 mRNA expression were correlated with better OS in all gastric cancer patients, whereas AQP0, AQP1, AQP4, AQP5, AQP6, AQP8, and AQP10 mRNA expression was associated with poor OS. With regard to clinicopathological features, including Lauren classification, clinical stage, human epidermal growth factor receptor 2 (HER2) status, and different treatment strategies, we illustrated a significant role of individual AQP mRNA expression in the prognosis of gastric cancer patients. Thus, our results indicate that AQP protein and mRNA expression in gastric cancer patients can help predict prognosis and inform therapeutic strategy. PMID:29678898
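The KM plotter relates expression to overall survival through Kaplan–Meier curves, typically comparing patients with high and low expression of a gene. The product-limit estimator behind such curves is sketched below. The follow-up times and censoring flags are hypothetical, and the sketch omits the hazard ratio and log-rank test that the online tool also reports.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate: returns (time, S(t)) at each distinct event time.

    events[i] is 1 if the event (death) was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)  # patients still under observation
    survival = 1.0
    curve: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= removed  # both deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up (months) for a "high expression" group; 0 = censored.
high = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(high)  # survival steps of approximately 0.8, 0.6 and 0.3 at months 2, 3 and 5
```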
The Human Plasma Proteome Draft of 2017: Building on the Human Plasma PeptideAtlas from Mass Spectrometry and Complementary Assays.

PubMed

Schwenk, Jochen M; Omenn, Gilbert S; Sun, Zhi; Campbell, David S; Baker, Mark S; Overall, Christopher M; Aebersold, Ruedi; Moritz, Robert L; Deutsch, Eric W

2017-12-01

Human blood plasma provides a highly accessible window to the proteome of any individual in health and disease. Since its inception in 2002, the Human Proteome Organization's Human Plasma Proteome Project (HPPP) has been promoting advances in the study and understanding of the full protein complement of human plasma and in determining the abundance and modifications of its components. In 2017, we review the history of the HPPP and the advances of human plasma proteomics in general, including several recent achievements. We then present the latest 2017-04 build of Human Plasma PeptideAtlas, which yields ∼43 million peptide-spectrum matches and 122,730 distinct peptide sequences from 178 individual experiments at a 1% protein-level FDR globally across all experiments. Applying the latest Human Proteome Project Data Interpretation Guidelines, we catalog 3509 proteins that have at least two non-nested uniquely mapping peptides of nine amino acids or more and >1300 additional proteins with ambiguous evidence. We apply the same two-peptide guideline to historical PeptideAtlas builds going back to 2006 and examine the progress made in the past ten years in plasma proteome coverage. We also compare the distribution of proteins in historical PeptideAtlas builds across various RNA abundance and cellular localization categories. We then discuss advances in plasma proteomics based on targeted mass spectrometry as well as affinity assays, which as of early 2017 targeted ∼2000 proteins. Finally, we describe considerations about sample handling and study design, concluding with an outlook for future advances in deciphering the human plasma proteome.
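The build above is reported at a 1% FDR. The target-decoy strategy commonly used to estimate such thresholds can be sketched generically. Here, candidate identifications are ranked by score, the FDR at each cutoff is estimated as decoys/targets, and the lowest cutoff that stays within the limit is kept. The sketch works on plain scores and is not PeptideAtlas's actual protein-level procedure; the function name and data are hypothetical.

```python
from itertools import groupby
from typing import Optional


def fdr_threshold(scores: list[float], is_decoy: list[bool], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff whose target-decoy FDR estimate (decoys/targets) is <= max_fdr."""
    ranked = sorted(zip(scores, is_decoy), key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    best: Optional[float] = None
    # Walk from the best score down; tied scores are accepted or rejected together.
    for score, group in groupby(ranked, key=lambda item: item[0]):
        for _, decoy in group:
            if decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical scores; True marks a decoy hit. A loose 25% limit keeps the toy example small.
print(fdr_threshold([10, 9, 8, 7, 6, 5], [False, False, False, True, False, True], max_fdr=0.25))  # 6
```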
Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Omenn, Gilbert; States, David J.; Adamski, Marcin

2005-08-13

HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA-, heparin-, and citrate-anticoagulated plasma; and (3) created a publicly available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS/MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS/MS datasets had 15,710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein-to-DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches.
This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis plus many reports of supplementary work on various aspects of the PPP workplan. These PPP results on complexity, dynamic range, incomplete sampling, false-positive matches, and integration of diverse datasets for plasma and serum proteins lay a foundation for development and validation of circulating protein biomarkers in health and disease.
A three-dimensional single-cell-resolution whole-brain atlas using CUBIC-X expansion microscopy and tissue clearing.

PubMed

Murakami, Tatsuya C; Mano, Tomoyuki; Saikawa, Shu; Horiguchi, Shuhei A; Shigeta, Daichi; Baba, Kousuke; Sekiya, Hiroshi; Shimizu, Yoshihiro; Tanaka, Kenji F; Kiyonari, Hiroshi; Iino, Masamitsu; Mochizuki, Hideki; Tainaka, Kazuki; Ueda, Hiroki R

2018-04-01

A three-dimensional single-cell-resolution mammalian brain atlas will accelerate systems-level identification and analysis of cellular circuits underlying various brain functions. However, its construction requires efficient subcellular-resolution imaging throughout the entire brain. To address this challenge, we developed a fluorescent-protein-compatible, whole-organ clearing and homogeneous expansion protocol based on an aqueous chemical solution (CUBIC-X). The expanded, well-cleared brain enabled us to construct a point-based mouse brain atlas with single-cell annotation (CUBIC-Atlas). CUBIC-Atlas reflects inhomogeneous whole-brain development, revealing a significant decrease in the cerebral visual and somatosensory cortical areas during postnatal development. Probabilistic activity mapping of pharmacologically stimulated Arc-dVenus reporter mouse brains onto CUBIC-Atlas revealed the existence of distinct functional structures in the hippocampal dentate gyrus. CUBIC-Atlas is shareable by an open-source web-based viewer, providing a new platform for whole-brain cell profiling.
Toward a public analysis database for LHC new physics searches using MadAnalysis 5

NASA Astrophysics Data System (ADS)

Dumont, B.; Fuks, B.; Kraml, S.; Bein, S.; Chalons, G.; Conte, E.; Kulkarni, S.; Sengupta, D.; Wymant, C.

2015-02-01

We present the implementation, in the MadAnalysis 5 framework, of several ATLAS and CMS searches for supersymmetry in data recorded during the first run of the LHC. We provide extensive details on the validation of our implementations and propose to create a public analysis database within this framework.
Restoration, Enhancement, and Distribution of the ATLAS-1 Imaging Spectrometric Observatory (ISO) Space Science Data Set

NASA Technical Reports Server (NTRS)

Germany, G. A.

2001-01-01

The primary goal of the funded task was to restore and distribute the ISO ATLAS-1 space science data set with enhanced software and database utilities. The first year was primarily dedicated to physically transferring the data from its original format to its initial CD archival format. The remainder of the first year was devoted to the verification of the restored data set and database. The second year was devoted to the enhancement of the data set, especially the development of IDL utilities and redesign of the database and search interface as needed. This period was also devoted to distribution of the rescued data set, principally the creation and maintenance of a web interface to the data set. The final six months were dedicated to working with NSSDC to create a permanent, off-site archive of the data set and supporting utilities. This time was also used to resolve last-minute quality and design issues.
The ATLAS EventIndex: data flow and inclusion of other metadata

NASA Astrophysics Data System (ADS)

Barberis, D.; Cárdenas Zárate, S. E.; Favareto, A.; Fernandez Casani, A.; Gallas, E. J.; Garcia Montoro, C.; Gonzalez de la Hoz, S.; Hrivnac, J.; Malon, D.; Prokoshin, F.; Salt, J.; Sanchez, J.; Toebbicke, R.; Yuan, R.; ATLAS Collaboration

2016-10-01

The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalogue AMI and the Rucio data management system and information on production jobs from the ATLAS production system. The ATLAS production system is also used for the collection of event information from the Grid jobs. EventIndex developments started in 2012 and in the middle of 2015 the system was commissioned and started collecting event metadata, as a part of ATLAS Distributed Computing operations.
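The event record described above (event identifiers, pointers to the files containing the event, and trigger information) and the event-picking use case can be illustrated with a small in-memory index keyed by run and event number. This is a toy sketch only: the real EventIndex runs on Hadoop with a messaging system. The class names, fields and file and trigger names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EventRecord:
    run_number: int
    event_number: int
    file_refs: list[str]                                 # pointers to files holding this event
    trigger_bits: set[str] = field(default_factory=set)  # trigger decisions (names hypothetical)


class ToyEventIndex:
    """In-memory stand-in for an event catalogue keyed by (run number, event number)."""

    def __init__(self) -> None:
        self._records: dict[tuple[int, int], EventRecord] = {}

    def add(self, record: EventRecord) -> None:
        key = (record.run_number, record.event_number)
        if key in self._records:
            # The same event reappears in derived data: append the new file pointers.
            self._records[key].file_refs.extend(record.file_refs)
            self._records[key].trigger_bits |= record.trigger_bits
        else:
            # Store a copy so later merges do not mutate the caller's objects.
            self._records[key] = EventRecord(record.run_number, record.event_number,
                                             list(record.file_refs), set(record.trigger_bits))

    def pick(self, run_number: int, event_number: int) -> list[str]:
        """Event picking: return every file known to contain the requested event."""
        record = self._records.get((run_number, event_number))
        return list(record.file_refs) if record else []


index = ToyEventIndex()
index.add(EventRecord(1, 42, ["raw-file-001"], {"HLT_example"}))
index.add(EventRecord(1, 42, ["derived-file-007"]))
print(index.pick(1, 42))  # ['raw-file-001', 'derived-file-007']
```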
Development of Elevation and Relief Databases for ICESat-2/ATLAS Receiver Algorithms

NASA Astrophysics Data System (ADS)

Leigh, H. W.; Magruder, L. A.; Carabajal, C. C.; Saba, J. L.; Urban, T. J.; Mcgarry, J.; Schutz, B. E.

2013-12-01

The Advanced Topographic Laser Altimeter System (ATLAS) is planned to launch onboard NASA's ICESat-2 spacecraft in 2016. ATLAS operates at a wavelength of 532 nm with a laser repeat rate of 10 kHz and 6 individual laser footprints. The satellite will be in a 500 km, 91-day repeat ground track orbit at an inclination of 92°. A set of onboard Receiver Algorithms has been developed to reduce the data volume and data rate to acceptable levels while still transmitting the relevant ranging data. The onboard algorithms limit the data volume by distinguishing between surface returns and background noise and selecting a small vertical region around the surface return to be included in telemetry. The algorithms make use of signal processing techniques, along with three databases, the Digital Elevation Model (DEM), the Digital Relief Map (DRM), and the Surface Reference Mask (SRM), to find the signal and determine the appropriate dynamic range of vertical data surrounding the surface for downlink. The DEM provides software-based range gating for ATLAS. This approach allows the algorithm to limit the surface signal search to the vertical region between minimum and maximum elevations provided by the DEM (plus some margin to account for uncertainties). The DEM is constructed in a nested, three-tiered grid to account for a hardware constraint limiting the maximum vertical range to 6 km. The DRM is used to select the vertical width of the telemetry band around the surface return. The DRM contains global values of relief calculated along 140 m and 700 m ground track segments consistent with a 92° orbit. The DRM must contain the maximum value of relief seen in any given area, but must be as close to truth as possible as the DRM directly affects data volume. The SRM, which has been developed independently from the DEM and DRM, is used to set parameters within the algorithm and select telemetry bands for downlink. 
Both the DEM and DRM are constructed from publicly available digital elevation models. No elevation models currently exist that provide global coverage at a sufficient resolution, so several regional models have been mosaicked together to produce global databases. In locations where multiple data sets are available, evaluations have been made to determine the optimal source for the databases, primarily based on resolution and accuracy. Separate procedures for calculating relief were developed for high latitude (>60N/S) regions in order to take advantage of polar stereographic projections. An additional method for generating the databases was developed for use over Antarctica, such that high resolution, regional elevation models can be easily incorporated as they become available in the future. The SRM is used to facilitate DEM and DRM production by defining those regions that are ocean and sea ice. Ocean and sea ice elevation values are defined by the geoid, while relief is set to a constant value. Results presented will include the details of data source selection, the methodologies used to create the databases, and the final versions of both the DEM and DRM databases. Companion presentations by McGarry, et al. and Carabajal, et al. describe the ATLAS onboard Receiver Algorithms and the database verification, respectively.
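The DEM range gating and DRM telemetry-band logic described above can be illustrated with a minimal Python sketch. This is not the ATLAS onboard Receiver Algorithm. The names `DemCell`, `range_gate`, `find_surface` and `telemetry_band` are hypothetical. The histogram-peak surface finder is an assumed stand-in for the unspecified signal-processing step. Centring the band on the detected surface, with half the relief plus a pad on each side, is also an assumption. Only the 6 km vertical-range limit comes from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hardware limit on the vertical search window (6 km, per the abstract).
MAX_VERTICAL_RANGE_M = 6000.0


@dataclass
class DemCell:
    """Hypothetical DEM cell: elevation extremes for one grid cell, in metres."""
    min_elev_m: float
    max_elev_m: float


def range_gate(cell: DemCell, margin_m: float) -> tuple[float, float]:
    """Software range gate: DEM min/max elevations widened by an uncertainty margin."""
    low = cell.min_elev_m - margin_m
    high = cell.max_elev_m + margin_m
    if high - low > MAX_VERTICAL_RANGE_M:
        raise ValueError(f"window of {high - low:.0f} m exceeds the 6 km hardware limit")
    return low, high


def find_surface(photon_heights_m: list[float], low: float, high: float,
                 bin_m: float) -> Optional[float]:
    """Assumed signal finder: histogram photon heights in the gate, return the densest bin centre."""
    counts = Counter(int((h - low) // bin_m) for h in photon_heights_m if low <= h < high)
    if not counts:
        return None  # no photons fell inside the gate
    best_bin, _ = counts.most_common(1)[0]
    return low + (best_bin + 0.5) * bin_m


def telemetry_band(surface_m: float, relief_m: float, pad_m: float) -> tuple[float, float]:
    """Hypothetical band: half the DRM relief plus a pad on each side of the surface."""
    half = relief_m / 2 + pad_m
    return surface_m - half, surface_m + half


low, high = range_gate(DemCell(100.0, 400.0), margin_m=50.0)              # (50.0, 450.0)
surface = find_surface([10.0, 200.1, 200.4, 200.7, 300.0, 900.0], low, high, 1.0)  # 200.5
band = telemetry_band(surface, relief_m=20.0, pad_m=5.0)                   # (185.5, 215.5)
```

Photons outside the gate (10 m and 900 m here) never enter the histogram. This is the data-volume saving that range gating is meant to provide.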
The MPI-Mainz UV/VIS Spectral Atlas of Gaseous Molecules of Atmospheric Interest

NASA Astrophysics Data System (ADS)

Sander, Rolf; Keller-Rudek, Hannelore; Moortgat, Geert; Sörensen, Rüdiger

2014-05-01

Measurements from satellites can be used to obtain global concentration maps of atmospheric trace constituents. Critical parameters needed in the analysis of the satellite data are the absorption cross sections of the observed molecules. Here, we present the MPI-Mainz UV/VIS Spectral Atlas, which is a large collection of more than 5000 absorption cross section and quantum yield data files in the ultraviolet and visible (UV/VIS) wavelength region for gaseous molecules and radicals primarily of atmospheric interest. The data files contain results of individual measurements, covering research of almost a whole century. To compare and visualize the data sets, multicoloured graphical representations have been created. The Spectral Atlas is available on the internet at http://www.uv-vis-spectral-atlas-mainz.org. It has been completely overhauled and now offers improved browse and search options, built on PostgreSQL, Django and Python. The web pages are continuously updated.
Multi-atlas based segmentation using probabilistic label fusion with adaptive weighting of image similarity measures.

PubMed

Sjöberg, C; Ahnesjö, A

2013-06-01

Label fusion multi-atlas approaches to image segmentation can give better segmentation results than single-atlas methods. We present a multi-atlas label fusion strategy based on probabilistic weighting of distance maps. Relationships between image similarities and segmentation similarities are estimated in a learning phase. They are then used to derive fusion weights proportional to the probability that each atlas improves the segmentation result. The method was tested using a leave-one-out strategy on a database of 21 pre-segmented prostate patients, for different image registrations combined with different image similarity scorings. The probabilistic weighting yields results that are equal to or better than both fusion with equal weights and the STAPLE algorithm. The experiments demonstrate that label fusion by weighted distance maps is feasible. They also show that probabilistic weighted fusion improves segmentation quality more, the more strongly an individual atlas's segmentation quality depends on its registered image similarity. The regions used to evaluate the image similarity measures were found to be more important than the choice of similarity measure. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
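Weighted fusion of distance maps can be sketched in a few lines of Python on 1-D masks. This is a minimal illustration, not the authors' method. The weights are simply given here, whereas the paper derives them from learned probabilities. The convention that signed distances are negative inside the structure is an assumption, and both helper names are hypothetical.

```python
def signed_distance_1d(mask: list[int]) -> list[float]:
    """Signed distance to the nearest voxel of the opposite label (negative inside).

    Brute force O(n^2) on a 1-D mask; the mask must contain both labels.
    """
    out = []
    for i, label in enumerate(mask):
        others = [abs(i - j) for j, other in enumerate(mask) if other != label]
        d = float(min(others)) if others else float("inf")
        out.append(-d if label == 1 else d)
    return out


def fuse_distance_maps(maps: list[list[float]], weights: list[float]) -> list[int]:
    """Weighted average of per-atlas signed distance maps; negative values become label 1."""
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per atlas")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(maps[0])
    fused = [sum(w * m[i] for w, m in zip(weights, maps)) / total for i in range(n)]
    return [1 if v < 0 else 0 for v in fused]


atlas_masks = [[0, 1, 1, 1, 0], [0, 0, 1, 1, 1], [1, 1, 0, 0, 0]]
maps = [signed_distance_1d(m) for m in atlas_masks]
print(fuse_distance_maps(maps, [0.5, 0.3, 0.2]))  # [0, 1, 1, 1, 0]
print(fuse_distance_maps(maps, [0.1, 0.1, 0.8]))  # [1, 1, 0, 0, 0]
```

Shifting most of the weight onto the third atlas moves the fused boundary toward that atlas's segmentation. This is the lever that a similarity-based weighting scheme exploits.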
The State of the Human Proteome in 2013 as viewed through PeptideAtlas: Comparing the Kidney, Urine, and Plasma Proteomes for the Biology and Disease-driven Human Proteome Project

PubMed Central

Farrah, Terry; Deutsch, Eric W.; Omenn, Gilbert S.; Sun, Zhi; Watts, Julian D.; Yamamoto, Tadashi; Shteynberg, David; Harris, Micheleen M.; Moritz, Robert L.

2014-01-01

The kidney, urine, and plasma proteomes are intimately related: proteins and metabolic waste products are filtered from the plasma by the kidney and excreted via the urine, while kidney proteins may be secreted into the circulation or released into the urine. Shotgun proteomics datasets derived from human kidney, urine, and plasma samples were collated and processed using a uniform software pipeline, and relative protein abundances were estimated by spectral counting. The resulting PeptideAtlas builds yielded 4005, 2491, and 3553 nonredundant proteins at 1% FDR for the kidney, urine, and plasma proteomes, respectively; for kidney and plasma, these are the largest high-confidence protein sets to date. The same pipeline applied to all available human data yielded a 2013 Human PeptideAtlas build containing 12,644 nonredundant proteins and at least one peptide for each of ~14,000 Swiss-Prot entries, an increase over the 2012 build corresponding to ~7.5% of the predicted human proteome. We demonstrate that abundances are correlated between plasma and urine, examine the most abundant urine proteins not derived from either plasma or kidney, and consider the biomarker potential of proteins associated with renal decline. This analysis is part of the Biology and Disease-driven Human Proteome Project (B/D-HPP) and a contribution to the Chromosome-centric Human Proteome Project (C-HPP) special issue. PMID:24261998
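The abstract says relative abundances were estimated by spectral counting but does not give the normalization. The sketch below uses the normalized spectral abundance factor (NSAF), one widely used length-normalized variant, purely as an assumption. Protein IDs and lengths are hypothetical.

```python
from collections import Counter


def spectral_counts(psm_protein_ids: list[str]) -> Counter:
    """Count peptide-spectrum matches (PSMs) assigned to each protein."""
    return Counter(psm_protein_ids)


def nsaf(counts: Counter, lengths: dict[str, int]) -> dict[str, float]:
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j); values sum to 1."""
    saf = {p: counts[p] / lengths[p] for p in counts}  # spectral abundance factor
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}


# Hypothetical PSM assignments and protein lengths (residues).
psms = ["P1", "P1", "P1", "P1", "P2", "P2", "P3"]
lengths = {"P1": 200, "P2": 100, "P3": 100}
print(nsaf(spectral_counts(psms), lengths))  # roughly {'P1': 0.4, 'P2': 0.4, 'P3': 0.2}
```

Dividing by length keeps long proteins, which yield more peptides and therefore more spectra, from appearing more abundant than they are.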
Web Proxy Auto Discovery for the WLCG

NASA Astrophysics Data System (ADS)

Dykstra, D.; Blomer, J.; Blumenfeld, B.; De Salvo, A.; Dewhurst, A.; Verguilov, V.

2017-10-01

All four of the LHC experiments depend on web proxies (that is, squids) at each grid site to support software distribution by the CernVM FileSystem (CVMFS). CMS and ATLAS also use web proxies for conditions data distributed through the Frontier Distributed Database caching system. ATLAS & CMS each have their own methods for their grid jobs to find out which web proxies to use for Frontier at each site, and CVMFS has a third method. Those diverse methods limit usability and flexibility, particularly for opportunistic use cases, where an experiment’s jobs are run at sites that do not primarily support that experiment. This paper describes a new Worldwide LHC Computing Grid (WLCG) system for discovering the addresses of web proxies. The system is based on an internet standard called Web Proxy Auto Discovery (WPAD). WPAD is in turn based on another standard called Proxy Auto Configuration (PAC). Both the Frontier and CVMFS clients support this standard. The input into the WLCG system comes from squids registered in the ATLAS Grid Information System (AGIS) and CMS SITECONF files, cross-checked with squids registered by sites in the Grid Configuration Database (GOCDB) and the OSG Information Management (OIM) system, and combined with some exceptions manually configured by people from ATLAS and CMS who operate WLCG Squid monitoring. WPAD servers at CERN respond to http requests from grid nodes all over the world with a PAC file that lists available web proxies, based on IP addresses matched from a database that contains the IP address ranges registered to organizations. Large grid sites are encouraged to supply their own WPAD web servers for more flexibility, to avoid being affected by short term long distance network outages, and to offload the WLCG WPAD servers at CERN. The CERN WPAD servers additionally support requests from jobs running at non-grid sites (particularly for LHC@Home) which they direct to the nearest publicly accessible web proxy servers. 
The responses to those requests are geographically ordered based on a separate database that maps IP addresses to longitude and latitude.
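The proxy-selection logic described above can be sketched in Python as a hypothetical illustration, not the WLCG WPAD service. The registry contents, host names and coordinates are placeholders, and the trailing `DIRECT` fallback is an assumption. The client location is passed in directly rather than looked up in an IP-geolocation database. The PAC function name `FindProxyForURL` and its `PROXY host:port` return syntax follow the standard PAC convention.

```python
import ipaddress
import math

# Hypothetical registry: organization -> registered IP ranges, and its site squids.
ORG_RANGES = {"example-site": [ipaddress.ip_network("192.0.2.0/24")]}
ORG_PROXIES = {"example-site": ["squid1.example.org:3128", "squid2.example.org:3128"]}
# Hypothetical publicly accessible proxies with (latitude, longitude).
PUBLIC_PROXIES = {
    "proxy-a.example.net:3128": (46.23, 6.05),
    "proxy-b.example.net:3128": (41.85, -88.31),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def proxies_for(client_ip: str, client_location: tuple[float, float]) -> list[str]:
    """Site squids if the IP falls in a registered range, else public proxies nearest-first."""
    addr = ipaddress.ip_address(client_ip)
    for org, nets in ORG_RANGES.items():
        if any(addr in net for net in nets):
            return ORG_PROXIES[org]
    return sorted(PUBLIC_PROXIES,
                  key=lambda p: haversine_km(client_location, PUBLIC_PROXIES[p]))


def make_pac(proxies: list[str]) -> str:
    """Render a PAC file that tries each proxy in order, then connects directly."""
    chain = "; ".join(f"PROXY {p}" for p in proxies) + "; DIRECT"
    return f'function FindProxyForURL(url, host) {{\n  return "{chain}";\n}}\n'


print(make_pac(proxies_for("198.51.100.7", (48.85, 2.35))))
```

A client near Paris that is not in any registered range would receive `proxy-a` first, because it is the geographically nearest entry.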

Web Proxy Auto Discovery for the WLCG

DOE PAGES

Dykstra, D.; Blomer, J.; Blumenfeld, B.; ...

2017-11-23

All four of the LHC experiments depend on web proxies (that is, squids) at each grid site to support software distribution by the CernVM FileSystem (CVMFS). CMS and ATLAS also use web proxies for conditions data distributed through the Frontier Distributed Database caching system. ATLAS & CMS each have their own methods for their grid jobs to find out which web proxies to use for Frontier at each site, and CVMFS has a third method. Those diverse methods limit usability and flexibility, particularly for opportunistic use cases, where an experiment’s jobs are run at sites that do not primarily support that experiment. This paper describes a new Worldwide LHC Computing Grid (WLCG) system for discovering the addresses of web proxies. The system is based on an internet standard called Web Proxy Auto Discovery (WPAD). WPAD is in turn based on another standard called Proxy Auto Configuration (PAC). Both the Frontier and CVMFS clients support this standard. The input into the WLCG system comes from squids registered in the ATLAS Grid Information System (AGIS) and CMS SITECONF files, cross-checked with squids registered by sites in the Grid Configuration Database (GOCDB) and the OSG Information Management (OIM) system, and combined with some exceptions manually configured by people from ATLAS and CMS who operate WLCG Squid monitoring. WPAD servers at CERN respond to http requests from grid nodes all over the world with a PAC file that lists available web proxies, based on IP addresses matched from a database that contains the IP address ranges registered to organizations. Large grid sites are encouraged to supply their own WPAD web servers for more flexibility, to avoid being affected by short term long distance network outages, and to offload the WLCG WPAD servers at CERN. The CERN WPAD servers additionally support requests from jobs running at non-grid sites (particularly for LHC@Home), which they direct to the nearest publicly accessible web proxy servers. 
Furthermore, the responses to those requests are geographically ordered based on a separate database that maps IP addresses to longitude and latitude.
Web Proxy Auto Discovery for the WLCG

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dykstra, D.; Blomer, J.; Blumenfeld, B.

All four of the LHC experiments depend on web proxies (that is, squids) at each grid site to support software distribution by the CernVM FileSystem (CVMFS). CMS and ATLAS also use web proxies for conditions data distributed through the Frontier Distributed Database caching system. ATLAS & CMS each have their own methods for their grid jobs to find out which web proxies to use for Frontier at each site, and CVMFS has a third method. Those diverse methods limit usability and flexibility, particularly for opportunistic use cases, where an experiment’s jobs are run at sites that do not primarily support that experiment. This paper describes a new Worldwide LHC Computing Grid (WLCG) system for discovering the addresses of web proxies. The system is based on an internet standard called Web Proxy Auto Discovery (WPAD). WPAD is in turn based on another standard called Proxy Auto Configuration (PAC). Both the Frontier and CVMFS clients support this standard. The input into the WLCG system comes from squids registered in the ATLAS Grid Information System (AGIS) and CMS SITECONF files, cross-checked with squids registered by sites in the Grid Configuration Database (GOCDB) and the OSG Information Management (OIM) system, and combined with some exceptions manually configured by people from ATLAS and CMS who operate WLCG Squid monitoring. WPAD servers at CERN respond to http requests from grid nodes all over the world with a PAC file that lists available web proxies, based on IP addresses matched from a database that contains the IP address ranges registered to organizations. Large grid sites are encouraged to supply their own WPAD web servers for more flexibility, to avoid being affected by short term long distance network outages, and to offload the WLCG WPAD servers at CERN. The CERN WPAD servers additionally support requests from jobs running at non-grid sites (particularly for LHC@Home), which they direct to the nearest publicly accessible web proxy servers. 
Furthermore, the responses to those requests are geographically ordered based on a separate database that maps IP addresses to longitude and latitude.
The SysteMHC Atlas project.

PubMed

Shao, Wenguang; Pedrioli, Patrick G A; Wolski, Witold; Scurtescu, Cristian; Schmid, Emanuel; Vizcaíno, Juan A; Courcelles, Mathieu; Schuster, Heiko; Kowalewski, Daniel; Marino, Fabio; Arlehamn, Cecilia S L; Vaughan, Kerrie; Peters, Bjoern; Sette, Alessandro; Ottenhoff, Tom H M; Meijgaarden, Krista E; Nieuwenhuizen, Natalie; Kaufmann, Stefan H E; Schlapbach, Ralph; Castle, John C; Nesvizhskii, Alexey I; Nielsen, Morten; Deutsch, Eric W; Campbell, David S; Moritz, Robert L; Zubarev, Roman A; Ytterberg, Anders Jimmy; Purcell, Anthony W; Marcilla, Miguel; Paradela, Alberto; Wang, Qi; Costello, Catherine E; Ternette, Nicola; van Veelen, Peter A; van Els, Cécile A C M; Heck, Albert J R; de Souza, Gustavo A; Sollid, Ludvig M; Admon, Arie; Stevanovic, Stefan; Rammensee, Hans-Georg; Thibault, Pierre; Perreault, Claude; Bassani-Sternberg, Michal; Aebersold, Ruedi; Caron, Etienne

2018-01-04

Mass spectrometry (MS)-based immunopeptidomics investigates the repertoire of peptides presented at the cell surface by major histocompatibility complex (MHC) molecules. The broad clinical relevance of MHC-associated peptides, e.g. in precision medicine, provides a strong rationale for the large-scale generation of immunopeptidomic datasets and recent developments in MS-based peptide analysis technologies now support the generation of the required data. Importantly, the availability of diverse immunopeptidomic datasets has resulted in an increasing need to standardize, store and exchange this type of data to enable better collaborations among researchers, to advance the field more efficiently and to establish quality measures required for the meaningful comparison of datasets. Here we present the SysteMHC Atlas (https://systemhcatlas.org), a public database that aims at collecting, organizing, sharing, visualizing and exploring immunopeptidomic data generated by MS. The Atlas includes raw mass spectrometer output files collected from several laboratories around the globe, a catalog of context-specific datasets of MHC class I and class II peptides, standardized MHC allele-specific peptide spectral libraries consisting of consensus spectra calculated from repeat measurements of the same peptide sequence, and links to other proteomics and immunology databases. The SysteMHC Atlas project was created and will be further expanded using a uniform and open computational pipeline that controls the quality of peptide identifications and peptide annotations. Thus, the SysteMHC Atlas disseminates quality controlled immunopeptidomic information to the public domain and serves as a community resource toward the generation of a high-quality comprehensive map of the human immunopeptidome and the support of consistent measurement of immunopeptidomic sample cohorts. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
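Building a consensus spectrum from repeat measurements of the same peptide can be approximated as follows. This is a deliberately simplified, hypothetical procedure, not the SysteMHC pipeline or any specific spectral-library tool. Each replicate is scaled to its base peak, and peaks are grouped into fixed m/z bins. Bins seen in at least `min_fraction` of replicates are kept. The bin width, the fraction threshold and the averaging rule are all assumptions.

```python
from collections import defaultdict


def consensus_spectrum(replicates: list[list[tuple[float, float]]],
                       bin_width: float = 0.02,
                       min_fraction: float = 0.5) -> list[tuple[float, float]]:
    """Merge replicate (m/z, intensity) peak lists into one consensus peak list."""
    n = len(replicates)
    bins = defaultdict(list)  # bin index -> [(m/z, normalized intensity), ...]
    for spectrum in replicates:
        top = max(inten for _, inten in spectrum)  # base peak; spectrum must be non-empty
        seen = {}
        for mz, inten in spectrum:
            b = round(mz / bin_width)
            # Keep only the most intense peak per bin within one replicate.
            if b not in seen or inten > seen[b][1] * top:
                seen[b] = (mz, inten / top)
        for peak in seen.values():
            bins[b_key(peak[0], bin_width)].append(peak)
    consensus = []
    for _, peaks in sorted(bins.items()):
        if len(peaks) / n < min_fraction:
            continue  # peak not reproducible enough across replicates
        total = sum(i for _, i in peaks)
        mz = sum(m * i for m, i in peaks) / total  # intensity-weighted m/z
        consensus.append((mz, total / n))          # missing replicates count as 0
    return consensus


def b_key(mz: float, bin_width: float) -> int:
    """Bin index used to group peaks across replicates."""
    return round(mz / bin_width)


reps = [[(100.0, 50.0), (200.0, 100.0)],
        [(100.0, 100.0), (200.0, 100.0), (300.0, 10.0)]]
print(consensus_spectrum(reps, min_fraction=1.0))  # [(100.0, 0.75), (200.0, 1.0)]
```

Requiring a peak to recur across replicates is the simple idea behind "consensus" here: noise peaks that appear in only one run are dropped.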
33 CFR 3.01-1 - General description.

Code of Federal Regulations, 2013 CFR

2013-07-01

... Zone and each Marine Inspection Zone described in this part also includes the exclusive economic zone... jurisdictional limits were based upon the National Transportation Atlas Database 2003 produced by the Bureau of...
33 CFR 3.01-1 - General description.

Code of Federal Regulations, 2014 CFR

2014-07-01

... Zone and each Marine Inspection Zone described in this part also includes the exclusive economic zone... jurisdictional limits were based upon the National Transportation Atlas Database 2003 produced by the Bureau of...
33 CFR 3.01-1 - General description.

Code of Federal Regulations, 2012 CFR

2012-07-01

... Zone and each Marine Inspection Zone described in this part also includes the exclusive economic zone... jurisdictional limits were based upon the National Transportation Atlas Database 2003 produced by the Bureau of...
33 CFR 3.01-1 - General description.

Code of Federal Regulations, 2011 CFR

2011-07-01

... Zone and each Marine Inspection Zone described in this part also includes the exclusive economic zone... jurisdictional limits were based upon the National Transportation Atlas Database 2003 produced by the Bureau of...
ShakeMap Atlas 2.0: an improved suite of recent historical earthquake ShakeMaps for global hazard analyses and loss model calibration

USGS Publications Warehouse

Garcia, D.; Mah, R.T.; Johnson, K.L.; Hearne, M.G.; Marano, K.D.; Lin, K.-W.; Wald, D.J.

2012-01-01

We introduce the second version of the U.S. Geological Survey ShakeMap Atlas, which is an openly-available compilation of nearly 8,000 ShakeMaps of the most significant global earthquakes between 1973 and 2011. This revision of the Atlas includes: (1) a new version of the ShakeMap software that improves data usage and uncertainty estimations; (2) an updated earthquake source catalogue that includes regional locations and finite fault models; (3) a refined strategy to select prediction and conversion equations based on a new seismotectonic regionalization scheme; and (4) vastly more macroseismic intensity and ground-motion data from regional agencies. All these changes make the new Atlas a self-consistent, calibrated ShakeMap catalogue that constitutes an invaluable resource for investigating near-source strong ground-motion, as well as for seismic hazard, scenario, risk, and loss-model development. To this end, the Atlas will provide a hazard base layer for PAGER loss calibration and for the Earthquake Consequences Database within the Global Earthquake Model initiative.
Spectral Atlas of X-ray Lines Emitted During Solar Flares Based on CHIANTI

NASA Technical Reports Server (NTRS)

Landi, E.; Phillips, K. J. H.

2005-01-01

A spectral atlas of X-ray lines in the wavelength range 7.47-18.97 Angstroms is presented, based on high-resolution spectra obtained during two M-class solar flares (on 1980 August 25 and 1985 July 2) with the Flat Crystal Spectrometer on board the Solar Maximum Mission. The physical properties of the flaring plasmas are derived as a function of time using strong, isolated lines. From these properties, predicted spectra were computed using the CHIANTI database and then compared with the wavelengths and fluxes of lines in the observed spectra to establish line identifications. Identifications for nearly all the observed lines in the resulting atlas are given, with some significant corrections to previous analyses of these flare spectra.
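Comparing observed lines with a predicted spectrum can be reduced, in its simplest form, to nearest-wavelength matching within a tolerance, followed by a flux check. The Python sketch below shows that idea under stated assumptions. The line labels, wavelengths, fluxes and the 0.005 Å tolerance are hypothetical, and a real identification also weighs blends and plasma conditions, which this ignores.

```python
from typing import Optional


def identify_lines(observed: list[tuple[float, float]],
                   predicted: list[tuple[str, float, float]],
                   tol_a: float = 0.005) -> list[tuple[float, Optional[str], Optional[float]]]:
    """Match each observed (wavelength, flux) to the nearest predicted line within tol_a.

    Returns (observed wavelength, matched label or None, observed/predicted flux ratio or None).
    """
    ids = []
    for obs_wl, obs_flux in observed:
        candidates = [p for p in predicted if abs(p[1] - obs_wl) <= tol_a]
        if not candidates:
            ids.append((obs_wl, None, None))  # unidentified line
            continue
        label, _, pred_flux = min(candidates, key=lambda p: abs(p[1] - obs_wl))
        ids.append((obs_wl, label, obs_flux / pred_flux))
    return ids


observed = [(15.015, 2.0), (16.5, 1.0)]
predicted = [("line-1", 15.014, 1.0), ("line-2", 15.03, 5.0)]
print(identify_lines(observed, predicted))
# [(15.015, 'line-1', 2.0), (16.5, None, None)]
```

A flux ratio far from 1 for a wavelength match would flag a possible blend or a misidentification worth revisiting.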
Poster - 32: Atlas Selection for Automated Segmentation of Pelvic CT for Prostate Radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mallawi, Abrar; Farrell, TomTom; Diamond, Kevin-Ro

2016-08-15

Atlas-based segmentation has recently been evaluated for use in prostate radiotherapy. In a typical approach, the essential step is selecting the atlas from a database that best matches the target image. This work proposes an atlas selection strategy and evaluates its impact on final segmentation accuracy. Several anatomical parameters describing overall prostate and body shape were measured, all on CT images. A brute-force procedure was first performed on a training dataset of 20 patients, using image registration to pair subjects with similar contours. Each subject served as a target image to which all remaining 19 images were affinely registered. The overlap of the prostate and femoral-head contours was quantified for each pair using the Dice Similarity Coefficient (DSC). Finally, an atlas selection procedure was designed that relies on a similarity score, defined as a weighted sum of differences between the anatomical measurements of the target and atlas subjects. The algorithm's ability to predict the most similar atlas was excellent, achieving mean DSCs of 0.78 ± 0.07 and 0.90 ± 0.02 for the CTV and either femoral head. The proposed atlas selection yielded 0.72 ± 0.11 and 0.87 ± 0.03 for the CTV and either femoral head. The DSCs obtained with the proposed selection method were slightly lower than the maximum established using brute force, but this does not include the potential improvements expected with deformable registration. The proposed atlas selection method provides reasonable segmentation accuracy.
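The two quantities in this abstract are the DSC, 2|A∩B| / (|A| + |B|), and a similarity score defined as a weighted sum of measurement differences. Both are easy to state in code. In this minimal Python sketch, contours are represented as sets of voxel indices, and the score uses absolute differences, where lower means more similar. The absolute-value choice, the measurement names and the weights are hypothetical.

```python
def dice(a: set, b: set) -> float:
    """Dice Similarity Coefficient between two voxel sets."""
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def similarity_score(target: dict[str, float], atlas: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Weighted sum of absolute differences in anatomical measurements (lower = closer)."""
    return sum(w * abs(target[k] - atlas[k]) for k, w in weights.items())


def select_atlas(target: dict[str, float], atlases: dict[str, dict[str, float]],
                 weights: dict[str, float]) -> str:
    """Pick the atlas subject whose measurements are closest to the target's."""
    return min(atlases, key=lambda name: similarity_score(target, atlases[name], weights))


# Hypothetical measurements and weights.
target = {"prostate_width_mm": 10.0, "body_width_mm": 5.0}
atlases = {"x": {"prostate_width_mm": 12.0, "body_width_mm": 5.0},
           "y": {"prostate_width_mm": 10.0, "body_width_mm": 9.0}}
weights = {"prostate_width_mm": 1.0, "body_width_mm": 0.25}
print(select_atlas(target, atlases, weights))  # y  (score 1.0 vs 2.0)
print(dice({1, 2, 3}, {2, 3, 4}))              # 0.666...
```

Selecting by measurement score avoids registering every candidate atlas, which the brute-force baseline must do.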
A whole brain atlas with sub-parcellation of cortical gyri using resting fMRI

NASA Astrophysics Data System (ADS)

Joshi, Anand A.; Choi, Soyoung; Sonkar, Gaurav; Chong, Minqi; Gonzalez-Martinez, Jorge; Nair, Dileep; Shattuck, David W.; Damasio, Hanna; Leahy, Richard M.

2017-02-01

The new hybrid-BCI-DNI atlas is a high-resolution MPRAGE, single-subject atlas, constructed using both anatomical and functional information to guide the parcellation of the cerebral cortex. Anatomical labeling was performed manually on coronal single-slice images guided by sulcal and gyral landmarks to generate the original (non-hybrid) BCI-DNI atlas. Functional sub-parcellations of the gyral ROIs were then generated from 40 minimally preprocessed resting fMRI datasets from the HCP database. Gyral ROIs were transferred from the BCI-DNI atlas to the 40 subjects using the HCP grayordinate space as a reference. For each subject, each gyral ROI was subdivided using the fMRI data by applying spectral clustering to a similarity matrix computed from the fMRI time-series correlations between each vertex pair. The sub-parcellations were then transferred back to the original cortical mesh to create the subparcellated hBCI-DNI atlas with a total of 67 cortical regions per hemisphere. To assess the stability of the gyral subdivisions, a separate set of 60 HCP datasets were processed as follows: 1) coregistration of the structural scans to the hBCI-DNI atlas; 2) coregistration of the anatomical BCI-DNI atlas without functional subdivisions, followed by sub-parcellation of each subject's resting fMRI data as described above. We then computed consistency between the anatomically-driven delineation of each gyral subdivision and that obtained per subject using individual fMRI data. The gyral sub-parcellations generated by atlas-based registration show variable but generally good overlap of the confidence intervals with the resting fMRI-based subdivisions. These consistency measures will provide a quantitative measure of reliability of each subdivision to users of the atlas.
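The sub-parcellation step described above builds a similarity matrix from time-series correlations and then applies spectral clustering. It can be illustrated with a simplified two-way spectral cut using only the Python standard library. These details are assumptions, not the authors' settings: negative correlations are clipped to zero, and the unnormalized graph Laplacian is used. The Fiedler vector is found by power iteration on `cI - L` with the constant vector projected out. The paper's clustering into more than two parcels is not reproduced here.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length time series (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx, dy = [a - mx for a in x], [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den > 0 else 0.0


def similarity_matrix(series: list[list[float]]) -> list[list[float]]:
    """Vertex-by-vertex similarity: correlation with negatives clipped to 0, zero diagonal."""
    n = len(series)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = max(pearson(series[i], series[j]), 0.0)
    return w


def spectral_bipartition(w: list[list[float]], iters: int = 500) -> list[int]:
    """Split vertices by the sign of the Fiedler vector of L = D - W."""
    n = len(w)
    deg = [sum(row) for row in w]
    c = 2 * max(deg) if max(deg) > 0 else 1.0  # eigenvalues of L lie in [0, 2*max degree]
    v = [i - (n - 1) / 2 for i in range(n)]    # start vector orthogonal to the constant vector
    for _ in range(iters):
        # y = (cI - L) v = c*v - D*v + W*v
        y = [c * v[i] - deg[i] * v[i] + sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(y) / n
        y = [a - mean for a in y]  # project out the constant (trivial) eigenvector
        norm = math.sqrt(sum(a * a for a in y))
        if norm == 0:
            break
        v = [a / norm for a in y]
    return [0 if a < 0 else 1 for a in v]


ts = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]]  # hypothetical vertex time series
print(spectral_bipartition(similarity_matrix(ts)))  # [0, 0, 1, 1]
```

Vertices whose signals rise together end up on the same side of the cut, which is the functional criterion used to subdivide each gyral ROI.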
Large-scale inference of protein tissue origin in gram-positive sepsis plasma using quantitative targeted proteomics

PubMed Central

Malmström, Erik; Kilsgård, Ola; Hauri, Simon; Smeds, Emanuel; Herwald, Heiko; Malmström, Lars; Malmström, Johan

2016-01-01

The plasma proteome is highly dynamic and variable, composed of proteins derived from surrounding tissues and cells. To investigate the complex processes that control the composition of the plasma proteome, we developed a mass spectrometry-based proteomics strategy to infer the origin of proteins detected in murine plasma. The strategy relies on the construction of a comprehensive protein tissue atlas from cells and highly vascularized organs using shotgun mass spectrometry. The protein tissue atlas was transformed to a spectral library for highly reproducible quantification of tissue-specific proteins directly in plasma using SWATH-like data-independent mass spectrometry analysis. We show that the method can determine drastic changes of tissue-specific protein profiles in blood plasma from mouse animal models with sepsis. The strategy can be extended to several other species advancing our understanding of the complex processes that contribute to the plasma proteome dynamics. PMID:26732734
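One way to picture inferring tissue origin from a protein tissue atlas is a fold-change specificity rule followed by aggregation of plasma quantities per tissue. The abstract does not define tissue specificity, so the rule, the `fold` threshold, the protein IDs and all values below are hypothetical. This is a minimal sketch, not the published method.

```python
def tissue_specific(atlas: dict[str, dict[str, float]], fold: float = 5.0) -> dict[str, str]:
    """Map protein -> tissue when its top tissue exceeds the runner-up by `fold`."""
    specific = {}
    for protein, by_tissue in atlas.items():
        ranked = sorted(by_tissue.items(), key=lambda kv: kv[1], reverse=True)
        top_tissue, top = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 0.0
        if top > 0 and top >= fold * second:
            specific[protein] = top_tissue
    return specific


def tissue_profile(plasma: dict[str, float], specific: dict[str, str]) -> dict[str, float]:
    """Sum plasma quantities of tissue-specific proteins per tissue of origin."""
    profile: dict[str, float] = {}
    for protein, amount in plasma.items():
        tissue = specific.get(protein)
        if tissue is not None:  # proteins without a clear tissue of origin are skipped
            profile[tissue] = profile.get(tissue, 0.0) + amount
    return profile


atlas = {"P1": {"liver": 100.0, "lung": 5.0, "spleen": 1.0},
         "P2": {"liver": 10.0, "lung": 9.0},
         "P3": {"lung": 50.0, "spleen": 2.0}}
plasma = {"P1": 3.0, "P2": 7.0, "P3": 1.5, "P4": 2.0}
print(tissue_profile(plasma, tissue_specific(atlas)))  # {'liver': 3.0, 'lung': 1.5}
```

Comparing such profiles between control and septic animals would reveal shifts in which tissues are contributing proteins to the plasma.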
FlyAtlas 2: a new version of the Drosophila melanogaster expression atlas with RNA-Seq, miRNA-Seq and sex-specific data

PubMed Central

Krause, Sue A; Pandit, Aniruddha; Davies, Shireen A

2018-01-01

FlyAtlas 2 (www.flyatlas2.org) is part successor, part complement to the FlyAtlas database and web application for studying the expression of the genes of Drosophila melanogaster in different tissues of adults and larvae. Although generated in the same lab with the same fly line raised on the same diet as FlyAtlas, the FlyAtlas 2 resource employs a completely new set of expression data based on RNA-Seq, rather than microarray analysis, and so it allows the user to obtain information for the expression of different transcripts of a gene. Furthermore, the data for somatic tissues are now available for both male and female adult flies, allowing studies of sexual dimorphism. Gene coverage has been extended by the inclusion of microRNAs and many of the RNA genes included in Release 6 of the Drosophila reference genome. The web interface has been modified to accommodate the extra data, but at the same time has been adapted for viewing on small mobile devices. Users also have access to the RNA-Seq reads displayed alongside the annotated Drosophila genome in the (external) UCSC browser, and are able to link out to the previous FlyAtlas resource to compare the data obtained by RNA-Seq with that obtained using microarrays. PMID:29069479
A New Database Dedicated to Volcanic Hazards and Risks: The atlas of Merapi Volcano, Indonesia

NASA Astrophysics Data System (ADS)

Lavigne, Franck; Surono, Dr; Mei, Estuning; de Belizal, Edouard; Cholik, Noer; Picquout, Adrien; Komorowski, Jean-Christophe; Morin, Julie; sri Hadmoko, Danang

2014-05-01

Merapi volcano is one of the most active volcanoes worldwide. Approximately 1.3 million people live within 20 km of the summit. Within the framework of both the FP7 MIA VITA Project and the SEDIMER Project funded by the AXA Research Fund, we have built a database at the village scale, which includes the elements at risk and the local resources. This unique geospatial database was used to build a series of maps at the scale of the volcano, providing the core of the Merapi atlas. Designed by the French Laboratory of Physical Geography in Meudon (France) and the Center of Volcanology and Geological Hazards Mitigation in Bandung (Indonesia), this atlas provides a state-of-the-art synthesis of knowledge on Merapi, from the reconstruction of past eruptions and assessment of volcanic hazards to the quantification of vulnerability and capacities. It is pertinent to a broad audience, ranging from volcanologists to members of the Indonesian population interested in learning about their sacred volcano. The primary goal of this Atlas is to provide an essential blueprint for planners and public officials involved in long-term development as well as risk and crisis management. The atlas contains 63 color plates gathered in 6 chapters: the introduction summarises the geological context as well as the environmental and human context of Merapi volcano. The second chapter pertains to the geology, the past activity, and the volcanic hazards at Merapi. The third chapter is dedicated to the resources offered by the volcano, including agriculture, livestock, and sand mining activities. The fourth chapter focuses on vulnerability and capacities. The fifth chapter provides a reconstruction of the 2010 VEI 4 eruption of Merapi and its environmental consequences. The sixth chapter summarises the socio-economic impact of the eruption, including mapping of casualties, evacuation, building damage, and an assessment of air traffic disturbance. 
The seventh chapter focuses on rain-triggered lahar activity following the 2010 eruption, and the associated impact at the local scale. In the conclusion, we show how the 2010 eruption of Merapi improved volcanic risk management, through an updated volcanic hazard map, the establishment of a new high-tech monitoring system, as well as the development of community-based disaster reduction measures. Extensive use of colour in maps at various scales, graphics, and photos, provides a visually appealing synthesis of the hazards and risks at Merapi volcano, one of the most dangerous in the world. This atlas is available online in free access.
Targeted quantitative analysis of Streptococcus pyogenes virulence factors by multiple reaction monitoring.

PubMed

Lange, Vinzenz; Malmström, Johan A; Didion, John; King, Nichole L; Johansson, Björn P; Schäfer, Juliane; Rameseder, Jonathan; Wong, Chee-Hong; Deutsch, Eric W; Brusniak, Mi-Youn; Bühlmann, Peter; Björck, Lars; Domon, Bruno; Aebersold, Ruedi

2008-08-01

In many studies, particularly in the field of systems biology, it is essential that identical protein sets are precisely quantified in multiple samples such as those representing differentially perturbed cell states. The high degree of reproducibility required for such experiments has not been achieved by classical mass spectrometry-based proteomics methods. In this study we describe the implementation of a targeted quantitative approach by which predetermined protein sets are first identified and subsequently quantified reliably and at high sensitivity in multiple samples. This approach consists of three steps. First, the proteome is extensively mapped out by multidimensional fractionation and tandem mass spectrometry, and the data generated are assembled in the PeptideAtlas database. Second, based on this proteome map, peptides uniquely identifying the proteins of interest (proteotypic peptides) are selected, and multiple reaction monitoring (MRM) transitions are established and validated by MS2 spectrum acquisition. This process of peptide selection, transition selection, and validation is supported by a suite of software tools, TIQAM (Targeted Identification for Quantitative Analysis by MRM), described in this study. Third, the selected target protein set is quantified in multiple samples by MRM. Applying this approach, we were able to reliably quantify low-abundance virulence factors from cultures of the human pathogen Streptococcus pyogenes exposed to increasing amounts of plasma. The resulting quantitative protein patterns enabled us to clearly define the subset of virulence proteins that is regulated upon plasma exposure.
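Selecting peptides that uniquely identify a protein can be sketched in Python with an in-silico tryptic digest and a uniqueness filter. This is a minimal sketch, not TIQAM. It assumes the common trypsin rule of cleaving after K or R unless the next residue is P. The 7 to 25 residue length window and the toy sequences are hypothetical. Real selection also uses observed spectra from PeptideAtlas and considers isobaric residues such as I/L, both of which are ignored here.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence: str, min_len: int = 7, max_len: int = 25) -> list[str]:
    """In-silico trypsin digest: cleave after K/R unless followed by P, then filter by length."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)  # zero-width split needs Python 3.7+
    return [p for p in pieces if min_len <= len(p) <= max_len]


def proteotypic(proteome: dict[str, str], min_len: int = 7,
                max_len: int = 25) -> dict[str, list[str]]:
    """Keep, per protein, the tryptic peptides that occur in no other protein."""
    owners = defaultdict(set)  # peptide -> proteins that produce it
    for pid, seq in proteome.items():
        for pep in tryptic_peptides(seq, min_len, max_len):
            owners[pep].add(pid)
    result = defaultdict(list)
    for pep, pids in owners.items():
        if len(pids) == 1:
            result[next(iter(pids))].append(pep)
    return {pid: sorted(peps) for pid, peps in result.items()}


# Toy sequences: GGSSAAEELLTK is shared, so it cannot identify either protein.
proteome = {"A": "LVNEVTEFAKGGSSAAEELLTKAK",
            "B": "GGSSAAEELLTKYLYEIARPHPYFK"}
print(proteotypic(proteome))  # {'A': ['LVNEVTEFAK'], 'B': ['YLYEIARPHPYFK']}
```

Note that `YLYEIARPHPYFK` stays intact because the internal R is followed by P, which blocks cleavage under the assumed rule.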
GASS-WEB: a web server for identifying enzyme active sites based on genetic algorithms.

PubMed

Moraes, João P A; Pappa, Gisele L; Pires, Douglas E V; Izidoro, Sandro C

2017-07-03

Enzyme active sites are important and conserved functional regions of proteins whose identification can be an invaluable step toward protein function prediction. Most existing methods for this task are based on active site similarity and have limitations. These include performing only exact matches on template residues, restrictions on template size, and an inability to find inter-domain active sites. To fill this gap, we propose GASS-WEB, a user-friendly web server that uses GASS (Genetic Active Site Search), a method based on an evolutionary algorithm, to search for similar active sites in proteins. GASS-WEB can be used in two different scenarios: (i) given a protein of interest, to match it against a set of specific active site templates; or (ii) given an active site template, to search for it in a database of protein structures. The method has been shown to be very effective in a range of experiments and correctly identified >90% of the catalogued active sites from the Catalytic Site Atlas. It also achieved a Matthews correlation coefficient of 0.63 on the Critical Assessment of protein Structure Prediction (CASP 10) dataset. In our analysis, GASS ranked fourth among 18 methods. GASS-WEB is freely available at http://gass.unifei.edu.br/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
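The Matthews correlation coefficient quoted above is MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The minimal Python sketch below computes it for a residue-level prediction, treating predicted and true active-site residues as sets. Returning 0 when a marginal count is zero is a common convention, not something the abstract specifies. The helper `confusion` and the example residue numbers are hypothetical.

```python
import math


def confusion(predicted: set, actual: set, universe) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for predicted vs. true active-site residues."""
    predicted, actual, universe = set(predicted), set(actual), set(universe)
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(universe - predicted - actual)
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal sum is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Residues 1..10; the prediction shares residues 2 and 3 with the true site.
counts = confusion({1, 2, 3}, {2, 3, 4}, range(1, 11))
print(counts, mcc(*counts))  # (2, 6, 1, 1) 0.5238... (= 11/21)
```

Unlike plain accuracy, MCC stays informative when true active-site residues are a small minority of all residues, as is typical.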
The ATLAS Software Installation System v2: a highly available system to install and validate Grid and Cloud sites via Panda

NASA Astrophysics Data System (ADS)

De Salvo, A.; Kataoka, M.; Sanchez Pineda, A.; Smirnov, Y.

2015-12-01

The ATLAS Installation System v2 is the evolution of the original system, used since 2003. The original tool has been completely redesigned in terms of database backend and components, adding support for submission to multiple backends, including the original Workload Management Service (WMS) and the new PanDA modules. The database engine has been changed from plain MySQL to Galera/Percona, and the table structure has been optimized to allow a full High-Availability (HA) solution over a Wide Area Network. The servlets running on each frontend have also been decoupled from local settings to allow easy scaling of the system, including the possibility of an HA system spanning multiple sites. The clients can also be run in multiple copies and in different geographical locations, and they handle sending the installation and validation jobs to the target Grid or Cloud sites. Moreover, the Installation Database is used as a source of parameters by the automatic agents running in CVMFS to install the software and distribute it to the sites. The system has been in production for ATLAS since 2013, with the INFN Roma Tier 2 and the CERN Agile Infrastructure as its main HA sites. The Light Job Submission Framework for Installation (LJSFi) v2 engine interfaces directly with PanDA for job management, the Atlas Grid Information System (AGIS) for site parameter configuration, and CVMFS for both the core components and the installation of the software itself. LJSFi2 can also use other plugins and is essentially Virtual Organization (VO) agnostic, so it can be used directly and extended to meet the requirements of any Grid- or Cloud-enabled VO. In this work we present the architecture, performance, status and possible evolutions of the system for LHC Run 2 and beyond.
CASTp 3.0: computed atlas of surface topography of proteins.

PubMed

Tian, Wei; Chen, Chang; Lei, Xue; Zhao, Jieling; Liang, Jie

2018-06-01

Geometric and topological properties of protein structures, including surface pockets, interior cavities and cross channels, are of fundamental importance for proteins to carry out their functions. Computed Atlas of Surface Topography of proteins (CASTp) is a web server that provides online services for locating, delineating and measuring these geometric and topological properties of protein structures. It has been widely used since its inception in 2003. In this article, we present the latest version of the web server, CASTp 3.0. CASTp 3.0 continues to provide reliable and comprehensive identification and quantification of protein topography. In addition, it now provides: (i) imprints of the negative volumes of pockets, cavities and channels, (ii) topographic features of biological assemblies in the Protein Data Bank, (iii) improved visualization of protein structures and pockets, and (iv) more intuitive structural and annotation information, including secondary structure, functional sites, variant sites and other annotations of protein residues. The CASTp 3.0 web server is freely accessible at http://sts.bioe.uic.edu/castp/.
PmiRExAt: plant miRNA expression atlas database and web applications

PubMed Central

Gurjar, Anoop Kishor Singh; Panwar, Abhijeet Singh; Gupta, Rajinder; Mantri, Shrikant S.

2016-01-01

High-throughput small RNA (sRNA) sequencing technology enables an entirely new perspective for plant microRNA (miRNA) research and has immense potential to unravel regulatory networks. Novel insights gained by mining the rich, publicly available sRNA data will help in designing biotechnology-based approaches for crop improvement to enhance plant yield and nutritional value. Bioinformatics resources enabling meta-analysis of miRNA expression across multiple plant species are still evolving. Here, we report PmiRExAt, a new online database resource that serves as a plant miRNA expression atlas. The web-based repository comprises miRNA expression profiles and a query tool for 1859 wheat, 2330 rice and 283 maize miRNAs. The database interface offers open and easy access to miRNA expression profiles and helps identify tissue-preferential, differentially expressed and constitutively expressed miRNAs. A feature enabling expression study of conserved miRNAs across multiple species is also implemented. A custom expression analysis feature enables analysis of novel miRNAs across a total of 117 datasets. New sRNA datasets can also be uploaded to analyse miRNA expression profiles for 73 plant species. The PmiRExAt application programming interface, a Simple Object Access Protocol (SOAP) web service, allows other programmers to remotely invoke the methods written for programmatic search operations on the PmiRExAt database. Database URL: http://pmirexat.nabi.res.in. PMID:27081157
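The abstract does not say how PmiRExAt decides that a miRNA is tissue-preferential or constitutive. One widely used score for this distinction is the tau index, sketched below purely as an illustration: it is 0 when expression is identical in every tissue and 1 when a miRNA is expressed in only one tissue. The function names and the 0.8 cutoff are hypothetical assumptions, not PmiRExAt parameters.

```python
def tissue_specificity_tau(expression):
    """Tau index over n >= 2 tissues: 0 = uniform expression, 1 = a single tissue.

    expression: non-negative expression levels (often log-transformed first).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        raise ValueError("not expressed in any tissue; tau is undefined")
    # Each tissue contributes how far it falls below the peak tissue.
    return sum(1 - x / peak for x in expression) / (n - 1)


def classify(expression, cutoff=0.8):
    """Label a miRNA using a hypothetical tau cutoff."""
    tau = tissue_specificity_tau(expression)
    return "tissue-preferential" if tau >= cutoff else "broadly expressed"


print(classify([10, 0, 0, 0]))  # tissue-preferential
print(classify([5, 5, 5, 5]))   # broadly expressed
```

Identifying differentially expressed miRNAs would additionally require replicate-aware statistics, which this sketch does not attempt.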
The Pig PeptideAtlas: A resource for systems biology in animal production and biomedicine.

PubMed

Hesselager, Marianne O; Codrea, Marius C; Sun, Zhi; Deutsch, Eric W; Bennike, Tue B; Stensballe, Allan; Bundgaard, Louise; Moritz, Robert L; Bendixen, Emøke

2016-02-01

Biological research on Sus scrofa, the domestic pig, is of immediate relevance for food production sciences and for developing the pig as a model organism for human biomedical research. Publicly available data repositories play a fundamental role in all biological sciences, and protein data repositories are particularly essential for the successful development of new proteomic methods. Cumulative proteome data repositories, including the PeptideAtlas, provide the means for targeted proteomics, system-wide observations, and cross-species observational studies, but pigs have so far been underrepresented in existing repositories. We here present a significantly improved build of the Pig PeptideAtlas, which includes pig proteome data from 25 tissues and three body fluid types mapped to 7139 canonical proteins. The content of the Pig PeptideAtlas reflects actively ongoing research within the veterinary proteomics domain, and this article demonstrates how the expression of isoform-unique peptides can be observed across distinct tissues and body fluids. The Pig PeptideAtlas is a unique resource for animal proteome research, particularly for biomarker discovery and the preliminary design of SRM assays, which are as important for research supporting farm animal production and veterinary health as for developing pig models relevant to human health research. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

The Pig PeptideAtlas: a resource for systems biology in animal production and biomedicine

PubMed Central

Hesselager, Marianne O.; Codrea, Marius C.; Sun, Zhi; Deutsch, Eric W.; Bennike, Tue B.; Stensballe, Allan; Bundgaard, Louise; Moritz, Robert L.; Bendixen, Emøke

2016-01-01

Biological research on Sus scrofa, the domestic pig, is of immediate relevance for food production sciences and for developing the pig as a model organism for human biomedical research. Publicly available data repositories play a fundamental role in all biological sciences, and protein data repositories are particularly essential for the successful development of new proteomic methods. Cumulative proteome data repositories, including the PeptideAtlas, provide the means for targeted proteomics, system-wide observations, and cross-species observational studies, but pigs have so far been underrepresented in existing repositories. We here present a significantly improved build of the Pig PeptideAtlas, which includes pig proteome data from 25 tissues and three body fluid types mapped to 7139 canonical proteins. The content of the Pig PeptideAtlas reflects actively ongoing research within the veterinary proteomics domain, and this manuscript demonstrates how the expression of isoform-unique peptides can be observed across distinct tissues and body fluids. The Pig PeptideAtlas is a unique resource for animal proteome research, particularly for biomarker discovery and the preliminary design of SRM assays, which are as important for research supporting farm animal production and veterinary health as for developing pig models relevant to human health research. PMID:26699206
MARCKS Regulates Growth, Radiation Sensitivity and is a Novel Prognostic Factor for Glioma

PubMed Central

Jarboe, John S.; Anderson, Joshua C.; Duarte, Christine W.; Mehta, Tapan; Nowsheen, Somaira; Hicks, Patricia H.; Whitley, Alexander C.; Rohrbach, Timothy D.; McCubrey, Raymond O.; Chiu, Sherard; Burleson, Tamara M.; Bonner, James A.; Gillespie, G. Yancey; Yang, Eddy S.; Willey, Christopher D.

2013-01-01

Purpose This study assessed whether Myristoylated Alanine Rich C-Kinase Substrate (MARCKS) can regulate glioblastoma (GBM) growth, radiation sensitivity and clinical outcome. Experimental Design MARCKS protein levels were analyzed by immunoblot in five GBM explant cell lines and eight patient-derived xenograft tumors, and these levels were correlated with proliferation rates and intracranial growth rates, respectively. MARCKS protein levels were manipulated by lentiviral-mediated shRNA knockdown in the U251 cell line and by MARCKS overexpression in the U87 cell line, and the effects of these manipulations on proliferation, radiation sensitivity and senescence were assessed. MARCKS gene expression was correlated with survival outcomes in the Repository of Molecular Brain Neoplasia Data (REMBRANDT) database and The Cancer Genome Atlas (TCGA). Results MARCKS protein expression was inversely correlated with GBM proliferation and intracranial xenograft growth rates. Genetic silencing of MARCKS promoted GBM proliferation and radiation resistance, while MARCKS overexpression greatly reduced GBM growth potential and induced senescence. We found MARCKS gene expression to be directly correlated with survival in both the REMBRANDT and TCGA databases. Specifically, patients with high-MARCKS-expressing tumors of the Proneural molecular subtype had significantly increased survival rates. This effect was most pronounced in tumors with unmethylated O6-methylguanine DNA methyltransferase (MGMT) promoters, a traditionally poor prognostic factor. Conclusions MARCKS levels impact GBM growth and radiation sensitivity. High-MARCKS-expressing GBM tumors are associated with improved survival, particularly those with unmethylated MGMT promoters. These findings suggest the use of MARCKS as a novel target and prognostic biomarker in the Proneural subtype of GBM. PMID:22619307
From the European indoor radon map towards an atlas of natural radiation.

PubMed

Tollefsen, T; Cinelli, G; Bossew, P; Gruber, V; De Cort, M

2014-11-01

In 2006, the Joint Research Centre of the European Commission launched a project to map radon at the European level, as part of a planned European Atlas of Natural Radiation. It started with a map of indoor radon concentrations, which, as of May 2014, includes data from 24 countries covering a large part of Europe. Next, a European map of geogenic radon, intended to show 'what earth delivers' in terms of radon potential (RP), was started in 2008. A first trial map has been created, and a database has been established to collect all available data relevant to the RP. The Atlas should eventually display the geographical distribution of physical quantities related to natural radiation: in addition to radon, it will comprise maps of quantities such as cosmic rays and terrestrial gamma radiation. In this paper, the authors present the current state of the radon maps and the Atlas. © The Author 2014. Published by Oxford University Press.
Two-stage atlas subset selection in multi-atlas based image segmentation.

PubMed

Zhao, Tingting; Ruan, Dan

2015-06-01

Fast-growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, but also poses challenges of heterogeneous atlas quality and computational burden. This work aims to develop a novel two-stage method tailored to large atlas collections of varied quality, so that high-accuracy segmentation can be achieved at low computational cost. An atlas subset selection scheme is proposed that replaces a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of the desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. The performance of the proposed scheme has been assessed with cross validation on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates end-to-end segmentation performance comparable to the conventional single-stage selection method, but with significantly reduced computation. Compared with the alternative computation reduction method, their scheme improves the (mean, median) Dice similarity coefficient from (0.74, 0.78) to (0.83, 0.85) for prostate segmentation and from (0.82, 0.84) to (0.95, 0.95) for corpus callosum segmentation, with statistical significance.
The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy at significantly reduced computational cost, making it well suited to large, heterogeneous atlas collections.
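The two-stage selection logic can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: `cheap_score` and `refined_score` stand in for relevance metrics computed after low-cost and full-fledged registration respectively, and the augmented subset size is a fixed parameter here rather than being derived from the paper's inference model. The toy scores in the usage example are hypothetical.

```python
def two_stage_select(atlases, cheap_score, refined_score, augmented_size, fusion_size):
    """Select a fusion set of atlases in two stages (higher score = more relevant).

    cheap_score(atlas):   low-cost relevance estimate, evaluated for every atlas.
    refined_score(atlas): expensive relevance estimate, evaluated only for the
                          augmented subset that survives stage 1.
    """
    if not 0 < fusion_size <= augmented_size:
        raise ValueError("require 0 < fusion_size <= augmented_size")
    # Stage 1: rank all atlases with the cheap metric and keep an augmented subset.
    augmented = sorted(atlases, key=cheap_score, reverse=True)[:augmented_size]
    # Stage 2: rescore only the survivors with the refined metric.
    return sorted(augmented, key=refined_score, reverse=True)[:fusion_size]


refined_calls = []


def refined(atlas):
    refined_calls.append(atlas)  # record how many expensive evaluations occur
    return -abs(atlas - 7)


chosen = two_stage_select(range(20), lambda a: -abs(a - 8), refined,
                          augmented_size=6, fusion_size=3)
print(chosen, len(refined_calls))  # [7, 8, 6] 6
```

Only 6 of the 20 atlases incur the expensive score, which is the source of the computational saving; the risk, which the paper's augmented-size derivation addresses, is that a truly relevant atlas is ranked too low by the cheap metric and never reaches stage 2.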
Cyberinfrastructure for the digital brain: spatial standards for integrating rodent brain atlases

PubMed Central

Zaslavsky, Ilya; Baldock, Richard A.; Boline, Jyl

2014-01-01

Biomedical research entails capturing and analyzing massive data volumes, and new discoveries arise from data integration and mining. This is only possible if data can be mapped onto a common framework, such as the genome for genomic data. In neuroscience, the framework is intrinsically spatial and based on a number of paper atlases, which cannot meet today's data-intensive analysis and integration challenges. A scalable and extensible software infrastructure that is standards-based but open to novel data and resources is required for integrating information such as signal distributions, gene expression, neuronal connectivity, electrophysiology, anatomy, and developmental processes. Therefore, the International Neuroinformatics Coordinating Facility (INCF) initiated the development of a spatial framework for neuroscience data integration with an associated Digital Atlasing Infrastructure (DAI). A prototype implementation of this infrastructure for the rodent brain is reported here. The infrastructure is based on a collection of reference spaces to which data are mapped at the required resolution, such as the Waxholm Space (WHS), a 3D reconstruction of the brain generated using high-resolution, multi-channel microMRI. The core standards of the digital atlasing service-oriented infrastructure include Waxholm Markup Language (WaxML), an XML schema expressing a uniform information model for key elements such as coordinate systems, transformations, points of interest (POIs), labels, and annotations; and Atlas Web Services, interfaces for querying and updating atlas data. The services return WaxML-encoded documents with information about capabilities, spatial reference systems (SRSs) and structures, and execute coordinate transformations and POI-based requests. Key elements of INCF-DAI cyberinfrastructure have been prototyped for both mouse and rat brain atlas sources, including the Allen Mouse Brain Atlas, UCSD Cell-Centered Database, and Edinburgh Mouse Atlas Project.
PMID:25309417
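The idea of chaining coordinate transformations through a common reference space can be pictured with a minimal sketch that assumes each transformation is a 4x4 affine matrix in homogeneous coordinates. Real mappings between atlas spaces such as WHS may be nonlinear, and the functions and matrices below are hypothetical illustrations rather than part of the Atlas Web Services or WaxML.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists; the result applies b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def apply_affine(matrix, point):
    """Map an (x, y, z) point through a 4x4 homogeneous affine matrix."""
    x, y, z = point
    vec = (x, y, z, 1.0)
    out = [sum(matrix[r][c] * vec[c] for c in range(4)) for r in range(4)]
    w = out[3]  # stays 1.0 for affine matrices; divide anyway for safety
    return tuple(v / w for v in out[:3])


# Hypothetical transforms: source atlas space -> reference space -> target atlas space.
to_reference = [[1, 0, 0, 5], [0, 1, 0, -2], [0, 0, 1, 0], [0, 0, 0, 1]]  # translation
to_target = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]      # isotropic scaling
source_to_target = matmul4(to_target, to_reference)
print(apply_affine(source_to_target, (1, 2, 3)))  # (12.0, 0.0, 6.0)
```

Routing every atlas through one reference space means each new atlas needs only a single transform to and from that space, rather than one transform per pair of atlases.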
Discriminative confidence estimation for probabilistic multi-atlas label fusion.

PubMed

Benkarim, Oualid M; Piella, Gemma; González Ballester, Miguel Angel; Sanroma, Gerard

2017-12-01

Quantitative neuroimaging analyses often rely on the accurate segmentation of anatomical brain structures. In contrast to manual segmentation, automatic methods offer reproducible outputs and scale to large databases. Among existing approaches, multi-atlas segmentation has recently been shown to yield state-of-the-art performance in automatic segmentation of brain images. It consists of propagating the labelmaps from a set of atlases to the anatomy of a target image using image registration, and then fusing these multiple warped labelmaps into a consensus segmentation on the target image. Accurately estimating the contribution of each atlas labelmap to the final segmentation is a critical step for the success of multi-atlas segmentation. Common approaches to label fusion rely on local patch similarity, probabilistic statistical frameworks, or a combination of both. In this work, we propose a probabilistic label fusion framework based on atlas label confidences computed at each voxel of the structure of interest. Maximum likelihood atlas confidences are estimated using a supervised approach that explicitly models the relationship between local image appearance and the segmentation errors produced by each atlas. We evaluate different spatial pooling strategies for modeling local segmentation errors. We also present a novel type of label-dependent appearance feature based on atlas labelmaps, which is used during confidence estimation to increase the accuracy of our label fusion. Our approach is evaluated on the segmentation of seven subcortical brain structures from the MICCAI 2013 SATA Challenge dataset and of the hippocampi from the ADNI dataset. Overall, our results indicate that the proposed label fusion framework outperforms state-of-the-art approaches in the majority of the evaluated brain structures and is more robust to registration errors. Copyright © 2017 Elsevier B.V. All rights reserved.
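The fusion step itself, combining warped labelmaps with per-voxel atlas confidences, can be sketched as confidence-weighted voting. This is a generic illustration under stated assumptions: the confidences are simply given as inputs here, whereas the paper estimates them with a supervised maximum-likelihood model, and `fuse_labels` is a hypothetical name. Labelmaps are flattened to one list per atlas.

```python
from collections import defaultdict


def fuse_labels(warped_labels, confidences):
    """Confidence-weighted label fusion over flattened labelmaps.

    warped_labels[a][v]: label that atlas a assigns to voxel v after registration.
    confidences[a][v]:   non-negative confidence that atlas a is correct at voxel v.
    Returns (fused label per voxel, normalized weight of the winning label per voxel).
    """
    n_voxels = len(warped_labels[0])
    fused, support = [], []
    for v in range(n_voxels):
        votes = defaultdict(float)
        for labels, conf in zip(warped_labels, confidences):
            votes[labels[v]] += conf[v]  # each atlas votes with its confidence
        total = sum(votes.values())
        winner = max(votes, key=votes.get)
        fused.append(winner)
        support.append(votes[winner] / total if total > 0 else 0.0)
    return fused, support


labels = [[1, 0], [0, 0], [0, 1]]            # three atlases, two voxels
conf = [[0.9, 0.5], [0.2, 0.5], [0.3, 0.4]]
fused, support = fuse_labels(labels, conf)
print(fused)  # [1, 0]
```

At the first voxel, plain majority voting would pick label 0 (two of three atlases), but the single highly confident atlas outweighs them. With equal confidences everywhere, the scheme reduces to majority voting.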
Improved segmentation of cerebellar structures in children

PubMed Central

Narayanan, Priya Lakshmi; Boonazier, Natalie; Warton, Christopher; Molteno, Christopher D; Joseph, Jesuchristopher; Jacobson, Joseph L; Jacobson, Sandra W; Zöllei, Lilla; Meintjes, Ernesta M

2016-01-01

Background Consistent localization of cerebellar cortex in a standard coordinate system is important for functional studies and for detecting anatomical alterations in studies of morphometry. To date, no pediatric cerebellar atlas is available. New method The probabilistic Cape Town Pediatric Cerebellar Atlas (CAPCA18) was constructed in the age-appropriate National Institutes of Health Pediatric Database asymmetric template space using manual tracings of 16 cerebellar compartments in 18 healthy children (9–13 years) from Cape Town, South Africa. The individual atlases of the training subjects were also used to implement multi-atlas label fusion using multi-atlas majority voting (MAMV) and multi-atlas generative model (MAGM) approaches. Segmentation accuracy in 14 test subjects was compared for each method to ‘gold standard’ manual tracings. Results Spatial overlap between manual tracings and CAPCA18 automated segmentation was 73% or higher for all lobules in both hemispheres, except VIIb and X. Automated segmentation using MAGM yielded the best segmentation accuracy over all lobules (mean Dice Similarity Coefficient 0.76; range 0.55–0.91). Comparison with existing methods In all lobules, spatial overlap of CAPCA18 segmentations with manual tracings was similar to or higher than that obtained with SUIT (spatially unbiased infra-tentorial template), providing additional evidence of the benefits of an age-appropriate atlas. MAGM segmentation accuracy was comparable to values reported recently by Park et al. (2014) in adults (across all lobules mean DSC = 0.73, range 0.40–0.89). Conclusions CAPCA18 and the associated individual atlases of the training subjects yield improved segmentation of cerebellar structures in children. PMID:26743973
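The per-lobule accuracy figures above are Dice Similarity Coefficients, DSC = 2|A & B| / (|A| + |B|), where A and B are the voxels assigned to a structure by the manual tracing and by the automated segmentation. The sketch below computes DSC per label on flattened label volumes and averages across labels; the toy volumes are hypothetical, and treating two empty sets as perfect agreement is a convention chosen here.

```python
def dice(a, b):
    """Dice similarity coefficient 2*|A & B| / (|A| + |B|) for two sets of voxel indices."""
    if not a and not b:
        return 1.0  # both empty: treated as perfect agreement by convention
    return 2 * len(a & b) / (len(a) + len(b))


def dice_per_label(manual, automated, labels):
    """Dice for each label between two equal-length flattened label volumes."""
    if len(manual) != len(automated):
        raise ValueError("volumes must have the same number of voxels")
    scores = {}
    for label in labels:
        m = {i for i, x in enumerate(manual) if x == label}
        s = {i for i, x in enumerate(automated) if x == label}
        scores[label] = dice(m, s)
    return scores


manual = [1, 1, 2, 2, 0]      # toy "gold standard" tracing (0 = background)
automated = [1, 2, 2, 2, 0]   # toy automated segmentation
scores = dice_per_label(manual, automated, labels=[1, 2])
mean_dsc = sum(scores.values()) / len(scores)
print(scores, round(mean_dsc, 3))  # {1: 0.6666666666666666, 2: 0.8} 0.733
```

Averaging per-structure scores, as done here, weights small and large lobules equally; pooling all voxels before computing Dice would instead be dominated by the largest structures.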
DOE Office of Scientific and Technical Information (OSTI.GOV)

Peressutti, D; Schipaanboord, B; Kadir, T

Purpose: To investigate the effectiveness of atlas selection methods for improving atlas-based auto-contouring in radiotherapy planning. Methods: 275 clinically delineated head and neck (H&N) cases were employed as an atlas database from which atlases would be selected. A further 40 previously contoured cases were used as test patients against which atlas selection could be performed and evaluated. 26 variations of selection methods proposed in the literature and used in commercial systems were investigated. Atlas selection methods comprised either global or local image similarity measures, computed after rigid or deformable registration, combined with direct atlas search or with an intermediate template image. Workflow Box (Mirada-Medical, Oxford, UK) was used for all auto-contouring. Results on brain, brainstem, parotids and spinal cord were compared to random selection, a fixed set of 10 “good” atlases, and optimal selection by an “oracle” with knowledge of the ground truth. The Dice score and the average ranking with respect to the “oracle” were employed to assess the performance of the top 10 atlases selected by each method. Results: The fixed set of “good” atlases outperformed all of the atlas-patient image similarity-based selection methods (mean Dice 0.715 cf. 0.603 to 0.677). In general, methods based on exhaustive comparison of local similarity measures showed better average Dice scores (0.658 to 0.677) than those using either a template image (0.655 to 0.672) or global similarity measures (0.603 to 0.666). The performance of image-based selection methods was found to be only slightly better than random selection (0.645). The Dice scores given relate to the left parotid, but similar result patterns were observed for all organs. Conclusion: Intuitively, atlas selection based on the patient CT is expected to improve auto-contouring performance.
However, published approaches were found to perform only marginally better than random selection, whereas a fixed set of representative atlases performed favourably. This research was funded via InnovateUK Grant 600277 as part of Eurostars Grant E!9297. DP, BS, MG and TK are employees of Mirada Medical Ltd.
Bridging the Qualitative/Quantitative Software Divide

PubMed Central

Annechino, Rachelle; Antin, Tamar M. J.; Lee, Juliet P.

2011-01-01

To compare and combine qualitative and quantitative data collected from respondents in a mixed methods study, the research team developed a relational database to merge survey responses stored and analyzed in SPSS and semistructured interview responses stored and analyzed in the qualitative software package ATLAS.ti. The process of developing the database, as well as practical considerations for researchers who may wish to use similar methods, are explored. PMID:22003318
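The core idea of merging the two data sources, linking quantitative survey answers and qualitatively coded interview segments through a shared respondent identifier, can be sketched with Python's built-in `sqlite3` module. Everything here is a hypothetical illustration: the table names, columns and toy rows are assumptions, and they do not describe SPSS or ATLAS.ti export formats or the authors' actual schema.

```python
import sqlite3


def build_merged_db(survey_rows, coded_segments):
    """Load survey responses and coded interview segments into one in-memory database.

    survey_rows:    (respondent_id, age, q1_score) tuples.
    coded_segments: (respondent_id, code, excerpt) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE survey (respondent_id TEXT PRIMARY KEY, age INTEGER, q1_score REAL)"
    )
    conn.execute(
        "CREATE TABLE interview_codes ("
        " respondent_id TEXT REFERENCES survey(respondent_id), code TEXT, excerpt TEXT)"
    )
    conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", survey_rows)
    conn.executemany("INSERT INTO interview_codes VALUES (?, ?, ?)", coded_segments)
    return conn


def mean_score_by_code(conn):
    """Average survey score among respondents whose interviews carry each code."""
    # DISTINCT ensures a respondent coded twice with the same code is counted once.
    query = """
        SELECT c.code, AVG(s.q1_score), COUNT(*)
        FROM (SELECT DISTINCT respondent_id, code FROM interview_codes) AS c
        JOIN survey AS s ON s.respondent_id = c.respondent_id
        GROUP BY c.code
        ORDER BY c.code
    """
    return conn.execute(query).fetchall()


conn = build_merged_db(
    [("R1", 34, 4.0), ("R2", 51, 2.0), ("R3", 27, 5.0)],
    [("R1", "stigma", "excerpt 1"), ("R1", "stigma", "excerpt 2"),
     ("R2", "stigma", "excerpt 3"), ("R3", "access", "excerpt 4")],
)
print(mean_score_by_code(conn))  # [('access', 5.0, 1), ('stigma', 3.0, 2)]
```

Keeping the respondent identifier as the join key is what lets a single query cross the qualitative/quantitative boundary, for example comparing survey scores across interview themes.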
Exercises in Anatomy, Connectivity, and Morphology using Neuromorpho.org and the Allen Brain Atlas.

PubMed

Chu, Philip; Peck, Joshua; Brumberg, Joshua C

2015-01-01

Laboratory instruction of neuroscience is often limited by the lack of physical resources and supplies (e.g., brain specimens, dissection kits, physiological equipment). Online databases can supplement material labs by providing professionally collected images of brain specimens and their underlying cellular populations, at a resolution and quality that would be extremely difficult to obtain for strictly pedagogical purposes. We describe a method using two online databases, Neuromorpho.org and the Allen Brain Atlas (ABA), which freely provide data from working brain scientists that can be adapted for laboratory instruction and exercises. Neuromorpho.org is the first neuronal morphology database that provides qualitative and quantitative data from reconstructed cells analyzed in published scientific reports. The Neuromorpho.org database contains cross-species datasets covering multiple neuronal phenotypes, which allow comparative examinations. The ABA provides modules that allow students to study the anatomy of the rodent brain and to observe, through histochemical labeling, the different cellular phenotypes present. Using these tools in conjunction, advanced students can ask questions about qualitative and quantitative neuronal morphology, then examine the distribution of the same cell types across the entire brain to gain a full appreciation of the magnitude of the brain's complexity.
ICESat-2 / ATLAS Flight Science Receiver Algorithms

NASA Astrophysics Data System (ADS)

Mcgarry, J.; Carabajal, C. C.; Degnan, J. J.; Mallama, A.; Palm, S. P.; Ricklefs, R.; Saba, J. L.

2013-12-01

NASA's Advanced Topographic Laser Altimeter System (ATLAS) will be the sole instrument on the ICESat-2 spacecraft, which is expected to launch in 2016 with a 3-year mission lifetime. The ICESat-2 orbital altitude will be 500 km, with a 92-degree inclination and 91-day repeat tracks. ATLAS is a single-photon detection system transmitting at 532 nm with a laser repetition rate of 10 kHz and a 6-spot pattern on the Earth's surface. Without some method of eliminating solar background noise in near real time, the volume of ATLAS telemetry would far exceed the normal X-band downlink capability. To reduce the data volume to an acceptable level, a set of onboard Receiver Algorithms has been developed. These Algorithms limit the daily data volume by distinguishing surface echoes from the background noise, allowing the instrument to telemeter only a small vertical region about the signal. This is accomplished through the use of an onboard Digital Elevation Model (DEM), signal processing techniques, and an onboard relief map. As on the ATLAS predecessor GLAS (Geoscience Laser Altimeter System), the DEM provides minimum and maximum heights for each 1 degree x 1 degree tile on the Earth. This information allows the onboard algorithm to limit its signal search to the region between the minimum and maximum heights (plus some margin for errors). The understanding that surface echoes tend to clump while noise is randomly distributed led us to histogram the received event times. Signal locations are selected from the histogram bins with statistically significant counts. Once the signal location has been established, the onboard Digital Relief Map (DRM) is used to determine the vertical width of the telemetry band about the signal. The ATLAS Receiver Algorithms are nearing completion of the development phase and are currently being tested using a Monte Carlo Software Simulator that models the instrument, the orbit and the environment.
This Simulator makes it possible to check all logic paths that could be encountered by the Algorithms on orbit. In addition, the NASA airborne instrument MABEL is collecting data with characteristics similar to what ATLAS will see, and MABEL data are being used to test the ATLAS Receiver Algorithms. Further verification will be performed during Integration and Testing of the ATLAS instrument and during Environmental Testing on the full ATLAS instrument. Results from testing to date show that the Receiver Algorithms can handle a wide range of signal and noise levels, with very good sensitivity at relatively low signal-to-noise ratios. In addition, preliminary tests using the ICESat-2 Science Team's selected land ice and sea ice test cases have demonstrated the capability of the Algorithms to successfully find and telemeter the surface echoes. In this presentation we will describe the ATLAS Flight Science Receiver Algorithms and the Software Simulator, and will present results of the testing to date. The onboard databases (DEM, DRM and the Surface Reference Mask) are being developed at the University of Texas at Austin as part of the ATLAS Flight Science Receiver Algorithms. Verification of the onboard databases is being performed by ATLAS Receiver Algorithms team members Claudia Carabajal and Jack Saba.
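The histogram step described above (clumped surface echoes versus randomly distributed noise) can be illustrated with a minimal Python sketch. It is not the flight algorithm: the search window stands in for the DEM minimum/maximum bounds plus margin, the noise level is estimated as the median bin count, and a bin is flagged when its count exceeds that level by `k` Poisson standard deviations. The median estimate, the k-sigma test and all names are illustrative assumptions, and the DRM-based telemetry band is not modelled.

```python
import math
import statistics


def find_signal_bins(event_times, window_start, window_end, bin_width, k=4.0):
    """Histogram event times inside a search window and flag bins with excess counts.

    Returns (flagged bin indices, histogram counts).
    """
    n_bins = math.ceil((window_end - window_start) / bin_width)
    counts = [0] * n_bins
    for t in event_times:
        if window_start <= t < window_end:  # events outside the DEM window are ignored
            index = min(int((t - window_start) // bin_width), n_bins - 1)
            counts[index] += 1
    # Median is robust to the few bins that contain signal.
    noise = statistics.median(counts)
    threshold = noise + k * math.sqrt(max(noise, 1.0))
    flagged = [i for i, c in enumerate(counts) if c > threshold]
    return flagged, counts


noise_events = [i * 0.5 for i in range(200)]          # 2 background events per bin
echo_events = [42.0 + 0.04 * i for i in range(20)]    # clumped surface return
flagged, counts = find_signal_bins(noise_events + echo_events, 0.0, 100.0, 1.0)
print(flagged)  # [42]
```

Restricting the histogram to the DEM-derived window both shrinks the search and lowers the chance that a random noise fluctuation elsewhere is mistaken for the surface.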
A Probabilistic Atlas of Diffuse WHO Grade II Glioma Locations in the Brain

PubMed Central

Baumann, Cédric; Zouaoui, Sonia; Yordanova, Yordanka; Blonski, Marie; Rigau, Valérie; Chemouny, Stéphane; Taillandier, Luc; Bauchet, Luc; Duffau, Hugues; Paragios, Nikos

2016-01-01

Diffuse WHO grade II gliomas are diffusely infiltrative brain tumors characterized by an unavoidable anaplastic transformation. Their management is strongly dependent on their location in the brain, owing to interactions with functional regions and potential differences in molecular biology. In this paper, we present the construction of a probabilistic atlas mapping the preferential locations of diffuse WHO grade II gliomas in the brain. This is carried out through a sparse graph whose nodes correspond to clusters of tumors grouped by spatial proximity. The value of such an atlas is illustrated through two applications. The first correlates tumor location with patient age via a statistical analysis, highlighting the usefulness of the atlas for studying the origins and behavior of the tumors. The second exploits the tumors' preferential locations for automatic segmentation: through a coupled decomposed Markov Random Field model, the atlas guides the segmentation process and characterizes which preferential location the tumor belongs to, and consequently which behavior it may be associated with. Leave-one-out cross-validation experiments on a large database highlight the robustness of the graph and yield promising segmentation results. PMID:26751577
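The abstract does not detail how tumors are grouped by spatial proximity. As a stand-in only, the sketch below uses single-linkage grouping: tumor centroids within `max_dist` of each other (directly or through a chain of neighbours) end up in the same cluster, computed with a union-find structure. The centroids, the distance threshold and the function name are hypothetical, and the paper's sparse graph construction may differ substantially.

```python
import math


def cluster_by_proximity(centroids, max_dist):
    """Group tumor centroids whose pairwise distance is <= max_dist (single linkage).

    Returns clusters as sorted lists of indices into `centroids`.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= max_dist:  # Python 3.8+
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy centroids (mm) in a common template space; hypothetical values.
centroids = [(0, 0, 0), (1, 0, 0), (30, 30, 30), (30, 31, 30), (60, 0, 0)]
print(cluster_by_proximity(centroids, max_dist=5.0))  # [[0, 1], [2, 3], [4]]
```

Each resulting cluster could then serve as one node of a location graph, with the fraction of tumors per cluster giving a crude estimate of how often each location occurs.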
System-of-Systems Technology-Portfolio-Analysis Tool

NASA Technical Reports Server (NTRS)

O'Neil, Daniel; Mankins, John; Feingold, Harvey; Johnson, Wayne

2012-01-01

Advanced Technology Life-cycle Analysis System (ATLAS) is a system-of-systems technology-portfolio-analysis software tool. ATLAS affords capabilities to (1) compare estimates of the mass and cost of an engineering system based on competing technological concepts; (2) estimate life-cycle costs of an outer-space-exploration architecture for a specified technology portfolio; (3) collect data on state-of-the-art and forecasted technology performance, and on operations and programs; and (4) calculate an index of the relative programmatic value of a technology portfolio. ATLAS facilitates analysis by providing a library of analytical spreadsheet models for a variety of systems. A single analyst can assemble a representation of a system of systems from the models and build a technology portfolio. Each system model estimates mass, and life-cycle costs are estimated by a common set of cost models. Other components of ATLAS include graphical-user-interface (GUI) software, algorithms for calculating the aforementioned index, a technology database, a report generator, and a form generator for creating the GUI for the system models. At the time of this reporting, ATLAS is a prototype, embodied in Microsoft Excel and several thousand lines of Visual Basic for Applications that run on both Windows and Macintosh computers.
Expression atlas and comparative coexpression network analyses reveal important genes involved in the formation of lignified cell wall in Brachypodium distachyon.

PubMed

Sibout, Richard; Proost, Sebastian; Hansen, Bjoern Oest; Vaid, Neha; Giorgi, Federico M; Ho-Yue-Kuang, Severine; Legée, Frédéric; Cézart, Laurent; Bouchabké-Coussa, Oumaya; Soulhat, Camille; Provart, Nicholas; Pasha, Asher; Le Bris, Philippe; Roujol, David; Hofte, Herman; Jamet, Elisabeth; Lapierre, Catherine; Persson, Staffan; Mutwil, Marek

2017-08-01

While Brachypodium distachyon (Brachypodium) is an emerging model for grasses, no expression atlas or gene coexpression network has been available for it. Such tools are of high importance to provide insights into the function of Brachypodium genes. We present a detailed Brachypodium expression atlas, capturing gene expression in its major organs at different developmental stages. The data were integrated into a large-scale coexpression database (www.gene2function.de), enabling identification of duplicated pathways and conserved processes across 10 plant species, thus allowing genome-wide inference of gene function. We highlight the importance of the atlas and the platform through the identification of duplicated cell wall modules, and show that a lignin biosynthesis module is conserved across angiosperms. We identified and functionally characterised a putative ferulate 5-hydroxylase gene by overexpressing it in Brachypodium, which resulted in an increase in lignin syringyl units and reduced lignin content of mature stems, and led to improved saccharification of the stem biomass. Our Brachypodium expression atlas thus provides a powerful resource to reveal functionally related genes, which may advance our understanding of important biological processes in grasses. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
CellAtlasSearch: a scalable search engine for single cells.

PubMed

Srivastava, Divyanshu; Iyer, Arvind; Kumar, Vibhor; Sengupta, Debarka

2018-05-21

Owing to the advent of high-throughput single-cell transcriptomics, the past few years have seen exponential growth in the production of gene expression data. Recently, efforts have been made by various research groups to homogenize and store single-cell expression data from a large number of studies. The true value of this ever-increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high-dimensional expression data, which is massively parallel as well as lightweight, and thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles, including both bulk and single-cell data. It enables the user to query individual single-cell transcriptomes and find matching samples from the database along with the necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise-free, low-dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.
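The abstract does not spell out CellAtlasSearch's GPU-friendly LSH variant, so the sketch below assumes a generic random-hyperplane (SimHash) scheme to show the underlying idea: each expression profile is reduced to a short bit signature, and reference profiles that land in the same bucket as a query become candidate matches. All function names and parameters here are hypothetical and are not the CellAtlasSearch API.

```python
import random
from collections import defaultdict

def make_hyperplanes(n_genes, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in gene-expression space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_bits)]

def signature(profile, planes):
    """Pack one bit per hyperplane: 1 if the profile lies on its positive side."""
    bits = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, profile))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def build_index(reference, planes):
    """Bucket reference profiles (dict: sample name -> vector) by signature."""
    index = defaultdict(list)
    for name, profile in reference.items():
        index[signature(profile, planes)].append(name)
    return index

def query(index, profile, planes):
    """Return reference names sharing the query's bucket (candidate matches)."""
    return index.get(signature(profile, planes), [])

# Hypothetical usage: a profile scaled by a positive factor has the same
# cosine direction, so it hashes to the same bucket as the original.
planes = make_hyperplanes(n_genes=5, n_bits=8, seed=1)
idx = build_index({"cell_a": [1.0, 2.0, 0.5, 0.0, 3.0]}, planes)
candidates = query(idx, [2.0, 4.0, 1.0, 0.0, 6.0], planes)
```

Because each bit depends only on the sign of a dot product, the signature approximates cosine similarity. Practical systems usually combine several independent hash tables to reduce missed neighbors; that refinement is omitted here.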
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

PubMed

Tuo, Youlin; An, Ning; Zhang, Ming

2018-03-01

The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database and analyzed using the MetaQC and MetaDE packages in R. The feature genes between metastasis and non-metastasis samples were screened under the threshold of P<0.05. Based on the protein-protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non-metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin-dependent kinase 2 (CDK2), myelocytomatosis proto-oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non-ATPase 2 and telomeric repeat binding factor 2. The cyclin-dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed from the top 30 feature genes was able to distinguish metastatic samples from non-metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96].
The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non-metastatic samples) revealed an accuracy of 94.38% and an AUROC of 0.958. Cell cycle-associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier that showed high accuracy in several independent datasets was constructed to assess the possibility of breast cancer metastasis. CDK2, CDKN1A, E2F1 and MYC were indicated as potential feature genes in metastatic breast cancer.
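The reported figures can be related through a small calculation. The independent dataset has 35 + 143 = 178 samples, so an accuracy of 94.38% corresponds to 168 correctly classified samples (168/178 ≈ 0.9438). The abstract does not say how the 10 errors split between the two classes, so the confusion-matrix counts below are hypothetical; the sketch only shows how the standard metrics are derived, treating "metastatic" as the positive class.

```python
def classifier_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics; positive class = metastatic.

    Assumes every denominator is non-zero (no empty class or prediction).
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of metastatic samples detected
        "specificity": tn / (tn + fp),  # fraction of non-metastatic samples detected
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical split: only the totals (35 + 143 = 178) and the 168 correct
# calls follow from the abstract; the per-class error counts are invented.
metrics = classifier_metrics(tp=30, fn=5, tn=138, fp=5)
print(round(metrics["accuracy"] * 100, 2))  # 94.38
```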
Novel signatures of cancer-associated fibroblasts.

PubMed

Bozóky, Benedek; Savchenko, Andrii; Csermely, Péter; Korcsmáros, Tamás; Dúl, Zoltán; Pontén, Fredrik; Székely, László; Klein, George

2013-07-15

Increasing evidence indicates the importance of the tumor microenvironment, in particular cancer-associated fibroblasts, in cancer development and progression. In our study, we developed a novel, visually based method to identify new immunohistochemical signatures of these fibroblasts. The method employed a protein list based on 759 protein products of genes identified by RNA profiling in our previous study, which compared fibroblasts with differential growth-modulating effects on human cancer cells, together with their first neighbors in the human protein interactome. These 2,654 proteins were analyzed in the Human Protein Atlas online database by comparing their immunohistochemical expression patterns in normal versus tumor-associated fibroblasts. Twelve new proteins differentially expressed in cancer-associated fibroblasts were identified (DLG1, BHLHE40, ROCK2, RAB31, AZI2, PKM2, ARHGAP31, ARHGAP26, ITCH, EGLN1, RNF19A and PLOD2), four of which can be connected to the Rho kinase signaling pathway. They were further analyzed in several additional tumor stromata, and the majority showed congruent expression among the different tumors. Many of them were also positive in normal myofibroblast-like cells. The new signatures can be useful in immunohistochemical analysis of different tumor stromata and may also give us an insight into the pathways activated in them in their true in vivo context. The method itself could be used for other similar analyses to identify proteins expressed in other cell types in tumors and their surrounding microenvironment. Copyright © 2013 UICC.
Planning Ahead by Thinking Backwards.

ERIC Educational Resources Information Center

Farmer, Lesley S. J.

1996-01-01

Suggests evaluation criteria for selecting CD-ROMs and describes some typical titles along with examples of learning activities. Highlights include reference titles, including encyclopedias, magazine indexes, newspaper databases, subject-specific indexes, timetables and almanacs, and atlases; and curriculum-specific titles. (LRW)
The ATLAS PanDA Monitoring System and its Evolution

NASA Astrophysics Data System (ADS)

Klimentov, A.; Nevski, P.; Potekhin, M.; Wenaus, T.

2011-12-01

The PanDA (Production and Distributed Analysis) Workload Management System is used for ATLAS distributed production and analysis worldwide. The needs of ATLAS global computing imposed challenging requirements on the design of PanDA in areas such as scalability, robustness, automation, diagnostics, and usability for both production shifters and analysis users. Through a system-wide job database, the PanDA monitor provides a comprehensive and coherent view of the system and job execution, from high-level summaries to detailed drill-down job diagnostics. It is (like the rest of PanDA) an Apache-based Python application backed by Oracle. The presentation layer is HTML code generated on the fly in the Python application, which is also responsible for managing database queries. However, this approach lacks user interface flexibility, simplicity of communication with external systems, and ease of maintenance. A decision was therefore made to migrate the PanDA monitor server to the Django Web Application Framework and apply JSON/AJAX technology in the browser front end. This allows us to greatly reduce the amount of application code, separate data preparation from presentation, leverage open-source tools for features such as authentication and authorization, and provide a richer and more dynamic user experience. We describe our approach, design and initial experience with the migration process.

A Description of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Common Data Analysis Pipeline

PubMed Central

Rudnick, Paul A.; Markey, Sanford P.; Roth, Jeri; Mirokhin, Yuri; Yan, Xinjian; Tchekhovskoi, Dmitrii V.; Edwards, Nathan J.; Thangudu, Ratna R.; Ketchum, Karen A.; Kinsinger, Christopher R.; Mesri, Mehdi; Rodriguez, Henry; Stein, Stephen E.

2016-01-01

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics datasets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling proteogenomic study for both reference (i.e., contained in major sequence databases) and non-reference markers of cancer. The CPTAC labs have focused on colon, breast, and ovarian tissues in the first round of analyses; spectra from these datasets were produced from 2D LC-MS/MS analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, sequence databases, etc.), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data in four steps: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false discovery rate (FDR)-based filtering. The pipeline also produces localization scores for the phosphopeptide enrichment studies using the PhosphoRS program. Quantitative information for each of the datasets is specific to the sample processing: PSM and protein reports contain the spectrum-level or gene-level (“rolled-up”) precursor peak areas and spectral counts for label-free data, or reporter-ion log-ratios for 4plex iTRAQ™. The reports are available in simple tab-delimited formats and, for the PSM-reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data, enabling comparisons between different samples and cancer types as well as across the major ‘omics fields. PMID:26860878
A Description of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Common Data Analysis Pipeline.

PubMed

Rudnick, Paul A; Markey, Sanford P; Roth, Jeri; Mirokhin, Yuri; Yan, Xinjian; Tchekhovskoi, Dmitrii V; Edwards, Nathan J; Thangudu, Ratna R; Ketchum, Karen A; Kinsinger, Christopher R; Mesri, Mehdi; Rodriguez, Henry; Stein, Stephen E

2016-03-04

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics data sets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling proteogenomic study for both reference (i.e., contained in major sequence databases) and nonreference markers of cancer. The CPTAC laboratories have focused on colon, breast, and ovarian tissues in the first round of analyses; spectra from these data sets were produced from 2D liquid chromatography-tandem mass spectrometry analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, sequence databases, etc.), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data in four steps: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false-discovery rate-based filtering. The pipeline also produces localization scores for the phosphopeptide enrichment studies using the PhosphoRS program. Quantitative information for each of the data sets is specific to the sample processing: PSM and protein reports contain the spectrum-level or gene-level ("rolled-up") precursor peak areas and spectral counts for label-free data, or reporter-ion log-ratios for 4plex iTRAQ. The reports are available in simple tab-delimited formats and, for the PSM-reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data to enable comparisons between different samples and cancer types as well as across the major omics fields.
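The abstract lists FDR-based filtering as the pipeline's fourth step without stating how the FDR is estimated. As an assumption, the sketch below uses the widely used target-decoy estimate (decoy hits divided by target hits at or above a score cutoff) and keeps the largest set of target PSMs whose estimated FDR stays within a limit. The function name, the `(score, is_decoy)` input format and the 1% default are hypothetical, not CDAP internals.

```python
def fdr_filter(psms, max_fdr=0.01):
    """Target-decoy FDR filtering of peptide-spectrum matches (sketch).

    psms: list of (score, is_decoy) tuples; higher score = better match.
    Returns the target PSMs at or above the most permissive score cutoff
    whose estimated FDR (decoys / targets at or above it) is <= max_fdr.
    Tied scores are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # all PSMs scoring at least this much pass
    if cutoff is None:
        return []
    return [p for p in ranked if not p[1] and p[0] >= cutoff]

# Hypothetical scores: one decoy sits at rank 4.
example = [(10, False), (9, False), (8, False), (7, True), (6, False)]
strict = fdr_filter(example, max_fdr=0.0)    # stops above the decoy
lenient = fdr_filter(example, max_fdr=0.25)  # 1 decoy / 4 targets = 0.25
```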
Distribution of cellular HSV-1 receptor expression in human brain.

PubMed

Lathe, Richard; Haas, Juergen G

2017-06-01

Herpes simplex virus type 1 (HSV-1) is a neurotropic virus linked to a range of acute and chronic neurological disorders affecting distinct regions of the brain. Unusually, HSV-1 entry into cells requires the interaction of viral proteins glycoprotein D (gD) and glycoprotein B (gB) with distinct cellular receptor proteins. Several different gD and gB receptors have been identified, including TNFRSF14/HVEM and PVRL1/nectin 1 as gD receptors and PILRA, MAG, and MYH9 as gB receptors. We investigated the expression of these receptor molecules in different areas of the adult and developing human brain using online transcriptome databases. Whereas all HSV-1 receptors showed distinct expression patterns in different brain areas, the Allen Brain Atlas (ABA) reported increased expression of both gD and gB receptors in the hippocampus. Specifically, for PVRL1, TNFRSF14, and MYH9, the differential z scores for hippocampal expression, a measure of relatively increased expression, reached 2.9, 2.9, and 2.5, respectively, comparable to the z score for the archetypical hippocampus-enriched mineralocorticoid receptor (NR3C2, z = 3.1). These data were confirmed in the Human Brain Transcriptome (HBT) database, although HBT data indicate that MAG expression is also enriched in the hippocampus. The HBT database allowed the developmental pattern of expression to be investigated; we report that all HSV-1 receptors markedly increase in expression levels between gestation and the postnatal/adult periods. These results suggest that differential receptor expression levels of several HSV-1 gD and gB receptors in the adult hippocampus are likely to underlie the susceptibility of this brain region to HSV-1 infection.
E-MSD: an integrated data resource for bioinformatics.

PubMed

Golovin, A; Oldfield, T J; Tate, J G; Velankar, S; Barton, G J; Boutselakis, H; Dimitropoulos, D; Fillon, J; Hussain, A; Ionides, J M C; John, M; Keller, P A; Krissinel, E; McNeil, P; Naim, A; Newman, R; Pajon, A; Pineda, J; Rachedi, A; Copeland, J; Sitnov, A; Sobhany, S; Suarez-Uruena, A; Swaminathan, G J; Tagari, M; Tromm, S; Vranken, W; Henrick, K

2004-01-01

The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD 'atlas pages' show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/).
E-MSD: an integrated data resource for bioinformatics

PubMed Central

Golovin, A.; Oldfield, T. J.; Tate, J. G.; Velankar, S.; Barton, G. J.; Boutselakis, H.; Dimitropoulos, D.; Fillon, J.; Hussain, A.; Ionides, J. M. C.; John, M.; Keller, P. A.; Krissinel, E.; McNeil, P.; Naim, A.; Newman, R.; Pajon, A.; Pineda, J.; Rachedi, A.; Copeland, J.; Sitnov, A.; Sobhany, S.; Suarez-Uruena, A.; Swaminathan, G. J.; Tagari, M.; Tromm, S.; Vranken, W.; Henrick, K.

2004-01-01

The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the Protein Data Bank (PDB) and to work towards the integration of various bioinformatics data resources. We have implemented a simple form-based interface that allows users to query the MSD directly. The MSD ‘atlas pages’ show all of the information in the MSD for a particular PDB entry. The group has designed new search interfaces aimed at specific areas of interest, such as the environment of ligands and the secondary structures of proteins. We have also implemented a novel search interface that begins to integrate separate MSD search services in a single graphical tool. We have worked closely with collaborators to build a new visualization tool that can present both structure and sequence data in a unified interface, and this data viewer is now used throughout the MSD services for the visualization and presentation of search results. Examples showcasing the functionality and power of these tools are available from tutorial webpages (http://www.ebi.ac.uk/msd-srv/docs/roadshow_tutorial/). PMID:14681397
EnviroAtlas - Ecosystem Service Market and Project Areas, U.S., 2015, Forest Trends' Ecosystem Marketplace

EPA Pesticide Factsheets

This EnviroAtlas dataset contains polygons depicting the geographic areas of market-based programs, referred to herein as markets, and projects addressing ecosystem services protection in the United States. Depending upon the type of market or project and data availability, polygons reflect market coverage areas, project footprints, or project primary impact areas in which ecosystem service markets and projects operate. The data were collected via surveys and desk research conducted by Forest Trends' Ecosystem Marketplace from 2008 to 2016 on biodiversity (i.e., imperiled species/habitats; wetlands and streams), carbon, and water markets. Additional biodiversity data were obtained from the Regulatory In-lieu Fee and Bank Information Tracking System (RIBITS) database in 2015. Attribute data include information regarding the methodology, design, and development of biodiversity, carbon, and water markets and projects. This dataset was produced by Forest Trends' Ecosystem Marketplace for EnviroAtlas in order to support public access to and use of information related to environmental markets. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about thi
An atlas of ShakeMaps for selected global earthquakes

USGS Publications Warehouse

Allen, Trevor I.; Wald, David J.; Hotovec, Alicia J.; Lin, Kuo-Wan; Earle, Paul S.; Marano, Kristin D.

2008-01-01

An atlas of maps of peak ground motion and intensity ('ShakeMaps') has been developed for almost 5,000 recent and historical global earthquakes. These maps are produced using established ShakeMap methodology (Wald and others, 1999c; Wald and others, 2005) and constraints from macroseismic intensity data, instrumental ground motions, regional topographically-based site amplifications, and published earthquake-rupture models. The ShakeMap methodology provides a consistent way to combine point observations with ground-motion predictions to produce descriptions of peak ground motions and intensity for each event. We also calculate an estimated ground-motion uncertainty grid for each earthquake. The Atlas of ShakeMaps provides a consistent and quantitative description of the distribution and intensity of shaking for recent global earthquakes (1973-2007) as well as selected historic events. The Atlas was developed specifically for calibrating global earthquake loss estimation methodologies to be used in the U.S. Geological Survey Prompt Assessment of Global Earthquakes for Response (PAGER) Project. PAGER will employ these loss models to rapidly estimate the impact of global earthquakes as part of the USGS National Earthquake Information Center's earthquake-response protocol. The development of the Atlas of ShakeMaps has also led to several key improvements to the Global ShakeMap system. The key upgrades include: addition of uncertainties in the ground motion mapping, introduction of modern ground-motion prediction equations, improved estimates of global seismic-site conditions (VS30), and improved definition of stable continental region polygons. Finally, we have merged all of the ShakeMaps in the Atlas to provide a global perspective of earthquake ground shaking for the past 35 years, allowing comparison with probabilistic hazard maps. The online Atlas and supporting databases can be found at http://earthquake.usgs.gov/eqcenter/shakemap/atlas.php/.
FACETS: multi-faceted functional decomposition of protein interaction networks.

PubMed

Seah, Boon-Siew; Bhowmick, Sourav S; Dewey, C Forbes

2012-10-15

The availability of large-scale curated protein interaction datasets has given rise to the opportunity to investigate higher-level organization and modularity within the protein-protein interaction (PPI) network using graph theoretic analysis. Despite the recent progress, systems-level analysis of high-throughput PPIs remains a daunting task because of the amount of data they present. In this article, we propose a novel PPI network decomposition algorithm called FACETS in order to make sense of the deluge of interaction data using Gene Ontology (GO) annotations. FACETS finds not just a single functional decomposition of the PPI network, but a multi-faceted atlas of functional decompositions that portray alternative perspectives of the functional landscape of the underlying PPI network. Each facet in the atlas represents a distinct interpretation of how the network can be functionally decomposed and organized. Our algorithm maximizes the interpretative value of the atlas by optimizing inter-facet orthogonality and intra-facet cluster modularity. We tested our algorithm on the global networks from IntAct, and compared it with gold standard datasets from MIPS and KEGG. We demonstrated the performance of FACETS. We also performed a case study that illustrates the utility of our approach. Supplementary data are available at Bioinformatics online. Our software is available freely for non-commercial purposes from: http://www.cais.ntu.edu.sg/~assourav/Facets/
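FACETS optimizes "intra-facet cluster modularity", but the abstract does not give the exact formula. Assuming the standard Newman-Girvan definition for an undirected, unweighted graph, the modularity of one clustering (one facet) is Q = Σ_c [L_c/m - (d_c/(2m))²], where m is the number of edges, L_c the number of edges inside cluster c, and d_c the summed degree of the nodes in c. The sketch below computes only this quantity; it is not the FACETS algorithm, and the function name is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of a node partition.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every node to its cluster label.
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both endpoints in cluster c
    degree = defaultdict(int)    # d_c: summed degree of nodes in cluster c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
merged = {node: 1 for node in "abcdef"}
q_split = modularity(edges, split)    # 2 * (3/7 - 1/4) = 5/14
q_merged = modularity(edges, merged)  # a single cluster always gives Q = 0
```

Splitting along the bridge scores higher than lumping everything together, which is the behavior a modularity term rewards within each facet.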
Evolution of grid-wide access to database resident information in ATLAS using Frontier

NASA Astrophysics Data System (ADS)

Barberis, D.; Bujor, F.; de Stefano, J.; Dewhurst, A. L.; Dykstra, D.; Front, D.; Gallas, E.; Gamboa, C. F.; Luehring, F.; Walker, R.

2012-12-01

The ATLAS experiment deployed Frontier technology worldwide during the initial year of LHC collision data taking to enable user analysis jobs running on the Worldwide LHC Computing Grid to access database resident data. Since that time, the deployment model has evolved to optimize resources, improve performance, and streamline maintenance of Frontier and related infrastructure. In this presentation we focus on the specific changes in the deployment and improvements undertaken, such as the optimization of cache and launchpad location, the use of RPMs for more uniform deployment of underlying Frontier related components, improvements in monitoring, optimization of fail-over, and an increasing use of a centrally managed database containing site-specific information (for configuration of services and monitoring). In addition, analysis of Frontier logs has given us a deeper understanding of problematic queries and of use cases. Use of the system has grown beyond user analysis and subsystem-specific tasks such as calibration and alignment, extending into production processing areas, such as initial reconstruction and trigger reprocessing. With a more robust and tuned system, we are better equipped to satisfy the still growing number of diverse clients and the demands of increasingly sophisticated processing and analysis.
Design characteristics that affect speed of information access and clarity of presentation in an electronic neuroanatomy atlas.

PubMed

Stewart, P A; Nathan, N; Nyhof-Young, J

2007-01-01

Functional Neuroanatomy, an interactive electronic neuroanatomical atlas, was designed for first year medical students. Medical students have much to learn in a limited time; therefore a major goal in the atlas design was that it facilitate rapid, accurate information retrieval. To assess this feature, we designed a testing scenario in which students who had never taken a neuroanatomy course were asked to complete two equivalent tests, one using the electronic atlas and one using a comparable hard copy atlas, in a limited period of time. The tests were too long to be completed in the time allotted, so test scores were measures of how quickly correct information could be retrieved from each source. Statistical analysis of the data showed that the tests were of equal difficulty and that accurate information retrieval was significantly faster using the electronic atlas than using the hard copy atlas (P < 0.0001). Post-test focus groups (n = 4) allowed us to infer that the following design features contributed to rapid information access: the number of structures in the database was limited to those that are relevant to a practicing physician; all of the program modules were presented in both text and image form on the index screen, which doubled as a site map; pages were layered electronically such that information was hidden until requested; structures available on each page were listed alphabetically and could be accessed by clicking on their name; and an illustrated glossary was provided and equipped with a search engine.
Organic dyes in illuminated manuscripts: a unique cultural and historic record

PubMed Central

Nabais, Paula; Guimarães, Maria; Araújo, Rita; Whitworth, Isabella

2016-01-01

In this study, we successfully addressed the challenges posed by the identification of dyes in medieval illuminations. Brazilwood pigment lakes and orcein purple colours were unequivocally identified in illuminated manuscripts dated by art historians to be from the thirteenth to the fifteenth centuries and in the Fernão Vaz Dourado Atlas (sixteenth century). All three works were on a parchment support. This was possible by combining Raman microscopy and surface-enhanced Raman spectroscopy with microspectrofluorimetry. To the best of our knowledge, this is the first time that brazilein, the main chromophore in brazilwood lake pigments, has been unequivocally identified by surface-enhanced Raman spectroscopy in an illuminated work (the Dourado Atlas). Complementing this identification, through microspectrofluorimetry and micro-Fourier transform infrared spectroscopy, it was possible to propose a complete paint formulation by comparison with our database of references; the dark pink hues, in the three case studies, were produced by combining brazilwood pigment lakes and gypsum in a protein- and gum arabic-based tempera. Orcein purple, also known as orchil dye, has been previously identified in medieval manuscripts, dated from the sixth to the ninth centuries. Our findings in fourteenth–sixteenth century manuscripts confirm the hypothesis that this dye was lost during the High Middle Ages, to be later rediscovered. This article is part of the themed issue ‘Raman spectroscopy in art and archaeology’. PMID:27799433
Organic dyes in illuminated manuscripts: a unique cultural and historic record

NASA Astrophysics Data System (ADS)

Melo, Maria João; Nabais, Paula; Guimarães, Maria; Araújo, Rita; Castro, Rita; Oliveira, Maria Conceição; Whitworth, Isabella

2016-12-01

In this study, we successfully addressed the challenges posed by the identification of dyes in medieval illuminations. Brazilwood pigment lakes and orcein purple colours were unequivocally identified in illuminated manuscripts dated by art historians to be from the thirteenth to the fifteenth centuries and in the Fernão Vaz Dourado Atlas (sixteenth century). All three works were on a parchment support. This was possible by combining Raman microscopy and surface-enhanced Raman spectroscopy with microspectrofluorimetry. To the best of our knowledge, this is the first time that brazilein, the main chromophore in brazilwood lake pigments, has been unequivocally identified by surface-enhanced Raman spectroscopy in an illuminated work (the Dourado Atlas). Complementing this identification, through microspectrofluorimetry and micro-Fourier transform infrared spectroscopy, it was possible to propose a complete paint formulation by comparison with our database of references; the dark pink hues, in the three case studies, were produced by combining brazilwood pigment lakes and gypsum in a protein- and gum arabic-based tempera. Orcein purple, also known as orchil dye, has been previously identified in medieval manuscripts, dated from the sixth to the ninth centuries. Our findings in fourteenth-sixteenth century manuscripts confirm the hypothesis that this dye was lost during the High Middle Ages, to be later rediscovered. This article is part of the themed issue "Raman spectroscopy in art and archaeology".
Discriminative dictionary learning for abdominal multi-organ segmentation.

PubMed

Tong, Tong; Wolz, Robin; Wang, Zehan; Gao, Qinquan; Misawa, Kazunari; Fujiwara, Michitaka; Mori, Kensaku; Hajnal, Joseph V; Rueckert, Daniel

2015-07-01

An automated segmentation method is presented for multi-organ segmentation in abdominal CT images. Dictionary learning and sparse coding techniques are used in the proposed method to generate target-specific priors for segmentation. The method simultaneously learns dictionaries which have reconstructive power and classifiers which have discriminative ability from a set of selected atlases. Based on the learnt dictionaries and classifiers, probabilistic atlases are then generated to provide priors for the segmentation of unseen target images. The final segmentation is obtained by applying a post-processing step based on a graph-cuts method. In addition, this paper proposes a voxel-wise local atlas selection strategy to deal with high inter-subject variation in abdominal CT images. The segmentation performance of the proposed method with different atlas selection strategies is also compared. Our proposed method has been evaluated on a database of 150 abdominal CT images and achieves a promising segmentation performance with Dice overlap values of 94.9%, 93.6%, 71.1%, and 92.5% for liver, kidneys, pancreas, and spleen, respectively. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
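The Dice overlap values quoted above follow the standard definition DSC(A, B) = 2|A∩B| / (|A| + |B|), computed per organ. The sketch below is a minimal illustration, not the authors' evaluation code; it assumes each segmentation is a flat sequence of integer labels over the same voxel grid, and the label numbering (1 = liver, 2 = spleen) is hypothetical.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int], label: int = 1) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) for one organ label.

    pred and truth are flat label maps over the same voxel grid;
    A and B are the voxels carrying `label` in each map.
    """
    if len(pred) != len(truth):
        raise ValueError("label maps must cover the same voxels")
    in_a = [p == label for p in pred]
    in_b = [t == label for t in truth]
    overlap = sum(a and b for a, b in zip(in_a, in_b))
    size = sum(in_a) + sum(in_b)
    if size == 0:
        return 1.0  # convention: both masks empty counts as perfect agreement
    return 2.0 * overlap / size


# Hypothetical 1-D "images": 0 = background, 1 = liver, 2 = spleen.
auto = [0, 1, 1, 1, 2, 2, 0]
manual = [0, 1, 1, 0, 2, 2, 2]
for organ, label in [("liver", 1), ("spleen", 2)]:
    print(organ, round(dice(auto, manual, label), 3))  # liver 0.8, then spleen 0.8
```

The empty-mask convention is a choice made for this sketch; evaluation tools differ on how they treat an organ absent from both segmentations.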
Charting molecular free-energy landscapes with an atlas of collective variables

NASA Astrophysics Data System (ADS)

Hashemian, Behrooz; Millán, Daniel; Arroyo, Marino

2016-11-01

Collective variables (CVs) are a fundamental tool to understand molecular flexibility, to compute free energy landscapes, and to enhance sampling in molecular dynamics simulations. However, identifying suitable CVs is challenging, and is increasingly addressed with systematic data-driven manifold learning techniques. Here, we provide a flexible framework to model molecular systems in terms of a collection of locally valid and partially overlapping CVs: an atlas of CVs. The specific motivation for such a framework is to enhance the applicability and robustness of CVs based on manifold learning methods, which fail in the presence of periodicities in the underlying conformational manifold. More generally, using an atlas of CVs rather than a single chart may help us better describe different regions of conformational space. We develop the statistical mechanics foundation for our multi-chart description and propose an algorithmic implementation. The resulting atlas of data-based CVs is then used to enhance sampling and compute free energy surfaces in two model systems, alanine dipeptide and β-D-glucopyranose, whose conformational manifolds have toroidal and spherical topologies.
Two-stage atlas subset selection in multi-atlas based image segmentation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu

2015-06-15

Purpose: Fast-growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, but also presents challenges in heterogeneous atlas quality and computational burden. This work aims to develop a novel two-stage method tailored to the special needs of a large atlas collection with varied quality, so that high-accuracy segmentation can be achieved with low computational cost. Methods: An atlas subset selection scheme is proposed to substitute a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. Results: The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates end-to-end segmentation performance comparable to the conventional single-stage selection method, but with significant computation reduction. Compared with the alternative computation reduction method, the scheme improves the mean and median Dice similarity coefficient values from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance.
Conclusions: The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy with significantly reduced computation cost, making it a suitable configuration in the presence of extensive heterogeneous atlases.
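The two-stage selection described above can be sketched as a rank-then-rerank procedure. This is a minimal illustration under stated assumptions, not the authors' implementation: `cheap_score` and `full_score` are hypothetical stand-ins for the preliminary relevance metric (after low-cost registration) and the refined metric (after full-fledged registration), higher scores mean more relevant, and `augmented_size` is taken as given rather than derived from the authors' inference model.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")


def two_stage_select(
    atlases: Sequence[A],
    cheap_score: Callable[[A], float],
    full_score: Callable[[A], float],
    augmented_size: int,
    fusion_size: int,
) -> List[A]:
    """Pick a fusion set of atlases in two passes (higher score = more relevant).

    Stage 1 ranks every atlas with the low-cost metric and keeps an augmented
    subset; stage 2 evaluates the expensive metric only on that subset and
    keeps the top `fusion_size`.
    """
    if not 0 < fusion_size <= augmented_size:
        raise ValueError("require 0 < fusion_size <= augmented_size")
    augmented = sorted(atlases, key=cheap_score, reverse=True)[:augmented_size]
    return sorted(augmented, key=full_score, reverse=True)[:fusion_size]


# Hypothetical scores: atlas ids 0..9, the "true" best atlas is 3,
# while the cheap metric is biased toward atlas 4.
calls = []


def expensive(atlas_id: int) -> float:
    calls.append(atlas_id)  # stands in for a full-fledged registration
    return -abs(atlas_id - 3)


chosen = two_stage_select(list(range(10)), lambda a: -abs(a - 4), expensive, 4, 2)
print(chosen, len(calls))  # [3, 4] 4
```

The point of the design is visible in the call count: the expensive metric runs on 4 atlases instead of all 10, and the truly best atlas survives stage 1 because the augmented subset is larger than the final fusion set.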
MSD-MAP: A Network-Based Systems Biology Platform for Predicting Disease-Metabolite Links.

PubMed

Wathieu, Henri; Issa, Naiem T; Mohandoss, Manisha; Byers, Stephen W; Dakshanamurthy, Sivanesan

2017-01-01

Cancer-associated metabolites result from cell-wide mechanisms of dysregulation. The field of metabolomics has sought to identify these aberrant metabolites as disease biomarkers, clues to understanding disease mechanisms, or even as therapeutic agents. This study was undertaken to reliably predict metabolites associated with colorectal, esophageal, and prostate cancers. Metabolite and disease biological action networks were compared in a computational platform called MSD-MAP (Multi Scale Disease-Metabolite Association Platform). Using differential gene expression analysis with patient-based RNAseq data from The Cancer Genome Atlas, genes up- or down-regulated in cancer compared to normal tissue were identified. Relational databases were used to map biological entities, including pathways, functions, and interacting proteins, to those differential disease genes. Similar relational maps were built for metabolites, stemming from known and in silico predicted metabolite-protein associations. The hypergeometric test was used to find statistically significant relationships between disease and metabolite biological signatures at each tier, and metabolites were assessed for multi-scale association with each cancer. Metabolite networks were also directly associated with various other diseases using a disease functional perturbation database. Our platform recapitulated metabolite-disease links that have been empirically verified in the scientific literature, with network-based mapping of jointly-associated biological activity also matching known disease mechanisms. This was true for colorectal, esophageal, and prostate cancers, using metabolite action networks stemming from both predicted and known functional protein associations. By employing systems biology concepts, MSD-MAP reliably predicted known cancer-metabolite links, and may serve as a predictive tool to streamline conventional metabolomic profiling methodologies.
Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
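The hypergeometric test mentioned above asks how surprising an observed overlap between two gene signatures is, given the background size. A minimal sketch of the upper-tail probability P(X ≥ k), assuming a single overlap comparison; the mapping of N, K, n, k onto background genes, disease signature, metabolite signature, and shared genes is an illustrative assumption, and the example numbers are hypothetical, not from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population, K marked, n drawn).

    Illustrative reading: N genes in the background, K genes in the disease
    signature, n genes in the metabolite signature, k genes shared by both.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K, n <= N")
    upper = min(K, n)
    # Exact integer arithmetic; math.comb returns 0 when the lower index
    # exceeds the upper one, so impossible terms drop out automatically.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20,000 background genes, a 200-gene disease signature,
# a 150-gene metabolite signature, 10 shared genes (about 1.5 expected by chance).
p = hypergeom_upper_tail(10, 20_000, 200, 150)
print(f"p = {p:.2e}")
```

Summing exact integer products and dividing once avoids the underflow that multiplying many small floating-point probabilities would cause; for repeated tests across tiers, a multiple-testing correction would typically also be applied, which this sketch does not do.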
Visualization of historical data for the ATLAS detector controls - DDV

NASA Astrophysics Data System (ADS)

Maciejewski, J.; Schlenker, S.

2017-10-01

The ATLAS experiment is one of four detectors located on the Large Hadron Collider (LHC) at CERN. Its detector control system (DCS) stores the slow control data acquired within the back-end of distributed WinCC OA applications in an Oracle relational database, which enables the data to be retrieved for future analysis, debugging and detector development. The ATLAS DCS Data Viewer (DDV) is a client-server application providing access to the historical data outside of the experiment network. The server builds optimized SQL queries, retrieves the data from the database and serves it to the clients via HTTP connections. The server also implements protection methods to prevent malicious use of the database. The client is an AJAX-type web application based on the Vaadin framework (built around the Google Web Toolkit, GWT), which gives users easy access to the data. The DCS metadata can be selected using a column-tree navigation or a search engine supporting regular expressions. The data is visualized by a selection of output modules such as JavaScript value-over-time plots or a lazy-loading table widget. Additional plugins allow users to retrieve the data in ROOT format or as an ASCII file. Control system alarms can also be visualized in a dedicated table if necessary. Python mock-up scripts can be generated by the client, allowing users to query the Python-based DDV server directly, so that they can embed the scripts into more complex analysis programs. Users are also able to store searches and output configurations as XML on the server to share with others via URL or to embed in HTML.
The Eclipsing Binary On-Line Atlas (EBOLA)

NASA Astrophysics Data System (ADS)

Bradstreet, D. H.; Steelman, D. P.; Sanders, S. J.; Hargis, J. R.

2004-05-01

In conjunction with the upcoming release of Binary Maker 3.0, an extensive on-line database of eclipsing binaries is being made available. The purposes of the atlas are to: (1) allow quick and easy access to information on published eclipsing binaries; (2) amass a consistent database of light and radial velocity curve solutions to aid in solving new systems; (3) provide invaluable querying capabilities on all of the parameters of the systems so that informative research can be quickly accomplished on a multitude of published results; (4) aid observers in establishing new observing programs based upon stars needing new light and/or radial velocity curves; (5) encourage workers to submit their published results so that others may have easy access to their work; and (6) provide a vast but easily accessible storehouse of information on eclipsing binaries to accelerate the process of understanding analysis techniques and current work in the field. The database will eventually consist of all published eclipsing binaries with light curve solutions. The following information and data will be supplied whenever available for each binary: original light curves in all bandpasses, original radial velocity observations, light curve parameters, RA and Dec, V-magnitudes, spectral types, color indices, periods, binary type, 3D representation of the system near quadrature, plots of the original light curves and synthetic models, plots of the radial velocity observations with theoretical models, and Binary Maker 3.0 data files (parameter, light curve, radial velocity). The pertinent references for each star are also given, with hyperlinks directly to the papers via the NASA Abstract website for downloading, if available. In addition, the Atlas has extensive search options so that workers can search for binaries with specific characteristics. The website has more than 150 systems already uploaded. The URL for the site is http://ebola.eastern.edu/.
Dcs Data Viewer, an Application that Accesses ATLAS DCS Historical Data

NASA Astrophysics Data System (ADS)

Tsarouchas, C.; Schlenker, S.; Dimitrov, G.; Jahn, G.

2014-06-01

The ATLAS experiment at CERN is one of the four Large Hadron Collider experiments. The Detector Control System (DCS) of ATLAS is responsible for the supervision of the detector equipment, the reading of operational parameters, the propagation of alarms and the archiving of important operational data in a relational database (DB). DCS Data Viewer (DDV) is an application that provides access to the ATLAS DCS historical data through a web interface. Its design is structured using a client-server architecture. The Python-based server connects to the DB and fetches the data by using optimized SQL requests. It communicates with the outside world by accepting HTTP requests, and it can be used stand-alone. The client is an AJAX (Asynchronous JavaScript and XML) interactive web application developed under the Google Web Toolkit (GWT) framework. Its web interface is user friendly, platform and browser independent. The selection of metadata is done via a column-tree view or with a powerful search engine. The final visualization of the data is done using Java applets or JavaScript applications as plugins. The default output is a value-over-time chart, but other types of outputs like tables, ASCII or ROOT files are supported too. Excessive access or malicious use of the database is prevented by a dedicated protection mechanism, allowing the exposure of the tool to hundreds of inexperienced users. The current configuration of the client and of the outputs can be saved in an XML file. Protection against web security attacks is foreseen and authentication constraints have been taken into account, allowing the exposure of the tool to hundreds of users worldwide. Due to its flexible interface and its generic and modular approach, DDV could be easily used for other experiment control systems.
SU-E-J-132: Automated Segmentation with Post-Registration Atlas Selection Based On Mutual Information

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, X; Gao, H; Sharp, G

2015-06-15

Purpose: The delineation of targets and organs-at-risk is a critical step during image-guided radiation therapy, for which manual contouring is the gold standard. However, it is often time-consuming and may suffer from intra- and inter-rater variability. The purpose of this work is to investigate automated segmentation. Methods: The automatic segmentation here is based on mutual information (MI), with the atlas from the Public Domain Database for Computational Anatomy (PDDCA) with manually drawn contours. Using the Dice coefficient (DC) as the quantitative measure of segmentation accuracy, we perform leave-one-out cross-validations for all PDDCA images sequentially, during which other images are registered to each chosen image and DC is computed between the registered contour and the ground truth. Meanwhile, six strategies, including MI, are selected to measure image similarity, with MI performing best. Then, given a target image to be segmented and an atlas, automatic segmentation consists of: (a) the affine registration step for image positioning; (b) the active demons registration method to register the atlas to the target image; (c) the computation of MI values between the deformed atlas and the target image; (d) the weighted image fusion of the three deformed atlas images with the highest MI values to form the segmented contour. Results: MI was found to be the best among the six studied strategies in the sense that it had the highest positive correlation between the similarity measure (e.g., MI values) and DC. For automated segmentation, the weighted image fusion of the three deformed atlas images with the highest MI values provided the highest DC among the four proposed strategies. Conclusion: MI has the highest correlation with DC, and is therefore an appropriate choice for post-registration atlas selection in atlas-based segmentation.
Xuhua Ren and Hao Gao were partially supported by the NSFC (#11405105), the 973 Program (#2015CB856000) and the Shanghai Pujiang Talent Program (#14PJ1404500)
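Steps (c) and (d) above, scoring each deformed atlas by MI against the target and fusing the three best, can be sketched as follows. This is a minimal sketch, not the authors' pipeline: it assumes intensities are already discretized into integer bins, labels are binary, and registration (steps (a) and (b)) has already been applied. The abstract does not state the fusion weights, so weights proportional to MI with a 0.5 voxel-wise threshold are an assumption, and `mi_weighted_fusion` is a hypothetical name.

```python
from collections import Counter
from math import log
from typing import List, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """MI in nats between two equal-length sequences of intensity bins."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    # Sum over observed bin pairs (a, b) of p(a,b) * log(p(a,b) / (p(a) p(b))).
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mi_weighted_fusion(
    target: Sequence[int],
    atlases: Sequence[Tuple[Sequence[int], Sequence[int]]],
    k: int = 3,
    threshold: float = 0.5,
) -> List[int]:
    """Keep the k deformed atlases (image, binary labels) with the highest MI
    to the target and fuse their labels with MI-proportional weights
    (the weighting rule is an assumption, not taken from the abstract)."""
    scored = sorted(
        ((mutual_information(target, img), labels) for img, labels in atlases),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(w for w, _ in scored)
    if total <= 0:
        raise ValueError("selected atlases share no information with the target")
    return [
        1 if sum(w * labels[v] for w, labels in scored) / total >= threshold else 0
        for v in range(len(target))
    ]


# Hypothetical 4-voxel target and four already-deformed atlases.
target = [0, 0, 1, 1]
atlases = [
    ([0, 0, 1, 1], [0, 0, 1, 1]),  # same structure as the target
    ([0, 1, 0, 1], [1, 1, 1, 1]),  # statistically independent: MI = 0
    ([5, 5, 7, 7], [0, 0, 1, 1]),  # different intensities, same structure
    ([0, 0, 0, 0], [1, 1, 0, 0]),  # constant image: MI = 0
]
print(mi_weighted_fusion(target, atlases))  # [0, 0, 1, 1]
```

The third atlas shows why MI is attractive here: it scores as high as an identical image even though its raw intensities differ, because MI only rewards a consistent intensity relationship.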

Atlas-based fuzzy connectedness segmentation and intensity nonuniformity correction applied to brain MRI.

PubMed

Zhou, Yongxin; Bai, Jing

2007-01-01

A framework that combines atlas registration, fuzzy connectedness (FC) segmentation, and parametric bias field correction (PABIC) is proposed for the automatic segmentation of brain magnetic resonance imaging (MRI). First, the atlas is registered onto the MRI to initialize the subsequent FC segmentation. Original techniques are proposed to estimate the initial parameters required by FC segmentation. Next, the result of the FC segmentation is used to initialize the PABIC algorithm. Finally, we re-apply the FC technique on the PABIC-corrected MRI to obtain the final segmentation. Thus, we avoid expert human intervention and provide a fully automatic method for brain MRI segmentation. Experiments on both simulated and real MRI images demonstrate the validity of the method, as well as its limitations. Being a fully automatic method, it is expected to find wide application, such as three-dimensional visualization, radiation therapy planning, and medical database construction.
CAVIAR: CLASSIFICATION VIA AGGREGATED REGRESSION AND ITS APPLICATION IN CLASSIFYING OASIS BRAIN DATABASE

PubMed Central

Chen, Ting; Rangarajan, Anand; Vemuri, Baba C.

2010-01-01

This paper presents a novel classification via aggregated regression algorithm – dubbed CAVIAR – and its application to the OASIS MRI brain image database. The CAVIAR algorithm simultaneously combines a set of weak learners based on the assumption that the weight combination for the final strong hypothesis in CAVIAR depends on both the weak learners and the training data. A regularization scheme using the nearest neighbor method is imposed in the testing stage to avoid overfitting. A closed form solution to the cost function is derived for this algorithm. We use a novel feature – the histogram of the deformation field between the MRI brain scan and the atlas which captures the structural changes in the scan with respect to the atlas brain – and this allows us to automatically discriminate between various classes within OASIS [1] using CAVIAR. We empirically show that CAVIAR significantly increases the performance of the weak classifiers by showcasing the performance of our technique on OASIS. PMID:21151847
CAVIAR: CLASSIFICATION VIA AGGREGATED REGRESSION AND ITS APPLICATION IN CLASSIFYING OASIS BRAIN DATABASE.

PubMed

Chen, Ting; Rangarajan, Anand; Vemuri, Baba C

2010-04-14

This paper presents a novel classification via aggregated regression algorithm - dubbed CAVIAR - and its application to the OASIS MRI brain image database. The CAVIAR algorithm simultaneously combines a set of weak learners based on the assumption that the weight combination for the final strong hypothesis in CAVIAR depends on both the weak learners and the training data. A regularization scheme using the nearest neighbor method is imposed in the testing stage to avoid overfitting. A closed form solution to the cost function is derived for this algorithm. We use a novel feature - the histogram of the deformation field between the MRI brain scan and the atlas which captures the structural changes in the scan with respect to the atlas brain - and this allows us to automatically discriminate between various classes within OASIS [1] using CAVIAR. We empirically show that CAVIAR significantly increases the performance of the weak classifiers by showcasing the performance of our technique on OASIS.
Structural atlas of dynein motors at atomic resolution.

PubMed

Toda, Akiyuki; Tanaka, Hideaki; Kurisu, Genji

2018-04-01

Dynein motors are biologically important bio-nanomachines, and many atomic resolution structures of cytoplasmic dynein components from different organisms have been analyzed by X-ray crystallography, cryo-EM, and NMR spectroscopy. This review provides a historical perspective of structural studies of cytoplasmic and axonemal dynein including accessory proteins. We describe representative structural studies of every component of dynein and summarize them as a structural atlas that classifies the cytoplasmic and axonemal dyneins. Based on our review of all dynein structures in the Protein Data Bank, we raise two important points for understanding the two types of dynein motor and discuss the potential prospects of future structural studies.
Multi-atlas and label fusion approach for patient-specific MRI based skull estimation.

PubMed

Torrado-Carvajal, Angel; Herraiz, Joaquin L; Hernandez-Tamames, Juan A; San Jose-Estepar, Raul; Eryaman, Yigitcan; Rozenholc, Yves; Adalsteinsson, Elfar; Wald, Lawrence L; Malpica, Norberto

2016-04-01

MRI-based skull segmentation is a useful procedure for many imaging applications. This study describes a methodology for automatic segmentation of the complete skull from a single T1-weighted volume. The skull is estimated using a multi-atlas segmentation approach. Using a whole-head computed tomography (CT) scan database, the skull in a new MRI volume is detected by nonrigid image registration of the volume to every CT, and combination of the individual segmentations by label fusion. We have compared Majority Voting, Simultaneous Truth and Performance Level Estimation (STAPLE), Shape Based Averaging (SBA), and the Selective and Iterative Method for Performance Level Estimation (SIMPLE) algorithms. The pipeline has been evaluated quantitatively using images from the Retrospective Image Registration Evaluation database (reaching an overlap of 72.46 ± 6.99%), a clinical CT-MR dataset (maximum overlap of 78.31 ± 6.97%), and a whole-head CT-MRI pair (maximum overlap 78.68%). A qualitative evaluation has also been performed on MRI acquisitions of volunteers. It is possible to automatically segment the complete skull from MRI data using a multi-atlas and label fusion approach. This will allow the creation of complete MRI-based tissue models that can be used in electromagnetic dosimetry applications and attenuation correction in PET/MR. © 2015 Wiley Periodicals, Inc.
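Of the four label-fusion rules compared above, Majority Voting is the simplest: each voxel takes the label most atlases assign to it after registration. A minimal sketch, assuming the registered atlas segmentations are flat integer label maps over the target's voxel grid (1 = skull in the example is hypothetical); the tie-breaking rule is an arbitrary choice made here, not one described in the abstract.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(label_maps: Sequence[Sequence[int]]) -> List[int]:
    """Fuse registered atlas label maps voxel by voxel by majority voting.

    Ties are broken toward the smallest label (an arbitrary choice).
    """
    if not label_maps:
        raise ValueError("need at least one label map")
    n = len(label_maps[0])
    if any(len(m) != n for m in label_maps):
        raise ValueError("label maps must cover the same voxels")
    fused = []
    for votes in zip(*label_maps):  # all atlas labels for one voxel
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(min(lab for lab, c in counts.items() if c == top))
    return fused


# Three hypothetical registered atlas segmentations (1 = skull, 0 = other).
print(majority_vote([[0, 1, 1], [0, 0, 1], [1, 1, 0]]))  # [0, 1, 1]
```

STAPLE, SBA and SIMPLE refine this idea by weighting atlases according to estimated performance or by fusing shapes rather than votes; they are not shown here.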
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

PubMed

Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed

2016-01-01

The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes, including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which makes their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with open reading frames (ORFs) longer than 83-100 amino acids, or with significant homology to the NCBI nr protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
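The first two filters described above (minimum transcript length and maximum ORF length) can be sketched as below. This is a simplified illustration, not the study's pipeline: it assumes ORFs start at ATG and must end at a stop codon, scans only the forward strand, and uses 100 amino acids as the cut-off (the upper end of the 83-100 range quoted). The homology search against NCBI nr and the protein-coding-score test are not reproduced; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Longest ATG...stop ORF on the forward strand, in amino acids (stop excluded).

    ORFs that run off the end of the transcript without a stop are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # codons before the stop
                start = None
    return best


def passes_length_and_orf_filters(seq: str, min_nt: int = 200, max_orf_aa: int = 100) -> bool:
    """Keep transcripts of at least min_nt nucleotides whose longest ORF is <= max_orf_aa."""
    return len(seq) >= min_nt and longest_orf_aa(seq) <= max_orf_aa


noncoding_like = "ATGAAATAA" + "C" * 200                 # 209 nt, 2-aa ORF
coding_like = "ATG" + "GCT" * 150 + "TAA" + "C" * 10     # 466 nt, 151-aa ORF
print(passes_length_and_orf_filters(noncoding_like))  # True
print(passes_length_and_orf_filters(coding_like))     # False
```

Varying `max_orf_aa` between 83 and 100 corresponds to the different stringency conditions that, per the abstract, yield between 31,195 and 54,503 candidates.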
The human secretome atlas initiative: Implications in health and disease conditions

PubMed Central

Brown, Kristy J; Seol, Haeri; Pillai, Dinesh K; Sankoorikal, Binu-John; Formolo, Catherine A; Mac, Jenny; Edwards, Nathan J.; Rose, Mary C; Hathout, Yetrib

2013-01-01

Proteomic analysis of human body fluids is highly challenging; therefore, many researchers are redirecting efforts towards secretome profiling. The goal is to define potential biomarkers and therapeutic targets in the secretome that can be traced back in accessible human body fluids. However, there is currently a lack of secretome profiles of normal human primary cells, making it difficult to assess the biological meaning of current findings. In this study we sought to establish secretome profiles of human primary cells obtained from healthy donors with the goal of building a human secretome atlas. Such an atlas can be used as a reference for discovery of potential disease-associated biomarkers and eventually novel therapeutic targets. As a preliminary study, secretome profiles were established for six different types of human primary cell cultures and checked for overlaps with the three major human body fluids: plasma, cerebrospinal fluid and urine. About 67% of the 1054 identified proteins in the secretome of these primary cells occurred in at least one body fluid. Furthermore, comparison of the secretome profiles of two human glioblastoma cell lines to this new human secretome atlas enabled unambiguous identification of potential brain tumor biomarkers. These biomarkers can be easily monitored in different body fluids using stable isotope labeled standard proteins. The long-term goal of this study is to establish a comprehensive online human secretome atlas for future use as a reference for any disease-related secretome study. PMID:23603790
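The overlap figure above (about 67% of 1054 proteins found in at least one body fluid) is a set computation: the share of secretome proteins that belong to the union of the fluid proteomes. A minimal sketch, assuming proteins are identified by accession strings; the identifiers in the example are hypothetical placeholders, not data from the study.

```python
from typing import Dict, Iterable, Set


def fraction_in_any_fluid(secretome: Iterable[str], fluids: Dict[str, Set[str]]) -> float:
    """Share of secretome proteins found in at least one body-fluid proteome."""
    proteins = set(secretome)
    if not proteins:
        raise ValueError("empty secretome")
    in_any = set().union(*fluids.values())  # union of all fluid proteomes
    return len(proteins & in_any) / len(proteins)


# Hypothetical accessions for illustration only.
secretome = {"P1", "P2", "P3", "P4"}
fluids = {"plasma": {"P1", "P9"}, "cerebrospinal fluid": {"P2"}, "urine": set()}
print(fraction_in_any_fluid(secretome, fluids))  # 0.5
```

Counting against the union rather than each fluid separately matches the "at least one body fluid" criterion; per-fluid overlaps would use `proteins & fluids[name]` instead.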
Clinical evaluation of multi-atlas based segmentation of lymph node regions in head and neck and prostate cancer patients.

PubMed

Sjöberg, Carl; Lundmark, Martin; Granberg, Christoffer; Johansson, Silvia; Ahnesjö, Anders; Montelius, Anders

2013-10-03

Semi-automated segmentation using deformable registration of selected atlas cases consisting of expert-segmented patient images has been proposed to facilitate the delineation of lymph node regions for three-dimensional conformal and intensity-modulated radiotherapy planning of head and neck and prostate tumours. Our aim is to investigate whether fusion of multiple atlases will lead to clinical workload reductions and more accurate segmentation proposals compared to the use of a single atlas segmentation, due to a more complete representation of the anatomical variations. Atlases for lymph node regions were constructed using 11 head and neck patients and 15 prostate patients based on published recommendations for segmentations. Commercial registration software (Velocity AI) was used to create individual segmentations through deformable registration. Ten head and neck patients and ten prostate patients, all different from the atlas patients, were randomly chosen for the study from retrospective data. Each patient was first delineated three times: (a) manually by a radiation oncologist, (b) automatically using a single atlas segmentation proposal from a chosen atlas and (c) automatically by fusing the atlas proposals from all cases in the database using the probabilistic weighting fusion algorithm. In a subsequent step a radiation oncologist corrected the segmentation proposals obtained from steps (b) and (c) without using the result from method (a) as reference. The time spent editing the segmentations was recorded separately for each method and for each individual structure. Finally, the Dice Similarity Coefficient and the volume of the structures were used to evaluate the similarity between the structures delineated with the different methods.
For the single atlas method, the time reduction compared to manual segmentation was 29% and 23% for head and neck and pelvis lymph nodes, respectively, while editing the fused atlas proposal resulted in time reductions of 49% and 34%. The average volume of the fused atlas proposals was only 74% of the manual segmentation for the head and neck cases and 82% for the prostate cases due to a blurring effect from the fusion process. After editing of the proposals the resulting volume differences were no longer statistically significant, although a slight influence of the proposals could still be noticed: the average edited volume remained slightly smaller than the manual segmentation, by 9% and 5%, respectively. Segmentation based on fusion of multiple atlases reduces the time needed for delineation of lymph node regions compared to the use of a single atlas segmentation. Even though the time saving is large, the quality of the segmentation is maintained compared to manual segmentation.
Catalog of infrared observations. Part 2: Appendixes

NASA Technical Reports Server (NTRS)

Gezari, Daniel Y.; Schmitz, Marion; Mead, Jaylee M.

1987-01-01

The Catalog of Infrared Observations (CIO) is a compilation of infrared astronomical observational data obtained from an extensive literature search of astronomical journals and major astronomical catalogs and surveys. The literature searches are complete for years 1965 to 1986. Supporting appendixes are published in this part. The appendices include an atlas of infrared source positions, two bibliographies of infrared literature upon which the search was based, and, keyed to the main Catalog listings (organized alphabetically by first author, and by date), an atlas of infrared spectral ranges, and IRAS data for the CIO sources. The complete CIO database is available to qualified users in printed microfiche and magnetic tape formats.
The SPM Kinematic Catalogue of Planetary Nebulae

NASA Astrophysics Data System (ADS)

López, J. A.; Richer, M. G.; Riesgo, H.; Steffen, W.; García-Segura, G.; Meaburn, J.; Bryce, M.

The San Pedro Mártir Kinematic Catalogue of Planetary Nebulae aims at providing detailed kinematic information for galactic planetary nebulae (PNe) and bright PNe in the Local Group. The database provides long-slit Echelle spectra and images on which the locations of the slits on the nebula are indicated. As a tool to help interpret the 2D line profiles or position-velocity data, an atlas of synthetic emission line spectra accompanies the Catalogue. The atlas has been produced with the code SHAPE and contains synthetic spectra for all the main morphological groups for a wide range of spatial orientations and slit locations over the nebula.
Evolution of Database Replication Technologies for WLCG

NASA Astrophysics Data System (ADS)

Baranowski, Zbigniew; Lobato Pardavila, Lorena; Blaszczyk, Marcin; Dimitrov, Gancho; Canali, Luca

2015-12-01

In this article we summarize several years of experience with database replication technologies used at WLCG and we provide a short review of the available Oracle technologies and their key characteristics. One of the notable changes and improvements in this area in the recent past has been the introduction of Oracle GoldenGate as a replacement for Oracle Streams. We report in this article on the preparation and later upgrades for remote replication done in collaboration with ATLAS and Tier 1 database administrators, including the experience from running Oracle GoldenGate in production. Moreover, we report on another key technology in this area: Oracle Active Data Guard, which has been adopted in several of the mission-critical use cases for database replication between online and offline databases for the LHC experiments.
Evolution of the ATLAS distributed computing system during the LHC long shutdown

NASA Astrophysics Data System (ADS)

Campana, S.; Atlas Collaboration

2014-06-01

The ATLAS Distributed Computing project (ADC) was established in 2007 to develop and operate a framework, following the ATLAS computing model, to enable data storage, processing and bookkeeping on top of the Worldwide LHC Computing Grid (WLCG) distributed infrastructure. ADC development has always been driven by operations and this contributed to its success. The system has fulfilled the demanding requirements of ATLAS, daily consolidating worldwide up to 1 PB of data and running more than 1.5 million payloads distributed globally, supporting almost one thousand concurrent distributed analysis users. Comprehensive automation and monitoring minimized the operational manpower required. The flexibility of the system to adjust to operational needs has been important to the success of the ATLAS physics program. The LHC shutdown in 2013-2015 affords an opportunity to improve the system in light of operational experience and scale it to cope with the demanding requirements of 2015 and beyond, most notably a much higher trigger rate and event pileup. We will describe the evolution of the ADC software foreseen during this period. This includes consolidating the existing Production and Distributed Analysis framework (PanDA) and ATLAS Grid Information System (AGIS), together with the development and commissioning of next generation systems for distributed data management (DDM/Rucio) and production (Prodsys-2). We will explain how new technologies such as Cloud Computing and NoSQL databases, which ATLAS investigated as R&D projects in past years, will be integrated in production. Finally, we will describe more fundamental developments such as breaking job-to-data locality by exploiting storage federations and caches, and event level (rather than file or dataset level) workload engines.
The Coral Triangle Atlas: an integrated online spatial database system for improving coral reef management.

PubMed

Cros, Annick; Ahamad Fatan, Nurulhuda; White, Alan; Teoh, Shwu Jiau; Tan, Stanley; Handayani, Christian; Huang, Charles; Peterson, Nate; Venegas Li, Ruben; Siry, Hendra Yusran; Fitriana, Ria; Gove, Jamison; Acoba, Tomoko; Knight, Maurice; Acosta, Renerio; Andrew, Neil; Beare, Doug

2014-01-01

In this paper we describe the construction of an online GIS database system, hosted by WorldFish, which stores bio-physical, ecological and socio-economic data for the 'Coral Triangle Area' in South-east Asia and the Pacific. The database has been built in partnership with all six Coral Triangle countries (Timor-Leste, Malaysia, Indonesia, the Philippines, Solomon Islands and Papua New Guinea), and it represents a valuable source of information for natural resource managers at the regional scale. Its utility is demonstrated using biophysical data, data summarising marine habitats, and data describing the extent of marine protected areas in the region.
Nucleic Acid Database (NDB)

Science.gov Websites

the NDB archive or in the Non-Redundant list. Advanced Search: search for structures based on structural features, chemical features, binding modes, citation and experimental information. Featured Tools: RNA 3D Motif Atlas, a representative collection of RNA 3D internal and hairpin loop motifs; Non-redundant Lists.
Matching the Diversity of Sulfated Biomolecules: Creation of a Classification Database for Sulfatases Reflecting Their Substrate Specificity

PubMed Central

Barbeyron, Tristan; Brillet-Guéguen, Loraine; Carré, Wilfrid; Carrière, Cathelène; Caron, Christophe; Czjzek, Mirjam; Hoebeke, Mark; Michel, Gurvan

2016-01-01

Sulfatases cleave sulfate groups from various molecules and constitute a biologically and industrially important group of enzymes. However, the number of sulfatases whose substrate has been characterized is limited in comparison to the huge diversity of sulfated compounds. This makes functional annotations of sulfatases particularly prone to flaws and misinterpretations. In the context of the explosion of genomic data, sulfatases urgently need a classification system that allows better prediction of substrate specificity and sets the limits of functional annotation. Here, after an overview of the diversity of sulfated compounds and of the known sulfatases, we propose a classification database, SulfAtlas (http://abims.sb-roscoff.fr/sulfatlas/). It is based on sequence homology and composed of four families of sulfatases. The formylglycine-dependent sulfatases, which constitute the largest family, are further divided by a phylogenetic approach into 73 subfamilies, each corresponding either to a known specificity or to an uncharacterized substrate. SulfAtlas summarizes information about the different families of sulfatases. For each family, a web page displays the list of its subfamilies (when they exist) and the list of EC numbers. The family or subfamily page shows some descriptors and a table with all the UniProt accession numbers linked to the databases UniProt, ExplorEnz, and PDB. PMID:27749924
Digital atlas of the upper Washita River basin, southwestern Oklahoma

USGS Publications Warehouse

Becker, Carol J.; Masoner, Jason R.; Scott, Jonathon C.

2008-01-01

Numerous types of environmental data have been collected in the upper Washita River basin in southwestern Oklahoma. However, to date these data have not been compiled into a format that can be comprehensively queried. Such a format is needed to evaluate the effects of various conservation practices implemented to reduce agricultural runoff and erosion in parts of the upper Washita River basin. This U.S. Geological Survey publication, 'Digital atlas of the upper Washita River basin, southwestern Oklahoma', was created to assist with environmental analysis. This atlas contains 30 spatial data sets that can be used in environmental assessment and decision making for the upper Washita River basin. This digital atlas includes U.S. Geological Survey sampling sites and associated water-quality, biological, water-level, and streamflow data collected from 1903 to 2005. The data were retrieved from the U.S. Geological Survey National Water Information System database on September 29, 2005. Data sets are from the Geology, Geography, and Water disciplines of the U.S. Geological Survey and cover parts of Beckham, Caddo, Canadian, Comanche, Custer, Dewey, Grady, Kiowa, and Washita Counties in southwestern Oklahoma. A bibliography of past reports from the U.S. Geological Survey and other State and Federal agencies from 1949 to 2004 is included in the atlas. Additionally, reports by Becker (2001), Martin (2002), Fairchild and others (2004), and Miller and Stanley (2005) are provided in electronic format.
EnviroAtlas - Ecosystem Service Market and Project Locations, U.S., 2015, Forest Trends' Ecosystem Marketplace

EPA Pesticide Factsheets

This EnviroAtlas dataset contains points depicting the location of market-based programs, referred to herein as markets, and projects addressing ecosystem services protection in the United States. The data were collected via surveys and desk research conducted by Forest Trends' Ecosystem Marketplace from 2008 to 2016 on biodiversity (i.e., imperiled species/habitats; wetlands and streams), carbon, and water markets. Additional biodiversity data were obtained from the Regulatory In-lieu Fee and Bank Information Tracking System (RIBITS) database in 2015. Points represent the centroids (i.e., center points) of market coverage areas, project footprints, or project primary impact areas in which ecosystem service markets or projects operate. National-level markets are an exception to this norm with points representing administrative headquarters locations. Attribute data include information regarding the methodology, design, and development of biodiversity, carbon, and water markets and projects. This dataset was produced by Forest Trends' Ecosystem Marketplace for EnviroAtlas in order to support public access to and use of information related to environmental markets. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) o
Multi-atlas attenuation correction supports full quantification of static and dynamic brain PET data in PET-MR

NASA Astrophysics Data System (ADS)

Mérida, Inés; Reilhac, Anthonin; Redouté, Jérôme; Heckemann, Rolf A.; Costes, Nicolas; Hammers, Alexander

2017-04-01

In simultaneous PET-MR, attenuation maps are not directly available. Essential for absolute radioactivity quantification, they need to be derived from MR or PET data to correct for gamma photon attenuation by the imaged object. We evaluate a multi-atlas attenuation correction method for brain imaging (MaxProb) on static [18F]FDG PET and, for the first time, on dynamic PET, using the serotoninergic tracer [18F]MPPF. A database of 40 MR/CT image pairs (atlases) was used. The MaxProb method synthesises subject-specific pseudo-CTs by registering each atlas to the target subject space. Atlas CT intensities are then fused via label propagation and majority voting. Here, we compared these pseudo-CTs with the real CTs in a leave-one-out design, contrasting the MaxProb approach with a simplified single-atlas method (SingleAtlas). We evaluated the impact of pseudo-CT accuracy on reconstructed PET images, compared to PET data reconstructed with real CT, at the regional and voxel levels for the following: radioactivity images; time-activity curves; and kinetic parameters (non-displaceable binding potential, BPND). On static [18F]FDG, the mean bias for MaxProb ranged between 0 and 1% for 73 out of 84 regions assessed, and exceptionally peaked at 2.5% for only one region. Statistical parametric map analysis of MaxProb-corrected PET data showed significant differences in less than 0.02% of the brain volume, whereas SingleAtlas-corrected data showed significant differences in 20% of the brain volume. On dynamic [18F]MPPF, most regional errors on BPND ranged from -1 to +3% (maximum bias 5%) for the MaxProb method. With SingleAtlas, errors were larger and had higher variability in most regions. PET quantification bias increased over the duration of the dynamic scan for SingleAtlas, but not for MaxProb. We show that this effect is due to the interaction of the spatial tracer-distribution heterogeneity variation over time with the degree of accuracy of the attenuation maps. 
This work demonstrates that inaccuracies in attenuation maps can induce bias in dynamic brain PET studies. Multi-atlas attenuation correction with MaxProb enables quantification on hybrid PET-MR scanners, eschewing the need for CT.
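The abstract says atlas values are fused "via label propagation and majority voting". That fusion step can be illustrated with a minimal voxel-wise sketch. It assumes the 40 atlases have already been registered to the target subject space and flattened into per-voxel lists. Two things here are illustrative assumptions, not necessarily the exact MaxProb rule: the function name `fuse_pseudo_ct`, and the choice to average the CT values of the atlases that agree with the winning label.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple


def fuse_pseudo_ct(
    atlas_labels: Sequence[Sequence[int]],
    atlas_ct: Sequence[Sequence[float]],
) -> Tuple[List[int], List[float]]:
    """Voxel-wise majority vote over atlases already registered to the subject.

    atlas_labels[a][v] is the tissue label propagated from atlas a to voxel v;
    atlas_ct[a][v] is that atlas's CT intensity (HU) at voxel v.
    The pseudo-CT value is the mean CT of the atlases that voted for the
    winning label (an illustrative fusion rule, assumed for this sketch).
    """
    n_atlases = len(atlas_labels)
    n_voxels = len(atlas_labels[0])
    fused_labels: List[int] = []
    pseudo_ct: List[float] = []
    for v in range(n_voxels):
        # Count one vote per atlas for the label it propagates to this voxel.
        votes = Counter(atlas_labels[a][v] for a in range(n_atlases))
        winner, _ = votes.most_common(1)[0]
        # Average only the CT intensities of atlases that agree with the winner.
        agreeing = [atlas_ct[a][v] for a in range(n_atlases)
                    if atlas_labels[a][v] == winner]
        fused_labels.append(winner)
        pseudo_ct.append(mean(agreeing))
    return fused_labels, pseudo_ct


labels = [[0, 1, 2], [0, 1, 1], [1, 1, 2]]
ct = [[-1000.0, 40.0, 900.0], [-990.0, 30.0, 50.0], [20.0, 50.0, 1100.0]]
print(fuse_pseudo_ct(labels, ct))
```

Ties go to the label encountered first in atlas order, because `Counter.most_common` orders equal counts by first appearance. A real implementation would also need a tie-breaking policy that makes sense anatomically.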
Multi-atlas attenuation correction supports full quantification of static and dynamic brain PET data in PET-MR.

PubMed

Mérida, Inés; Reilhac, Anthonin; Redouté, Jérôme; Heckemann, Rolf A; Costes, Nicolas; Hammers, Alexander

2017-04-07

In simultaneous PET-MR, attenuation maps are not directly available. Essential for absolute radioactivity quantification, they need to be derived from MR or PET data to correct for gamma photon attenuation by the imaged object. We evaluate a multi-atlas attenuation correction method for brain imaging (MaxProb) on static [18F]FDG PET and, for the first time, on dynamic PET, using the serotoninergic tracer [18F]MPPF. A database of 40 MR/CT image pairs (atlases) was used. The MaxProb method synthesises subject-specific pseudo-CTs by registering each atlas to the target subject space. Atlas CT intensities are then fused via label propagation and majority voting. Here, we compared these pseudo-CTs with the real CTs in a leave-one-out design, contrasting the MaxProb approach with a simplified single-atlas method (SingleAtlas). We evaluated the impact of pseudo-CT accuracy on reconstructed PET images, compared to PET data reconstructed with real CT, at the regional and voxel levels for the following: radioactivity images; time-activity curves; and kinetic parameters (non-displaceable binding potential, BPND). On static [18F]FDG, the mean bias for MaxProb ranged between 0 and 1% for 73 out of 84 regions assessed, and exceptionally peaked at 2.5% for only one region. Statistical parametric map analysis of MaxProb-corrected PET data showed significant differences in less than 0.02% of the brain volume, whereas SingleAtlas-corrected data showed significant differences in 20% of the brain volume. On dynamic [18F]MPPF, most regional errors on BPND ranged from -1 to +3% (maximum bias 5%) for the MaxProb method. With SingleAtlas, errors were larger and had higher variability in most regions. PET quantification bias increased over the duration of the dynamic scan for SingleAtlas, but not for MaxProb. We show that this effect is due to the interaction of the spatial tracer-distribution heterogeneity variation over time with the degree of accuracy of the attenuation maps. 
This work demonstrates that inaccuracies in attenuation maps can induce bias in dynamic brain PET studies. Multi-atlas attenuation correction with MaxProb enables quantification on hybrid PET-MR scanners, eschewing the need for CT.
An Oracle-based event index for ATLAS

NASA Astrophysics Data System (ADS)

Gallas, E. J.; Dimitrov, G.; Vasileva, P.; Baranowski, Z.; Canali, L.; Dumitru, A.; Formica, A.; ATLAS Collaboration

2017-10-01

The ATLAS EventIndex System has amassed a set of key quantities for a large number of ATLAS events into a Hadoop-based infrastructure to provide the experiment with a number of event-wise services. Collecting this data in one place provides the opportunity to investigate various storage formats and technologies, to assess which best serve the various use cases, and to consider what other benefits alternative storage systems provide. In this presentation we describe how the data are imported into an Oracle RDBMS (relational database management system), the services we have built based on this architecture, and our experience with it. We have indexed about 26 billion real data events thus far and have designed the system to accommodate future data, which has expected rates of 5 and 20 billion events per year. We have found this system offers outstanding performance for some fundamental use cases. In addition, the system profits from the co-location of this data with other complementary metadata in ATLAS. It has therefore been easily extended to perform essential assessments of data integrity and completeness, and to identify event duplication, including the processing step at which the duplication occurred.
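The duplication check mentioned above amounts to a grouping query over the indexed events. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the Oracle RDBMS. The `event_index` table and its columns `step`, `run`, `event` and `guid` are a hypothetical schema, not the actual ATLAS one.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, int, int, str]  # (processing step, run number, event number, file GUID)


def find_duplicates(rows: List[Row]) -> List[Tuple[str, int, int, int]]:
    """Return (step, run, event, copies) for events indexed more than once per step.

    Uses an in-memory SQLite table with a hypothetical schema; the same
    GROUP BY ... HAVING pattern applies to any relational back end.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE event_index (step TEXT, run INTEGER, event INTEGER, guid TEXT)"
    )
    con.executemany("INSERT INTO event_index VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT step, run, event, COUNT(*) AS copies
        FROM event_index
        GROUP BY step, run, event
        HAVING COUNT(*) > 1
        ORDER BY step, run, event
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Made-up example: event 1 of run 100 appears in two AOD files, so the
# duplication is attributed to the AOD step rather than the RAW step.
rows = [
    ("RAW", 100, 1, "g-raw"),
    ("RAW", 100, 2, "g-raw"),
    ("AOD", 100, 1, "g-aod-a"),
    ("AOD", 100, 1, "g-aod-b"),
    ("AOD", 100, 2, "g-aod-a"),
]
print(find_duplicates(rows))
```

Because the step is part of the grouping key, the query reports the step at which a duplicate first appears, not just that one exists.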

Task Management in the New ATLAS Production System

NASA Astrophysics Data System (ADS)

De, K.; Golubkov, D.; Klimentov, A.; Potekhin, M.; Vaniachine, A.; Atlas Collaboration

2014-06-01

This document describes the design of the new Production System of the ATLAS experiment at the LHC [1]. The Production System is the top-level workflow manager that translates physicists' needs for production-level processing and analysis into actual workflows executed across over a hundred Grid sites used globally by ATLAS. The production workload has increased in volume and complexity in recent years: the ATLAS production task count is above one million, with each task containing hundreds or thousands of jobs. The Production System therefore needs an upgrade to meet the challenging requirements of the next LHC run while minimizing operating costs. In the new design, the main subsystems are the Database Engine for Tasks (DEFT) and the Job Execution and Definition Interface (JEDI). Based on users' requests, DEFT manages inter-dependent groups of tasks (Meta-Tasks) and generates corresponding data processing workflows. The JEDI component then dynamically translates the task definitions from DEFT into actual workload jobs executed in the PanDA Workload Management System [2]. We present the requirements, design parameters, basics of the object model and concrete solutions utilized in building the new Production System and its components.
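The division of labour described above can be sketched in miniature. A DEFT-like step orders the inter-dependent tasks of a Meta-Task so that parents come first. A JEDI-like step then splits each task's input into job-sized chunks. Several things in this sketch are hypothetical and not the real DEFT/JEDI object model: the `Task` class, the functions `order_meta_task` and `split_into_jobs`, and the fixed `files_per_job` policy. It requires Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical task record: a name, its input files and its parent tasks."""
    name: str
    input_files: List[str]
    depends_on: List[str] = field(default_factory=list)


def order_meta_task(tasks: Dict[str, Task]) -> List[str]:
    """DEFT-like step (illustrative): order tasks so every parent precedes its children."""
    graph = {name: set(task.depends_on) for name, task in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def split_into_jobs(task: Task, files_per_job: int) -> List[List[str]]:
    """JEDI-like step (illustrative): chunk a task's inputs into job-sized groups."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be positive")
    return [task.input_files[i:i + files_per_job]
            for i in range(0, len(task.input_files), files_per_job)]


tasks = {
    "evgen": Task("evgen", ["e1", "e2", "e3"]),
    "simul": Task("simul", ["s1", "s2", "s3", "s4", "s5"], ["evgen"]),
    "reco": Task("reco", ["r1"], ["simul"]),
}
print(order_meta_task(tasks))
print(split_into_jobs(tasks["simul"], 2))
```

In a real system the inputs of a downstream task are outputs of its parents. JEDI is also described as splitting dynamically rather than with a fixed chunk size. The sketch leaves both out for brevity.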
Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project.

PubMed

Omenn, Gilbert S; Lane, Lydie; Lundberg, Emma K; Overall, Christopher M; Deutsch, Eric W

2017-12-01

The Human Proteome Organization (HUPO) Human Proteome Project (HPP) continues to make progress on its two overall goals: (1) completing the protein parts list, with an annual update of the HUPO draft human proteome, and (2) making proteomics an integrated complement to genomics and transcriptomics throughout biomedical and life sciences research. neXtProt version 2017-01-23 has 17 008 confident protein identifications (Protein Existence [PE] level 1) that are compliant with the HPP Guidelines v2.1 (https://hupo.org/Guidelines), up from 13 664 in 2012-12 and 16 518 in 2016-04. Remaining to be found by mass spectrometry and other methods are 2579 "missing proteins" (PE2+3+4), down from 2949 in 2016. PeptideAtlas 2017-01 has 15 173 canonical proteins, accounting for nearly all of the 15 290 PE1 proteins based on MS data. These resources have extensive data on PTMs, single amino acid variants, and splice isoforms. The Human Protein Atlas v16 has 10 492 highly curated protein entries with tissue and subcellular spatial localization of proteins and transcript expression. Organ-specific popular protein lists have been generated for broad use in quantitative targeted proteomics using SRM-MS or DIA-SWATH-MS studies of biology and disease.
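The changes implied by the figures above can be checked with a few lines of arithmetic. The script reuses only numbers stated in the abstract; the variable names are illustrative.

```python
# Figures quoted in the abstract (neXtProt 2017-01-23 and PeptideAtlas 2017-01).
pe1_counts = {"2012-12": 13_664, "2016-04": 16_518, "2017-01": 17_008}
missing_counts = {"2016": 2_949, "2017": 2_579}
peptideatlas_canonical = 15_173
pe1_based_on_ms = 15_290

# Growth in confident (PE1) identifications.
gain_since_2012 = pe1_counts["2017-01"] - pe1_counts["2012-12"]
gain_since_2016 = pe1_counts["2017-01"] - pe1_counts["2016-04"]

# Reduction in "missing proteins" (PE2+3+4) between the 2016 and 2017 releases.
missing_reduction = missing_counts["2016"] - missing_counts["2017"]

# Share of the MS-based PE1 proteins that PeptideAtlas accounts for.
ms_coverage = peptideatlas_canonical / pe1_based_on_ms

print(f"PE1 gain since 2012-12: {gain_since_2012}")
print(f"PE1 gain since 2016-04: {gain_since_2016}")
print(f"Reduction in missing proteins since 2016: {missing_reduction}")
print(f"PeptideAtlas coverage of MS-based PE1: {ms_coverage:.1%}")
```

This gives 3344 additional PE1 proteins since 2012-12 and 490 since 2016-04. The missing-protein count fell by 370, and PeptideAtlas covers about 99.2% of the MS-based PE1 set.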
Automatic atlas-based three-label cartilage segmentation from MR knee images

PubMed Central

Shan, Liang; Zach, Christopher; Charles, Cecil; Niethammer, Marc

2016-01-01

Osteoarthritis (OA) is the most common form of joint disease and is often characterized by cartilage changes. Accurate quantitative methods are needed to rapidly screen large image databases to assess changes in cartilage morphology. We therefore propose a new automatic atlas-based cartilage segmentation method for future automatic OA studies. Atlas-based segmentation methods have been demonstrated to be robust and accurate in brain imaging and therefore also hold high promise for reliable, high-quality segmentation of cartilage. Nevertheless, atlas-based methods have not been well explored for cartilage segmentation. Particular challenges are the thinness of cartilage, its relatively small volume in comparison to surrounding tissue, and the difficulty of locating cartilage interfaces, for example the interface between femoral and tibial cartilage. This paper focuses on the segmentation of femoral and tibial cartilage. We propose a multi-atlas segmentation strategy with non-local patch-based label fusion that can robustly identify candidate regions of cartilage. This method is combined with a novel three-label segmentation method that guarantees the spatial separation of femoral and tibial cartilage. It also ensures spatial regularity while preserving the thin cartilage shape through anisotropic regularization. Our segmentation energy is convex and therefore guarantees globally optimal solutions. We perform an extensive validation of the proposed method on 706 images of the Pfizer Longitudinal Study. Our validation includes comparisons of different atlas segmentation strategies, different local classifiers, and different types of regularizers. To compare with other cartilage segmentation approaches, we validate on the 50 images of the SKI10 dataset. PMID:25128683
Global differential expression of genes located in the Down Syndrome Critical Region in normal human brain

PubMed Central

Montoya, Julio Cesar; Fajardo, Dianora; Peña, Angela; Sánchez, Adalberto; Domínguez, Martha C; Satizábal, José María

2014-01-01

Background: Gene expression information obtained from databases has made possible the extraction and analysis of data related to several molecular processes. These processes are involved not only in brain homeostasis but also in its disruption in some neuropathologies, principally Down syndrome and Alzheimer disease. Objective: To correlate the levels of transcription of 19 genes located in the Down Syndrome Critical Region (DSCR) with their expression in several substructures of normal human brain. Methods: Expression profiles of 19 DSCR genes in 42 brain substructures were obtained from the gene expression values available in the human brain database of the Brain Atlas of the Allen Institute for Brain Science (http://human.brain-map.org/). The co-expression patterns of DSCR genes in brain were calculated using multivariate statistical methods. Results: Among the central areas of the cerebral cortex, the highest levels of gene expression were recorded in the caudate nucleus, nucleus accumbens and putamen. Increased expression levels were recorded for three genes. RCAN1 encodes a protein involved in signal transduction in the CNS. PCP4 participates in calmodulin binding. TTC3 encodes a protein associated with neuronal differentiation. These brain structures play a crucial role in learning, in different classes of memory and in motor skills. Conclusion: The precise regulation of DSCR gene expression is crucial to maintain brain homeostasis. This is especially true in areas with high levels of gene expression associated with a remarkable process of learning and cognition. PMID:25767303
Elevation-derived watershed basins and characteristics for major rivers of the conterminous United States

USGS Publications Warehouse

Poppenga, S.K.; Worstell, B.B.

2008-01-01

The U.S. Geological Survey Earth Resources Observation and Science Center Topographic Science Project has developed elevation-derived watershed basins and characteristics for major rivers of the conterminous United States. Watershed basins are delineated upstream from the mouth of major rivers by using the hydrologic connectivity of the Elevation Derivatives for National Applications (EDNA) seamless database. Watershed characteristics are quantified by integrating ancillary geospatial datasets, including land cover, population, slope, and topography, with elevation-derived watershed boundaries. The results are published in an online EDNA Watershed Atlas at http://edna.usgs.gov/watersheds. The atlas serves as a framework for evaluating and analyzing the physical, biological, and anthropogenic status of watersheds.
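The abstract describes delineating a basin "upstream from the mouth" using hydrologic connectivity. This can be sketched as a breadth-first search over reversed flow directions. The sketch assumes each elevation cell has already been assigned a single downstream neighbour (D8-style flow routing). The dictionary encoding and the function name `delineate_basin` are hypothetical and do not reflect the EDNA data format.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a grid cell


def delineate_basin(flows_to: Dict[Cell, Optional[Cell]], outlet: Cell) -> Set[Cell]:
    """Return every cell that drains to `outlet`, including the outlet itself.

    flows_to maps each cell to the single downstream cell it drains into,
    or None if it drains off the grid (hypothetical D8-style encoding).
    """
    # Reverse the flow graph: for each cell, list the cells draining into it.
    upstream: Dict[Cell, List[Cell]] = defaultdict(list)
    for cell, target in flows_to.items():
        if target is not None:
            upstream[target].append(cell)

    # Walk upstream from the mouth, collecting every contributing cell once.
    basin = {outlet}
    queue = deque([outlet])
    while queue:
        current = queue.popleft()
        for neighbour in upstream[current]:
            if neighbour not in basin:
                basin.add(neighbour)
                queue.append(neighbour)
    return basin


# Made-up 3x3 grid with two outlets: (2, 1) and (1, 2).
flows = {
    (0, 0): (1, 0), (0, 1): (1, 1), (0, 2): (1, 2),
    (1, 0): (1, 1), (1, 1): (2, 1), (1, 2): None,
    (2, 0): (2, 1), (2, 1): None, (2, 2): (1, 2),
}
print(sorted(delineate_basin(flows, (2, 1))))
```

Once a basin's cells are known, watershed characteristics such as mean slope or land-cover fractions could be computed by aggregating ancillary grids over that cell set.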
A multimodal spatiotemporal cardiac motion atlas from MR and ultrasound data.

PubMed

Puyol-Antón, Esther; Sinclair, Matthew; Gerber, Bernhard; Amzulescu, Mihaela Silvia; Langet, Hélène; Craene, Mathieu De; Aljabar, Paul; Piro, Paolo; King, Andrew P

2017-08-01

Cardiac motion atlases provide a space of reference in which the motions of a cohort of subjects can be directly compared. Motion atlases can be used to learn descriptors that are linked to different pathologies and which can subsequently be used for diagnosis. To date, all such atlases have been formed and applied using data from the same modality. In this work we propose a framework to build a multimodal cardiac motion atlas from 3D magnetic resonance (MR) and 3D ultrasound (US) data. Such an atlas will benefit from the complementary motion features derived from the two modalities, and furthermore, it could be applied in clinics to detect cardiovascular disease using US data alone. The processing pipeline for the formation of the multimodal motion atlas initially involves spatial and temporal normalisation of subjects' cardiac geometry and motion. This step was accomplished following a similar pipeline to that proposed for single modality atlas formation. The main novelty of this paper lies in the use of a multi-view algorithm to simultaneously reduce the dimensionality of both the MR and US derived motion data in order to find a common space between both modalities to model their variability. Three different dimensionality reduction algorithms were investigated: principal component analysis, canonical correlation analysis and partial least squares regression (PLS). A leave-one-out cross validation on a multimodal data set of 50 volunteers was employed to quantify the accuracy of the three algorithms. Results show that PLS resulted in the lowest errors, with a reconstruction error of less than 2.3 mm for MR-derived motion data, and less than 2.5 mm for US-derived motion data. In addition, 1000 subjects from the UK Biobank database were used to build a large scale monomodal data set for a systematic validation of the proposed algorithms. Our results demonstrate the feasibility of using US data alone to analyse cardiac function based on a multimodal motion atlas. 
Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
HeLa Nucleic Acid Contamination in The Cancer Genome Atlas Leads to the Misidentification of Human Papillomavirus 18

PubMed Central

Cantalupo, Paul G.; Katz, Joshua P.

2015-01-01

We searched The Cancer Genome Atlas (TCGA) database for viruses by comparing non-human reads present in transcriptome sequencing (RNA-Seq) and whole-exome sequencing (WXS) data to viral sequence databases. Human papillomavirus 18 (HPV18) is an etiologic agent of cervical cancer, and as expected, we found robust expression of HPV18 genes in cervical cancer samples. In agreement with previous studies, we also found HPV18 transcripts in non-cervical cancer samples, including those from the colon, rectum, and normal kidney. However, in each of these cases, HPV18 gene expression was low, and single-nucleotide variants and positions of genomic alignments matched the integrated portion of HPV18 present in HeLa cells. Chimeric reads that match a known virus-cell junction of HPV18 integrated in HeLa cells were also present in some samples. We hypothesize that HPV18 sequences in these non-cervical samples are due to nucleic acid contamination from HeLa cells. This finding highlights the problems that contamination presents in computational virus detection pipelines. Importance: Viruses associated with cancer can be detected by searching tumor sequence databases. Several studies involving searches of the TCGA database have reported the presence of HPV18, a known cause of cervical cancer, in a small number of additional cancers, including those of the rectum, kidney, and colon. We have determined that the sequences related to HPV18 in non-cervical samples are due to nucleic acid contamination from HeLa cells. To our knowledge, this is the first report of the misidentification of viruses in next-generation sequencing data of tumors due to contamination with a cancer cell line. These results raise awareness of the difficulty of accurately identifying viruses in human sequence databases. PMID:25631090
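The core step of the study is comparing non-human reads against viral sequence databases. A toy version of such a detection pipeline can be sketched with shared k-mers. This is a swapped-in technique: k-mer matching stands in for whatever aligner the authors used. The function names, the toy reference sequences and the `min_fraction` threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the viral references that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int,
                  min_fraction: float = 0.5) -> Optional[str]:
    """Return the reference sharing the most k-mers with the read, or None.

    A call is made only if at least `min_fraction` of the read's k-mers hit it.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None  # read shorter than k
    hits: Counter = Counter()
    for km in read_kmers:
        for name in index.get(km, ()):
            hits[name] += 1
    if not hits:
        return None
    best, count = hits.most_common(1)[0]
    return best if count / len(read_kmers) >= min_fraction else None


K = 4
index = build_index({"virusA": "ACGTACGTTGCA", "virusB": "TTTTGGGGCCCC"}, K)
print(classify_read("ACGTTGCA", index, K))
```

As the abstract stresses, a hit alone does not prove infection. Distinguishing true infection from HeLa contamination needed extra evidence such as matching single-nucleotide variants and virus-cell junction reads.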
Ménière's Disease: A CHEER Database Study of Local and Regional Patient Encounter and Procedure Patterns.

PubMed

Crowson, Matthew G; Schulz, Kristine; Parham, Kourosh; Vambutas, Andrea; Witsell, David; Lee, Walter T; Shin, Jennifer J; Pynnonen, Melissa A; Nguyen-Huynh, Anh; Ryan, Sheila E; Langman, Alan

2016-07-01

(1) To integrate practice-based patient encounters using the Dartmouth Atlas Medicare database to understand practice treatments for Ménière's disease (MD). (2) To describe differences in the practice patterns between academic and community providers for MD. Practice-based research database review. CHEER (Creating Healthcare Excellence through Education and Research) network academic and community providers. MD patient data were identified with ICD-9 and CPT codes. Demographics, unique visits, and procedures per patient were tabulated. The Dartmouth Atlas of Health Care was used to reference regional health care utilization. Statistical analysis included 1-way analyses of variance, bivariate linear regression, and Student's t tests, with significance set at P < .05. A total of 2071 unique patients with MD were identified from 8 academic and 10 community otolaryngology-head and neck surgery provider centers nationally. Average age was 56.5 years; 63.9% were female; and 91.4% self-reported white ethnicity. There was an average of 3.2 visits per patient. Western providers had the highest average visits per patient. Midwest providers had the highest average procedures per patient. Community providers had more visits per site and per patient than did academic providers. Academic providers had significantly more operative procedures per site (P = .0002) when compared with community providers. Health care service areas with higher total Medicare reimbursements per enrollee did not report significantly more operative procedures. This is the first practice-based clinical research database study to describe MD practice patterns. We demonstrate that academic otolaryngology-head and neck surgery providers perform significantly more operative procedures than do community providers for MD, and we validate these data with an independent Medicare spending database. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2015.
The ATLAS EventIndex: architecture, design choices, deployment and first operation experience

NASA Astrophysics Data System (ADS)

Barberis, D.; Cárdenas Zárate, S. E.; Cranshaw, J.; Favareto, A.; Fernández Casaní, Á.; Gallas, E. J.; Glasman, C.; González de la Hoz, S.; Hřivnáč, J.; Malon, D.; Prokoshin, F.; Salt Cairols, J.; Sánchez, J.; Többicke, R.; Yuan, R.

2015-12-01

The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular. Its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could therefore be developed separately and in parallel during LS1. The EventIndex was put into operation for the start of LHC Run 2. This paper describes the high-level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as of the query services, is also reported.
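The per-event content listed above (identifiers, trigger pattern and file GUIDs) maps naturally onto a keyed catalogue that supports event picking. The sketch below is an in-memory illustration only. Several parts are assumptions, not the Hadoop-based EventIndex schema: the class names, the `(run, event)` key, the bitmask encoding of the trigger pattern, and the per-stage GUID lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

EventKey = Tuple[int, int]  # (run number, event number)


@dataclass
class EventRecord:
    """Hypothetical per-event record."""
    trigger_bits: int  # trigger pattern encoded as a bitmask (assumption)
    guids_by_stage: Dict[str, List[str]] = field(default_factory=dict)


class EventCatalogue:
    """Toy in-memory catalogue keyed by (run, event)."""

    def __init__(self) -> None:
        self._events: Dict[EventKey, EventRecord] = {}

    def add(self, run: int, event: int, trigger_bits: int, stage: str, guid: str) -> None:
        # The trigger pattern is taken from the first insertion of an event.
        record = self._events.setdefault((run, event), EventRecord(trigger_bits))
        record.guids_by_stage.setdefault(stage, []).append(guid)

    def pick(self, run: int, event: int, stage: str) -> List[str]:
        """Event picking: GUIDs of files holding this event at a given stage."""
        record = self._events.get((run, event))
        if record is None:
            return []
        return list(record.guids_by_stage.get(stage, []))

    def passed_trigger(self, bit: int) -> List[EventKey]:
        """Keys of events whose trigger pattern has the given bit set."""
        return sorted(key for key, rec in self._events.items()
                      if (rec.trigger_bits >> bit) & 1)


cat = EventCatalogue()
cat.add(100, 1, 0b101, "RAW", "guid-raw-1")
cat.add(100, 1, 0b101, "AOD", "guid-aod-7")
cat.add(100, 2, 0b010, "RAW", "guid-raw-1")
print(cat.pick(100, 1, "AOD"), cat.passed_trigger(1))
```

Keeping GUIDs per processing stage is what allows both event picking and the completeness checks mentioned above, since each stage can be compared against its parent.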
Tissue Proteome Analysis of Different Grades of Human Gliomas Provides Major Cues for Glioma Pathogenesis.

PubMed

Gollapalli, Kishore; Ghantasala, Saicharan; Atak, Apurva; Rapole, Srikanth; Moiyadi, Aliasgar; Epari, Sridhar; Srivastava, Sanjeeva

2017-05-01

Gliomas are heterogeneous and are the most commonly occurring brain tumors. The blood-brain barrier restricts the entry of brain tumor proteins into the bloodstream, thus limiting the use of serum or plasma for proteomic analysis. Our study aimed at understanding the molecular basis of the aggressiveness of various grades of brain tumors. We used isobaric tagging for relative and absolute quantification (iTRAQ)-based mass spectrometry. Tissue proteomic analysis of various grades of gliomas was performed using four-plex iTRAQ. Using iTRAQ reagents, we labeled five sets of samples from individual glioma patients, each set consisting of control, grade II, grade III, and grade IV tumor samples. Significantly altered proteins were subjected to bioinformatics analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Various pathways were predominantly observed to be perturbed in gliomas: glycolysis, the TCA cycle, the electron transport chain, lactate metabolism, and blood coagulation. Most of the identified proteins involved in redox reactions, protein folding, pre-messenger RNA (mRNA) processing, antiapoptosis, and blood coagulation were found to be upregulated in gliomas. Transcriptomics data of glioblastoma multiforme (GBM), low-grade gliomas (LGGs), and controls were downloaded from The Cancer Genome Atlas (TCGA) data portal and further analyzed using BRB-Array tools. The expression levels of several significantly altered proteins showed a positive correlation with increasing glioma grade. These proteins were lactate dehydrogenase, alpha-1 antitrypsin, fibrinogen alpha chain, nucleophosmin, annexin A5, thioredoxin, ferritin light chain, thymosin beta-4-like protein 3, superoxide dismutase-2, and peroxiredoxin-1 and 6. This correlation offers insight into the molecular basis of their aggressive nature. 
Several proteins identified in different grades of gliomas are potential grade-specific markers, and the perturbed pathways provide a comprehensive overview of the molecular cues involved in glioma pathogenesis.
Halobacterium salinarum NRC-1 PeptideAtlas: strategies for targeted proteomics

PubMed Central

Van, Phu T.; Schmid, Amy K.; King, Nichole L.; Kaur, Amardeep; Pan, Min; Whitehead, Kenia; Koide, Tie; Facciotti, Marc T.; Goo, Young-Ah; Deutsch, Eric W.; Reiss, David J.; Mallick, Parag; Baliga, Nitin S.

2009-01-01

The relatively small number of proteins and fewer possible posttranslational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a Peptide Atlas (PA) covering 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636,000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. We analyzed the PA with respect to the biophysical properties of constituent peptides, the functional properties of parent proteins of detected peptides, and the performance of different mass spectrometry approaches. This analysis has helped highlight plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, the discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low protein abundance as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics. PMID:18652504
Data integration to prioritize drugs using genomics and curated data.

PubMed

Louhimo, Riku; Laakso, Marko; Belitskin, Denis; Klefström, Juha; Lehtonen, Rainer; Hautaniemi, Sampsa

2016-01-01

Genomic alterations affecting drug target proteins occur in several tumor types and are prime candidates for patient-specific tailored treatments. Increasingly, patients likely to benefit from targeted cancer therapy are selected based on molecular alterations. The selection of a precision therapy benefiting most patients is challenging but can be enhanced by integrating multiple types of molecular data. Data integration approaches for drug prioritization have successfully integrated diverse molecular data but do not take full advantage of existing data and literature. We have built a knowledge base that connects data from public databases with molecular results from over 2200 tumors, signaling pathways, and drug-target databases. Moreover, we have developed a data mining algorithm to effectively utilize this heterogeneous knowledge base. Our algorithm is designed to facilitate retargeting of existing drugs by stratifying samples and prioritizing drug targets. Using our framework, we analyzed 797 primary tumors from The Cancer Genome Atlas breast and ovarian cancer cohorts. FGFR, CDK and HER2 inhibitors were prioritized in the breast and ovarian data sets. Estrogen receptor-positive breast tumors showed potential sensitivity to targeted FGFR inhibitors due to activation of FGFR3. Our results suggest that computational sample stratification selects potentially sensitive samples for targeted therapies and can aid drug repositioning in precision medicine. Source code is available from http://csblcanges.fimm.fi/GOPredict/.
47 CFR 27.1231 - Initiating the transition.

Code of Federal Regulations, 2010 CFR

2010-10-01

... Basic Trading Area (BTA). BTAs are based on the Rand McNally 1992 Commercial Atlas & Marketing Guide...; and (C) Specify, if known, the adjacent channel D/U ratio that can be tolerated by any receiver(s) at... database; (F) The bandwidth of each channel or subchannel, the emission type for each channel or subchannel...
Sprawl in European urban areas

NASA Astrophysics Data System (ADS)

Prastacos, Poulicos; Lagarias, Apostolos

2016-08-01

In this paper, the 2006 edition of the Urban Atlas database is used to tabulate areas of low development density, usually referred to as "sprawl", for many European cities. The Urban Atlas database contains information on the land use distribution in the 305 largest European cities. Twenty different land use types are recognized, six of which represent urban fabric. Urban fabric classes are residential areas differentiated by development density, which is measured by the sealing degree parameter; it ranges from 0% (non-developed) to 100% (fully developed). The analysis focuses on the distribution of middle- to low-density areas, defined as those with a sealing degree below 50%. Seven country groups in which urban areas have similar sprawl characteristics are identified, and some key characteristics of sprawl are discussed. The population of each urban area is another parameter considered in the analysis. Two spatial metrics, average patch size and mean distance to the nearest neighboring patch of the same class, are used to describe the proximity/separation characteristics of sprawl in the urban areas of the seven groups.
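The low-density filter and the two spatial metrics described above can be sketched as follows. This is an illustration under stated assumptions: the `Patch` record and its fields are hypothetical, and nearest-neighbor distance is measured between patch centroids, whereas landscape-metric tools commonly use edge-to-edge distance; the paper's exact definition may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical urban-fabric polygon summary."""
    land_use: str                  # class label
    sealing: float                 # sealing degree in percent, 0-100
    area_ha: float                 # patch area in hectares
    centroid: tuple[float, float]  # projected coordinates in metres


def low_density(patches, threshold=50.0):
    """Keep middle- to low-density patches (sealing degree below threshold)."""
    return [p for p in patches if p.sealing < threshold]


def average_patch_size(patches):
    """Mean area of the given patches."""
    return sum(p.area_ha for p in patches) / len(patches)


def mean_nearest_neighbor_distance(patches):
    """Mean centroid distance from each patch to its nearest same-class patch."""
    distances = []
    for p in patches:
        same = [q for q in patches if q is not p and q.land_use == p.land_use]
        if same:  # patches with no same-class neighbor are skipped
            distances.append(min(math.dist(p.centroid, q.centroid) for q in same))
    return sum(distances) / len(distances) if distances else float("nan")
```

Large average patch sizes with short nearest-neighbor distances would indicate compact low-density development, while small, widely separated patches indicate more scattered sprawl.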
The Coral Triangle Atlas: An Integrated Online Spatial Database System for Improving Coral Reef Management

PubMed Central

Cros, Annick; Ahamad Fatan, Nurulhuda; White, Alan; Teoh, Shwu Jiau; Tan, Stanley; Handayani, Christian; Huang, Charles; Peterson, Nate; Venegas Li, Ruben; Siry, Hendra Yusran; Fitriana, Ria; Gove, Jamison; Acoba, Tomoko; Knight, Maurice; Acosta, Renerio; Andrew, Neil; Beare, Doug

2014-01-01

In this paper, we describe the construction of an online GIS database system, hosted by WorldFish, which stores bio-physical, ecological and socio-economic data for the ‘Coral Triangle Area’ in South-east Asia and the Pacific. The database has been built in partnership with all six Coral Triangle countries (Timor-Leste, Malaysia, Indonesia, The Philippines, Solomon Islands and Papua New Guinea), and represents a valuable source of information for natural resource managers at the regional scale. Its utility is demonstrated using biophysical data, data summarising marine habitats, and data describing the extent of marine protected areas in the region. PMID:24941442
The NeuARt II system: a viewing tool for neuroanatomical data based on published neuroanatomical atlases

PubMed Central

Burns, Gully APC; Cheng, Wei-Cheng; Thompson, Richard H; Swanson, Larry W

2006-01-01

Background Anatomical studies of neural circuitry describing the basic wiring diagram of the brain produce intrinsically spatial, highly complex data of great value to the neuroscience community. Published neuroanatomical atlases provide a spatial framework for these studies. We have built an informatics framework based on these atlases for the representation of neuroanatomical knowledge. This framework not only captures current methods of anatomical data acquisition and analysis, but also allows these studies to be collated, compared and synthesized within a single system. Results We have developed an atlas-viewing application ('NeuARt II') in the Java language with unique functional properties. These include the ability to use copyrighted atlases as templates within which users may view, save and retrieve data-maps and annotate them with volumetric delineations. NeuARt II also permits users to view multiple levels on multiple atlases at once. Each data-map in this system is simply a stack of vector images with one image per atlas level, so any set of accurate drawings made onto a supported atlas (in vector graphics format) could be uploaded into NeuARt II. Presently the database is populated with a corpus of high-quality neuroanatomical data from the laboratory of Dr Larry Swanson (consisting of 64 highly detailed maps of PHAL tract-tracing experiments, made up of 1039 separate drawings that were published in 27 primary research publications over 17 years). Herein we take selective examples from these data to demonstrate the features of NeuARt II. Our informatics tool permits users to browse, query and compare these maps. The NeuARt II tool operates within a bioinformatics knowledge management platform (called 'NeuroScholar') either as a standalone or as a plug-in application. Conclusion Anatomical localization is fundamental to neuroscientific work, and atlases provide an easily understood framework that is widely used by neuroanatomists and non-neuroanatomists alike.
NeuARt II, the neuroinformatics tool presented here, provides an accurate and powerful way of representing neuroanatomical data in the context of commonly-used brain atlases for visualization, comparison and analysis. Furthermore, it provides a framework that supports the delivery and manipulation of mapped data either as a standalone system or as a component in a larger knowledge management system. PMID:17166289
An Anatomically Resolved Mouse Brain Proteome Reveals Parkinson Disease-relevant Pathways

PubMed Central

Choi, Jong Min; Rousseaux, Maxime W. C.; Malovannaya, Anna; Kim, Jean J.; Kutzera, Joachim; Wang, Yi; Huang, Yin; Zhu, Weimin; Maity, Suman; Zoghbi, Huda Yahya; Qin, Jun

2017-01-01

Here, we present a mouse brain protein atlas that covers 17 surgically distinct neuroanatomical regions of the adult mouse brain, each less than 1 mm³ in size. The protein expression levels are determined for 6,500 to 7,500 gene protein products from each region and over 12,000 gene protein products for the entire brain, documenting the physiological repertoire of mouse brain proteins in an anatomically resolved and comprehensive manner. We explored the utility of our spatially defined protein profiling methods in a mouse model of Parkinson's disease. We compared the proteome from a vulnerable region (substantia nigra pars compacta) of wild type and parkinsonian mice with that of an adjacent, less vulnerable, region (ventral tegmental area) and identified several proteins that exhibited both spatiotemporal- and genotype-restricted changes. We validated the most robustly altered proteins using an alternative profiling method and found that these modifications may highlight potential new pathways for future studies. This proteomic atlas is a valuable resource that offers a practical framework for investigating the molecular intricacies of normal brain function as well as regional vulnerability in neurological diseases. All of the mouse regional proteome profiling data are published online at http://mbpa.bprc.ac.cn/. PMID:28153913
The Ocean Gene Atlas: exploring the biogeography of plankton genes online.

PubMed

Villar, Emilie; Vannier, Thomas; Vernette, Caroline; Lescot, Magali; Cuenca, Miguelangel; Alexandre, Aurélien; Bachelerie, Paul; Rosnet, Thomas; Pelletier, Eric; Sunagawa, Shinichi; Hingamp, Pascal

2018-05-21

The Ocean Gene Atlas is a web service to explore the biogeography of genes from marine planktonic organisms. It allows users to query protein or nucleotide sequences against global ocean reference gene catalogs. With just one click, the abundance and location of target sequences are visualized on world maps as well as their taxonomic distribution. Interactive results panels allow for adjusting cutoffs for alignment quality and displaying the abundances of genes in the context of environmental features (temperature, nutrients, etc.) measured at the time of sampling. The ease of use enables non-bioinformaticians to explore quantitative and contextualized information on genes of interest in the global ocean ecosystem. Currently the Ocean Gene Atlas is deployed with (i) the Ocean Microbial Reference Gene Catalog (OM-RGC) comprising 40 million non-redundant mostly prokaryotic gene sequences associated with both Tara Oceans and Global Ocean Sampling (GOS) gene abundances and (ii) the Marine Atlas of Tara Ocean Unigenes (MATOU) composed of >116 million eukaryote unigenes. Additional datasets will be added upon availability of further marine environmental datasets that provide the required complement of sequence assemblies, raw reads and contextual environmental parameters. Ocean Gene Atlas is a freely-available web service at: http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/.
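The adjustable alignment-quality cutoffs described above amount to filtering sequence hits before aggregating gene abundances per sampling site. A minimal sketch, assuming each hit carries a station label, an e-value, a percent identity and an abundance value; the `Hit` record, the function name and the default cutoffs are hypothetical and are not the service's actual parameters.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment of the query to a catalog gene observed at a station."""
    station: str
    e_value: float
    identity: float   # percent identity of the alignment
    abundance: float  # abundance of the matched gene at that station


def abundance_by_station(hits, max_e_value=1e-10, min_identity=80.0):
    """Sum abundances of hits that pass both alignment-quality cutoffs."""
    totals = defaultdict(float)
    for hit in hits:
        if hit.e_value <= max_e_value and hit.identity >= min_identity:
            totals[hit.station] += hit.abundance
    return dict(totals)
```

Loosening either cutoff admits more distant homologs, so a station's total can only stay the same or grow; this is why interactive cutoff adjustment changes the map that is displayed.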
A new experiment-independent mechanism to persistify and serve the detector geometry of ATLAS

NASA Astrophysics Data System (ADS)

Bianchi, Riccardo Maria; Boudreau, Joseph; Vukotic, Ilija

2017-10-01

The complex geometry of the whole ATLAS detector at the LHC is currently stored only in custom online databases, from which it is built on the fly on request. Accessing the online geometry guarantees access to the latest version of the detector description, but requires setting up the full ATLAS software framework, “Athena”, which provides the online services and the tools to retrieve the data from the database. This operation is cumbersome and slows down applications that need to access the geometry. Moreover, all applications that need to access the detector geometry must be built and run on the same platform as the ATLAS framework, which prevents the use of the actual detector geometry in stand-alone applications. Here we propose a new mechanism to persistify and serve the geometry of HEP experiments. (In software development in general, and in HEP computing in particular, persistifying means taking an object that lives only in memory, for example because it was built on the fly while processing experimental data, serializing it, and storing it on disk as a persistent object.) The new mechanism consists of a new file format and the modules that make use of it. The new file format allows the whole detector description to be stored locally in a file, and it is especially optimized to describe large, complex detectors with minimal file size by making use of shared instances and storing compressed representations of geometry transformations. The detector description can then be read back in to fully restore the in-memory geometry tree. Moreover, a dedicated REST API is being designed and developed to serve the geometry in standard exchange formats such as JSON, letting users and applications download specific partial geometry information. With this new geometry persistification, a new generation of applications could be developed that use the actual detector geometry while being platform-independent and experiment-independent.
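Two of the ideas named above, writing shared instances only once and reading the description back to restore the in-memory tree, can be illustrated with a toy serializer. This is a minimal sketch under stated assumptions: the `Shape`/`Node` classes are hypothetical, only translations are stored (no rotations and no compressed transforms), and JSON is used for the whole file purely for illustration. It is not the file format the authors propose.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Shape:
    """A hypothetical solid: a kind label plus its dimensions."""
    kind: str
    params: tuple[float, ...]


@dataclass
class Node:
    """A placed volume: a shape, a translation and child volumes."""
    name: str
    shape: Shape
    translation: tuple[float, float, float] = (0.0, 0.0, 0.0)
    children: list["Node"] = field(default_factory=list)


def dump(root: Node) -> str:
    """Serialize the tree, writing each shared Shape instance only once."""
    shapes: list[Shape] = []
    index: dict[int, int] = {}  # id(shape) -> row in the shape table

    def shape_ref(shape: Shape) -> int:
        if id(shape) not in index:
            index[id(shape)] = len(shapes)
            shapes.append(shape)
        return index[id(shape)]

    def encode(node: Node) -> dict:
        return {
            "name": node.name,
            "shape": shape_ref(node.shape),
            "t": list(node.translation),
            "children": [encode(c) for c in node.children],
        }

    tree = encode(root)  # fills the shape table as a side effect
    table = [{"kind": s.kind, "params": list(s.params)} for s in shapes]
    return json.dumps({"shapes": table, "tree": tree})


def load(text: str) -> Node:
    """Rebuild the tree; nodes that shared a Shape share it again."""
    data = json.loads(text)
    shapes = [Shape(s["kind"], tuple(s["params"])) for s in data["shapes"]]

    def decode(d: dict) -> Node:
        return Node(d["name"], shapes[d["shape"]], tuple(d["t"]),
                    [decode(c) for c in d["children"]])

    return decode(data["tree"])
```

For a detector with thousands of identical modules, the shape table grows with the number of distinct solids rather than the number of placements, which is the size saving that shared instances provide.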
Abasy Atlas: a comprehensive inventory of systems, global network properties and systems-level elements across bacteria

PubMed Central

Ibarra-Arellano, Miguel A.; Campos-González, Adrián I.; Treviño-Quintanilla, Luis G.; Tauch, Andreas; Freyre-González, Julio A.

2016-01-01

The availability of databases electronically encoding curated regulatory networks, and of high-throughput technologies and methods to discover regulatory interactions, provides an invaluable source of data for understanding the principles underpinning the organization and evolution of the networks responsible for cellular regulation. Nevertheless, the data in these sources never go beyond the regulon level, even though regulatory networks are complex hierarchical-modular structures that still challenge our understanding. This creates the need for an inventory of systems across a large range of organisms, a key step toward making comparative systems biology approaches feasible. In this work, we take the first step towards a global understanding of regulatory network organization by mapping the functional architectures of diverse bacteria. Abasy (Across-bacteria systems) Atlas provides a comprehensive inventory of annotated functional systems, global network properties and systems-level elements (global regulators, modular genes shaping functional systems, basal machinery genes and intermodular genes) predicted by the natural decomposition approach for reconstructed and meta-curated regulatory networks across a large range of bacteria, including pathogenically and biotechnologically relevant organisms. The meta-curation of regulatory datasets provides the most complete and reliable set of regulatory interactions currently available, which can even be projected into subsets by considering the strength or weight of evidence supporting them or the systems they belong to. In addition, Abasy Atlas provides data enabling large-scale comparative systems biology studies aimed at understanding the common principles and particular lifestyle adaptations of systems across bacteria.
Abasy Atlas contains systems and system-level elements for 50 regulatory networks comprising 78 649 regulatory interactions covering 42 bacteria in nine taxa, containing 3708 regulons and 1776 systems. Together, this forms a large corpus of data that will surely inspire studies to generate hypotheses regarding the principles governing the evolution and organization of systems and the functional architectures controlling them. Database URL: http://abasy.ccg.unam.mx PMID:27242034

Compilation of the "Atlas of Gamma-rays from the Inelastic Scattering of Reactor Fast Neutrons" (1978DE41) by A.M. Demidov, L.I. Govor, Yu. K. Cherepantsev, M.R. Ahmed, S. Al-Najjar, M.A. Al-Amili, N. Al-Assafi, and N. Rammo

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hurst, Aaron M.; Bernstein, Lee A.; Chong, Su-Ann

A Structured Query Language (SQL) relational database has been developed based on the original (n,n'γ) work carried out by A.M. Demidov et al. at the Nuclear Research Institute in Baghdad, Iraq ["Atlas of Gamma-Ray Spectra from the Inelastic Scattering of Reactor Fast Neutrons", Nuclear Research Institute, Baghdad, Iraq (Moscow, Atomizdat 1978)], covering 105 independent measurements: 76 elemental samples of natural composition and 29 isotopically enriched samples. The information from this Atlas includes gamma-ray energies and intensities; nuclide and level data for the level from which each gamma ray originates; and target (sample) experimental-measurement data. Taken together, this information allows the flux-weighted (n,n'γ) cross section for a given transition to be extracted relative to a defined reference value. Currently, the reference is the fast-neutron flux-weighted partial gamma-ray cross section from ENDF/B-VII.1 for production of the 847-keV transition from the first excited 2⁺ state to the 0⁺ ground state in ⁵⁶Fe, 468 mb. This value also takes into account contributions to the 847-keV transition following β⁻ decay of ⁵⁶Mn formed in the ⁵⁶Fe(n,p) reaction. However, the reference value can easily be adjusted to suit the user's preference. The (n,n'γ) data have been compiled into a series of ASCII comma-separated-value tables, and a suite of Python scripts and C modules is provided to build the database. Once built, the database can be queried directly via the SQLite engine or accessed through the Jupyter Notebook Python-browser interface. Several examples exploiting these utilities are also provided with the complete software package.
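Because each cross section is expressed relative to the 468 mb ⁵⁶Fe reference, changing the reference rescales every value linearly. The sketch below shows this with Python's built-in `sqlite3` module. The table name, column names and all rows except the reference transition are hypothetical placeholders, not the package's actual schema or Atlas data, and storing a plain ratio to the reference is a simplifying assumption.

```python
import sqlite3

# Reference value quoted in the abstract: ENDF/B-VII.1 flux-weighted partial
# cross section for the 847-keV 2+ -> 0+ transition in 56Fe, in millibarns.
DEFAULT_REFERENCE_MB = 468.0


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical schema and placeholder rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE gammas (nuclide TEXT, e_gamma_kev REAL, rel_to_ref REAL)"
    )
    con.executemany(
        "INSERT INTO gammas VALUES (?, ?, ?)",
        [
            ("56Fe", 847.0, 1.0),             # the reference transition itself
            ("placeholder-A", 1000.0, 0.5),   # invented values, not Atlas data
            ("placeholder-B", 1500.0, 0.25),
        ],
    )
    return con


def cross_sections_mb(con, reference_mb=DEFAULT_REFERENCE_MB):
    """Scale every stored ratio by the chosen reference cross section (mb)."""
    cur = con.execute(
        "SELECT nuclide, e_gamma_kev, rel_to_ref * ? FROM gammas ORDER BY e_gamma_kev",
        (reference_mb,),
    )
    return cur.fetchall()
```

Keeping only ratios in the table and applying the reference at query time means a user who prefers a different reference value changes one parameter instead of rebuilding the database.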
Reconstructing the Chernobyl Nuclear Power Plant (CNPP) accident 30 years after. A unique database of air concentration and deposition measurements over Europe.

PubMed

Evangeliou, Nikolaos; Hamburger, Thomas; Talerko, Nikolai; Zibtsev, Sergey; Bondar, Yuri; Stohl, Andreas; Balkanski, Yves; Mousseau, Timothy A; Møller, Anders P

2016-09-01

30 years after the Chernobyl Nuclear Power Plant (CNPP) accident, its radioactive releases remain of great interest, mainly because of the long half-lives of many of the radionuclides emitted. Observations from the terrestrial environment, which retains radionuclides for many years after initial deposition, are important for health and environmental assessments. Furthermore, such measurements are the basis for validating atmospheric transport models and can be used to constrain the still poorly known source terms. However, although the "Atlas of cesium deposition on Europe after the Chernobyl accident" (hereafter referred to as "Atlas") has been available since 1998, less than 1% of the direct observations of ¹³⁷Cs deposition have been made publicly available. The remaining ones are neither accessible nor traceable to specific data providers, and a large fraction of these data might have been lost entirely. The present paper is an effort to rescue some of the data collected over the years following the CNPP accident and make them publicly available. The database includes surface air activity concentrations and deposition observations for ¹³¹I, ¹³⁴Cs and ¹³⁷Cs measured and provided by Former Soviet Union authorities in the years that followed the accident. Using the same interpolation tool as the official authorities, we have reconstructed a deposition map of ¹³⁷Cs based on about 3% of the data used to create the Atlas map. The reconstructed deposition map is very similar to the official one, but it has the advantage of being based exclusively on documented data sources, which are all made available within this publication. In contrast to the official map, our deposition map is therefore reproducible, and all underlying data can also be used for other purposes. The efficacy of the database was demonstrated using simulated activity concentrations and deposition of ¹³⁷Cs from a Lagrangian and an Eulerian transport model. Copyright © 2016. Published by Elsevier Ltd.
Human Mitochondrial Protein Database

National Institute of Standards and Technology Data Gateway

SRD 131 Human Mitochondrial Protein Database (Web, free access) The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only for studying the mitochondrion but also for studying the associated diseases.
Event selection services in ATLAS

NASA Astrophysics Data System (ADS)

Cranshaw, J.; Cuhadar-Donszelmann, T.; Gallas, E.; Hrivnac, J.; Kenyon, M.; McGlone, H.; Malon, D.; Mambelli, M.; Nowak, M.; Viegas, F.; Vinek, E.; Zhang, Q.

2010-04-01

ATLAS has developed and deployed event-level selection services based upon event metadata records ("TAGS") and supporting file and database technology. These services allow physicists to extract events that satisfy their selection predicates from any stage of data processing and use them as input to later analyses. One component of these services is a web-based Event-Level Selection Service Interface (ELSSI). ELSSI supports event selection by integrating run-level metadata, luminosity-block-level metadata (e.g., detector status and quality information), and event-by-event information (e.g., triggers passed and physics content). The list of events that survive a selection criterion is returned in a form that can be used directly as input to local or distributed analysis; indeed, it is possible to submit a skimming job directly from the ELSSI interface using grid proxy credential delegation. ELSSI allows physicists to explore ATLAS event metadata as a means to understand, qualitatively and quantitatively, the distributional characteristics of ATLAS data. For example, the ELSSI service provides an easy interface to see the events with the highest missing transverse energy (missing ET) or with the most leptons, to count how many events passed a given set of triggers, or to find events that failed a given trigger but nonetheless look relevant to an analysis based upon the results of offline reconstruction. This work provides an overview of ATLAS event-level selection services, with an emphasis upon the interactive Event-Level Selection Service Interface.
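The example queries listed above (top events by missing ET, counts of events passing a trigger set, events that failed a trigger yet look interesting offline) are all predicates over per-event metadata. A minimal sketch, assuming a simplified in-memory record; the `EventTag` fields, the trigger names and the helper functions are hypothetical and do not reflect the actual TAG schema or the ELSSI interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTag:
    """Hypothetical per-event metadata record (a stand-in for an ATLAS TAG)."""
    run: int
    event: int
    triggers: frozenset[str]
    missing_et_gev: float
    n_leptons: int


def select(tags, predicate):
    """Return (run, event) identifiers of events satisfying the predicate."""
    return [(t.run, t.event) for t in tags if predicate(t)]


def count_passing(tags, required):
    """Count events that passed every trigger in `required`."""
    required = frozenset(required)
    return sum(1 for t in tags if required <= t.triggers)


def top_missing_et(tags, n):
    """Identifiers of the n events with the largest missing transverse energy."""
    ranked = sorted(tags, key=lambda t: t.missing_et_gev, reverse=True)
    return [(t.run, t.event) for t in ranked[:n]]
```

For example, `select(tags, lambda t: "trig_mu" not in t.triggers and t.n_leptons >= 2)` returns events that failed a hypothetical muon trigger but still contain two or more reconstructed leptons. Returning (run, event) pairs mirrors the idea that the selection output is an event list usable as input to a later analysis.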
The Protein Information Resource: an integrated public resource of functional annotation of proteins

PubMed Central

Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

2002-01-01

The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system has been developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and an open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247
A Brief Review of RNA–Protein Interaction Database Resources

PubMed Central

Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong

2017-01-01

RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for exploring transcriptional and post-transcriptional regulatory networks. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide to these databases. The content and major functions of each database are presented. These brief descriptions help users quickly choose the database containing the information they are interested in. In short, these RNA–Protein interaction database resources are continually updated, and their current state reflects ongoing efforts to identify and analyze the large number of RNA–Protein interactions. PMID:29657278
The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations.

PubMed

Lee, HoJoon; Palm, Jennifer; Grimes, Susan M; Ji, Hanlee P

2015-10-27

The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship between TCGA genomic results and clinical phenotypes remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and the implementation of precision cancer medicine. Several websites, such as cBioPortal or the University of California Santa Cruz genome browser, make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of clinical-genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. Its interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, microRNAs, and proteins by name, cancer type, or clinical parameter; (2) searching for genomic/proteomic profile changes by clinical parameter in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background, and results are displayed on our portal in an easy-to-navigate interface according to the user's input. To derive these associations, we relied on elastic-net estimates from regularized multiple linear regression relating clinical parameters to multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/microRNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping.
Overall, using our statistical analysis of 25 cancer types and 18 clinical parameters, such as clinical stage or smoking history, we identify associations of potential clinical relevance among genes/microRNAs/proteins. The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypotheses regarding genomic/proteomic alterations across a broad spectrum of malignancies.
Translational analysis of mouse and human placental protein and mRNA reveals distinct molecular pathologies in human preeclampsia.

PubMed

Cox, Brian; Sharma, Parveen; Evangelou, Andreas I; Whiteley, Kathie; Ignatchenko, Vladimir; Ignatchenko, Alex; Baczyk, Dora; Czikk, Marie; Kingdom, John; Rossant, Janet; Gramolini, Anthony O; Adamson, S Lee; Kislinger, Thomas

2011-12-01

Preeclampsia (PE) adversely impacts ~5% of pregnancies. Despite extensive research, no consistent biomarkers or cures have emerged, suggesting that different molecular mechanisms may cause clinically similar disease. To address this, we undertook a proteomics study with three main goals: (1) to identify a panel of cell surface markers that distinguish the trophoblast and endothelial cells of the placenta in the mouse; (2) to translate this marker set to human via the Human Protein Atlas database; and (3) to utilize the validated human trophoblast markers to identify subgroups of human preeclampsia. To achieve these goals, plasma membrane proteins at the blood-tissue interfaces were extracted from placentas using intravascular silica-bead perfusion, and then identified using shotgun proteomics. We identified 1181 plasma membrane proteins, of which 171 were enriched at the maternal blood-trophoblast interface and 192 at the fetal endothelial interface, with 70% conservation of expression in humans. Three distinct molecular subgroups of human preeclampsia were identified in existing human microarray data by using expression patterns of trophoblast-enriched proteins. Analysis of all misexpressed genes revealed divergent dysfunctions including angiogenesis (subgroup 1), MAPK signaling (subgroup 2), and hormone biosynthesis and metabolism (subgroup 3). Subgroup 2 lacked expected changes in known preeclampsia markers (sFLT1, sENG) and uniquely overexpressed GNA12. In an independent set of 40 banked placental specimens, GNA12 was overexpressed during preeclampsia when co-incident with chronic hypertension. In the current study, we used a novel translational analysis to integrate mouse and human trophoblast protein expression with human microarray data. This strategy identified distinct molecular pathologies in human preeclampsia.
We conclude that clinically similar preeclampsia patients exhibit divergent placental gene expression profiles thus implicating divergent molecular mechanisms in the origins of this disease.
Test of ATLAS RPCs Front-End electronics

NASA Astrophysics Data System (ADS)

Aielli, G.; Camarri, P.; Cardarelli, R.; Di Ciaccio, A.; Di Stante, L.; Liberti, B.; Paoloni, A.; Pastori, E.; Santonico, R.

2003-08-01

The Front-End Electronics performing the ATLAS RPC readout is a full-custom 8-channel GaAs circuit, which integrates both the analog and the digital signal processing in a single die. The die is bonded on the Front-End board, which is completely enclosed inside the detector Faraday cage. About 50 000 FE boards are foreseen for the experiment. The complete functionality of the FE boards will be certified before detector assembly. We describe here the systematic test devoted to checking the dynamic functionality of each single channel, and the selection criteria applied. The test measures and records all relevant electronics parameters to build a complete database for the experiment. Statistical results from more than 1100 channels are presented.
Protein Information Resource: a community resource for expert annotation of protein data

PubMed Central

Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

2001-01-01

The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041
ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures.

PubMed

Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka

2012-02-27

ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with an uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better-known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.
Halobacterium salinarum NRC-1 PeptideAtlas: toward strategies for targeted proteomics and improved proteome coverage.

PubMed

Van, Phu T; Schmid, Amy K; King, Nichole L; Kaur, Amardeep; Pan, Min; Whitehead, Kenia; Koide, Tie; Facciotti, Marc T; Goo, Young Ah; Deutsch, Eric W; Reiss, David J; Mallick, Parag; Baliga, Nitin S

2008-09-01

The relatively small numbers of proteins and fewer possible post-translational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a PeptideAtlas (PA) covering 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636 000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has highlighted plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, the discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low protein abundance as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.
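As a rough illustration of the coverage figure quoted above, proteome coverage can be computed as the fraction of predicted proteins supported by at least one detected peptide. The sketch below assumes that definition; the function name, protein IDs and peptide-to-protein mapping are hypothetical and are not taken from the PeptideAtlas software.

```python
def proteome_coverage(predicted_proteins, peptide_to_proteins):
    """Fraction of predicted proteins with at least one detected peptide.

    predicted_proteins: iterable of protein IDs in the predicted proteome.
    peptide_to_proteins: dict mapping each detected peptide sequence to the
    protein IDs it maps to (shared peptides may map to several proteins).
    """
    predicted = set(predicted_proteins)
    detected = set()
    for proteins in peptide_to_proteins.values():
        # Count only proteins that belong to the predicted proteome.
        detected.update(p for p in proteins if p in predicted)
    return len(detected) / len(predicted) if predicted else 0.0


# Hypothetical example: 2 of 4 predicted proteins are supported by peptides.
print(proteome_coverage(["p1", "p2", "p3", "p4"],
                        {"PEPTIDEA": ["p1"], "PEPTIDEB": ["p1", "p3"]}))
```

Counting a shared peptide toward every protein it maps to is a simplifying choice; stricter analyses may count only peptides unique to one protein.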
High-Performance Secure Database Access Technologies for HEP Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Matthew Vranicar; John Weicher

2006-04-17

The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high-energy physicists demand efficient performance of grid computing applications in LHC physics research, where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysis capabilities, a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist’s computer used for analysis. Very few efforts are ongoing in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, which states that "Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc, for numerous applications." There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing.
We believe that an innovative database architecture where the secure authorization is pushed into the database engine will eliminate inefficient data transfer bottlenecks. Furthermore, traditionally separated database and security layers provide an extra vulnerability, leaving a weak clear-text password authorization as the only protection on the database core systems. Due to the legacy limitations of the systems’ security models, the allowed passwords often cannot even comply with the DOE password guideline requirements. We see an opportunity for the tight integration of the secure authorization layer with the database server engine, resulting in both improved performance and improved security. Phase I has focused on the development of a proof-of-concept prototype using Argonne National Laboratory’s (ANL) Argonne Tandem-Linac Accelerator System (ATLAS) project as a test scenario. By developing a grid-security-enabled version of the ATLAS project’s current relational database solution, MySQL, PIOCON Technologies aims to offer a more efficient solution to secure database access.
Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data.

PubMed

Kumar, Dhirendra; Yadav, Amit Kumar; Dash, Debasis

2017-01-01

Database searching is the preferred method for protein identification from digital spectra of mass-to-charge ratios (m/z) acquired from protein samples by mass spectrometers. The search database is one of the major factors influencing which proteins are discovered in the sample, and thus which biological conclusions are derived. In most cases the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on the final list of identified proteins. We also elaborate upon factors, such as the composition and size of the search database, that can influence the protein identification process. In conclusion, we suggest that the choice of database depends on the type of inferences to be derived from the proteomics data. However, making additional efforts to build a compact and concise database for a targeted question should generally be rewarding in achieving confident protein identifications.
The importance of having an appropriate relational data segmentation in ATLAS

NASA Astrophysics Data System (ADS)

Dimitrov, G.

2015-12-01

In this paper we describe specific technical solutions put in place in various database applications of the ATLAS experiment at the LHC, where we make use of several partitioning techniques available in Oracle 11g. With the broadly used range partitioning and its option of automatic interval partitioning, we add our own logic in PL/SQL procedures and scheduler jobs to sustain sliding data windows that enforce various data retention policies. We also make use of the new Oracle 11g reference partitioning in the Nightly Build System to achieve uniform data segmentation. However, the most challenging issue was segmenting the data of the new ATLAS Distributed Data Management system (Rucio), which resulted in tens of thousands of list-type partitions and sub-partitions. Partition and sub-partition management, index strategy, statistics gathering and query execution plan stability are important factors when choosing an appropriate physical model for the application data management. The knowledge accumulated so far, and an analysis of new Oracle 12c features that could be beneficial, will be shared with the audience.
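The sliding-window retention idea can be sketched outside the database: given each interval partition's exclusive upper bound (comparable to Oracle's HIGH_VALUE for a date-ranged partition), a partition becomes eligible for dropping once all of its rows are older than the retention cutoff. This is a minimal Python sketch under that assumption only; the ATLAS implementation lives in PL/SQL procedures and scheduler jobs, and the function and partition names here are hypothetical.

```python
from datetime import date, timedelta


def partitions_to_drop(partition_bounds, today, retention_days):
    """Return partitions whose every row is older than the retention cutoff.

    partition_bounds maps a (hypothetical) partition name to its exclusive
    upper bound date, i.e. all rows in the partition have a date < bound.
    A partition can be dropped when its bound is at or before the cutoff.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(name for name, upper in partition_bounds.items() if upper <= cutoff)


bounds = {
    "P_2015_01": date(2015, 2, 1),
    "P_2015_02": date(2015, 3, 1),
    "P_2015_03": date(2015, 4, 1),
}
# A 60-day window ending 2015-04-15 gives a cutoff of 2015-02-14.
print(partitions_to_drop(bounds, date(2015, 4, 15), 60))
```

In an Oracle deployment, each returned name would correspond to an `ALTER TABLE ... DROP PARTITION` statement issued by a scheduled job; keeping the selection logic separate from the DDL makes the retention rule easy to test.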
Evaluation of Cross-Protocol Stability of a Fully Automated Brain Multi-Atlas Parcellation Tool.

PubMed

Liang, Zifei; He, Xiaohai; Ceritoglu, Can; Tang, Xiaoying; Li, Yue; Kutten, Kwame S; Oishi, Kenichi; Miller, Michael I; Mori, Susumu; Faria, Andreia V

2015-01-01

Brain parcellation tools based on multiple-atlas algorithms have recently emerged as a promising method with which to accurately define brain structures. When dealing with data from various sources, it is crucial that these tools are robust for many different imaging protocols. In this study, we tested the robustness of a multiple-atlas, likelihood fusion algorithm using Alzheimer's Disease Neuroimaging Initiative (ADNI) data with six different protocols, comprising three manufacturers and two magnetic field strengths. The entire brain was parceled into five different levels of granularity. In each level, which defines a set of brain structures, ranging from eight to 286 regions, we evaluated the variability of brain volumes related to the protocol, age, and diagnosis (healthy or Alzheimer's disease). Our results indicated that, with proper pre-processing steps, the impact of different protocols is minor compared to biological effects, such as age and pathology. A precise knowledge of the sources of data variation enables sufficient statistical power and ensures the reliability of an anatomical analysis when using this automated brain parcellation tool on datasets from various imaging protocols, such as clinical databases.
ATLAS Simulation using Real Data: Embedding and Overlay

NASA Astrophysics Data System (ADS)

Haas, Andrew; ATLAS Collaboration

2017-10-01

For some physics processes studied with the ATLAS detector, a more accurate simulation in some respects can be achieved by including real data into simulated events, with substantial potential improvements in the CPU, disk space, and memory usage of the standard simulation configuration, at the cost of significant database and networking challenges. Real proton-proton background events can be overlaid (at the detector digitization output stage) on a simulated hard-scatter process, to account for pileup background (from nearby bunch crossings), cavern background, and detector noise. A similar method is used to account for the large underlying event from heavy ion collisions, rather than directly simulating the full collision. Embedding replaces the muons found in Z→μμ decays in data with simulated taus at the same 4-momenta, thus preserving the underlying event and pileup from the original data event. In all these cases, care must be taken to exactly match detector conditions (beamspot, magnetic fields, alignments, dead sensors, etc.) between the real data event and the simulation. We will discuss the status of these overlay and embedding techniques within ATLAS software and computing.
Statistical atlas based extrapolation of CT data

NASA Astrophysics Data System (ADS)

Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran

2010-02-01

We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain and joint subluxation and to improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning are based on radiological measurements, the significant structural variations in dysplastic hips make computer-assisted geometrical and biomechanical planning based on CT data desirable to help the surgeon achieve optimal joint realignment. Most of the patients undergoing PAO are young females, so it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis because the iliac region is missing. A statistical shape model of the full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step, and the missing information is then inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis. Osteotomy cuts are simulated and the effect of atlas-predicted models on the actual procedure is evaluated.
MIPS: analysis and annotation of proteins from whole genomes.

PubMed

Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
A New Stellar Atmosphere Grid and Comparisons with HST /STIS CALSPEC Flux Distributions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bohlin, Ralph C.; Fleming, Scott W.; Gordon, Karl D.

The Space Telescope Imaging Spectrograph has measured the spectral energy distributions for several stars of types O, B, A, F, and G. These absolute fluxes from the CALSPEC database are fit with a new spectral grid computed from the ATLAS-APOGEE ATLAS9 model atmosphere database using a chi-square minimization technique in four parameters. The quality of the fits is compared for complete LTE grids by Castelli and Kurucz (CK04) and our new comprehensive LTE grid (BOSZ). For the cooler stars, the fits with the MARCS LTE grid are also evaluated, while the hottest stars are also fit with the NLTE Lanz and Hubeny OB star grids. Unfortunately, these NLTE models do not transition smoothly in the infrared to agree with our new BOSZ LTE grid at the NLTE lower limit of T_eff = 15,000 K. The new BOSZ grid is available via the Space Telescope Institute MAST archive and has a much finer sampled IR wavelength scale than CK04, which will facilitate the modeling of stars observed by the James Webb Space Telescope. Our result for the angular diameter of Sirius agrees with the ground-based interferometric value.

A New Stellar Atmosphere Grid and Comparisons with HST/STIS CALSPEC Flux Distributions

NASA Astrophysics Data System (ADS)

Bohlin, Ralph C.; Mészáros, Szabolcs; Fleming, Scott W.; Gordon, Karl D.; Koekemoer, Anton M.; Kovács, József

2017-05-01

The Space Telescope Imaging Spectrograph has measured the spectral energy distributions for several stars of types O, B, A, F, and G. These absolute fluxes from the CALSPEC database are fit with a new spectral grid computed from the ATLAS-APOGEE ATLAS9 model atmosphere database using a chi-square minimization technique in four parameters. The quality of the fits is compared for complete LTE grids by Castelli & Kurucz (CK04) and our new comprehensive LTE grid (BOSZ). For the cooler stars, the fits with the MARCS LTE grid are also evaluated, while the hottest stars are also fit with the NLTE Lanz & Hubeny OB star grids. Unfortunately, these NLTE models do not transition smoothly in the infrared to agree with our new BOSZ LTE grid at the NLTE lower limit of T_eff = 15,000 K. The new BOSZ grid is available via the Space Telescope Institute MAST archive and has a much finer sampled IR wavelength scale than CK04, which will facilitate the modeling of stars observed by the James Webb Space Telescope. Our result for the angular diameter of Sirius agrees with the ground-based interferometric value.
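A generic chi-square grid search of the kind mentioned above can be sketched as follows. This is not the BOSZ fitting code: it assumes the model fluxes are already sampled on the observed wavelengths, that each grid point is keyed by a hypothetical parameter tuple, and that one free multiplicative scale is fitted per model. For surface-flux models that scale would play the role of the dilution factor (R/d)^2 = (θ/2)^2, which is one way an angular diameter can come out of such a fit.

```python
def chi_square_fit(flux, sigma, grid):
    """Return (params, scale, chi2) for the best-fitting grid model.

    flux, sigma: observed fluxes and 1-sigma uncertainties.
    grid: dict mapping a parameter tuple to model fluxes on the same
    wavelength points. For each model the scale s minimizing
    chi2 = sum(((f - s*m) / e)**2) has the closed form
    s = sum(f*m/e**2) / sum(m**2/e**2).
    """
    best = None
    for params, model in grid.items():
        num = sum(f * m / e**2 for f, m, e in zip(flux, model, sigma))
        den = sum(m**2 / e**2 for m, e in zip(model, sigma))
        scale = num / den
        chi2 = sum(((f - scale * m) / e) ** 2 for f, m, e in zip(flux, model, sigma))
        if best is None or chi2 < best[2]:
            best = (params, scale, chi2)
    return best


# Hypothetical grid keyed by (T_eff, log g); the first model matches after scaling by 2.
observed = [2.0, 4.0, 6.0]
errors = [1.0, 1.0, 1.0]
models = {(9000, 4.0): [1.0, 2.0, 3.0], (10000, 4.0): [1.0, 1.0, 1.0]}
print(chi_square_fit(observed, errors, models))
```

Solving the scale analytically keeps the search over the remaining parameters discrete; a real fit over four continuous parameters would typically interpolate between grid nodes rather than stop at the nearest one.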
DNA methylation biomarkers for head and neck squamous cell carcinoma.

PubMed

Zhou, Chongchang; Ye, Meng; Ni, Shumin; Li, Qun; Ye, Dong; Li, Jinyun; Shen, Zhishen; Deng, Hongxia

2018-06-21

DNA methylation plays an important role in the etiology and pathogenesis of head and neck squamous cell carcinoma (HNSCC). The current study aimed to identify aberrantly methylated, differentially expressed genes (DEGs) by a comprehensive bioinformatics analysis. In addition, we screened for DEGs affected by DNA methylation modification and further investigated their prognostic values for HNSCC. We included microarray data of DNA methylation (GSE25093 and GSE33202) and gene expression (GSE23036 and GSE58911) from Gene Expression Omnibus. Aberrantly methylated DEGs were analyzed with R software. The Cancer Genome Atlas (TCGA) RNA sequencing and DNA methylation (Illumina HumanMethylation450) databases were utilized for validation. In total, 27 aberrantly methylated genes accompanied by altered expression were identified. After confirmation with the TCGA database, two hypermethylated, low-expression genes (FAM135B and ZNF610) and two hypomethylated, high-expression genes (HOXA9 and DCC) were identified. Receiver operating characteristic (ROC) curve analysis confirmed the diagnostic value of these four methylated genes for HNSCC. Multivariate Cox proportional hazards analysis showed that FAM135B methylation was a favorable independent prognostic biomarker for overall survival of HNSCC patients.
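The diagnostic value assessed by a ROC curve is commonly summarized by the area under the curve (AUC). A minimal sketch, assuming per-sample methylation scores for tumor (positive) and normal (negative) samples, computes the AUC through its Mann-Whitney interpretation. The scores below are hypothetical, and this is not the R code used in the study.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one, with ties counted as 1/2.
    O(n_pos * n_neg); fine for small cohorts.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical beta values: tumor samples vs. normal samples.
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]))
```

For a hypomethylated marker, where lower values indicate disease, the scores would be negated (or the AUC taken as 1 - AUC) so that higher still means "more likely tumor".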
Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas. | Office of Cancer Genomics

Cancer.gov

Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified.
The Cancer Genome Atlas Pan-Cancer analysis project.

PubMed

Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

2013-10-01

The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.
Ionization ratios and elemental abundances in the atmosphere of 68 Tauri

NASA Astrophysics Data System (ADS)

Aouina, A.; Monier, R.

2017-12-01

We have derived the ionization ratios of twelve elements in the atmosphere of the star 68 Tauri (HD 27962) using an ATLAS9 model atmosphere with 72 layers computed for the effective temperature and surface gravity of the star. We then computed a grid of synthetic spectra generated by SYNSPEC49 based on an ATLAS9 model atmosphere in order to model one high-resolution spectrum secured by one of us (RM) with the échelle spectrograph SOPHIE at Observatoire de Haute Provence. We could determine the abundances of several elements in their dominant ionization stage, including those defining the Am phenomenon. We thus provide new abundance determinations for 68 Tauri, based on updated, accurate atomic data retrieved from the NIST database, which extend previous abundance studies.
MIPS: a database for genomes and protein sequences.

PubMed Central

Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

1999-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
A geochemical atlas of North Carolina, USA

USGS Publications Warehouse

Reid, J.C.

1993-01-01

A geochemical atlas of North Carolina, U.S.A., was prepared using National Uranium Resource Evaluation (NURE) stream-sediment data. Before termination of the NURE program, sampling of nearly the entire state (48,666 square miles of land area) was completed and geochemical analyses were obtained. The NURE data are applicable to mineral exploration, agriculture, waste disposal siting issues, health, and environmental studies. Applications in state government include resource surveys to assist mineral exploration by identifying geochemical anomalies and areas of mineralization. Agriculture seeks to identify areas with favorable (or unfavorable) conditions for plant growth, disease, and crop productivity. Trace elements such as cobalt, copper, chromium, iron, manganese, zinc, and molybdenum must be present within narrow ranges in soils for optimum growth and productivity. Trace elements as a contributing factor to disease are of concern to health professionals. Industry can use pH and conductivity data for water samples to site facilities which require specific water quality. The North Carolina NURE database consists of stream-sediment samples, groundwater samples, and stream-water analyses. The statewide database consists of 6,744 stream-sediment sites, 5,778 groundwater sample sites, and 295 stream-water sites. Neutron activation analyses were provided for U, Br, Cl, F, Mn, Na, Al, V, Dy in groundwater and stream water, and for U, Th, Hf, Ce, Fe, Mn, Na, Sc, Ti, V, Al, Dy, Eu, La, Sm, Yb, and Lu in stream sediments. Supplemental analyses by other techniques were reported on U (extractable), Ag, As, Ba, Be, Ca, Co, Cr, Cu, K, Li, Mg, Mo, Nb, Ni, P, Pb, Se, Sn, Sr, W, Y, and Zn for 4,619 stream-sediment samples. A small subset of 334 stream samples was analyzed for gold. The goal of the atlas was to make available the statewide NURE data with minimal interpretation to enable prospective users to modify and manipulate the data for their end use. 
The atlas provides only a very general indication of geochemical distribution patterns and should not be used for site-specific studies. The atlas maps for each element were computer-generated at the state's geographic information system (Center for Geographic Information and Analysis [CGIA]). The Division of Statistics and Information Services provided input files. The maps in the atlas are point maps. Each sample is represented by a symbol generally corresponding to a quartile class. Other reports will transmit sample and analytical data for state regions. Data are tentatively planned to be available on disks in spreadsheet format for personal computers. During the second phase of this project, stream-sediment samples are being assigned to state geologic map unit names using a GIS to determine background and anomaly values. Subsequent publications will make this geochemical data and accompanying interpretations available to a wide spectrum of interdisciplinary users. © 1993.
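Assigning each sample a symbol by quartile class, as the point maps above do, can be sketched with the Python standard library. The exact class breaks used for the North Carolina atlas are not stated, so this assumes cut points from `statistics.quantiles` (default "exclusive" method) and places values equal to a cut point in the lower class; the concentrations are hypothetical.

```python
from statistics import quantiles


def quartile_classes(values):
    """Assign each value to quartile class 1-4 (1 = lowest quarter).

    Cut points come from statistics.quantiles(values, n=4) with its default
    'exclusive' method; a value equal to a cut point falls in the lower class.
    Output order matches input order. Requires at least two values.
    """
    q1, q2, q3 = quantiles(values, n=4)
    classes = []
    for v in values:
        if v <= q1:
            classes.append(1)
        elif v <= q2:
            classes.append(2)
        elif v <= q3:
            classes.append(3)
        else:
            classes.append(4)
    return classes


# Hypothetical element concentrations (ppm) at eight stream-sediment sites.
print(quartile_classes([1, 2, 3, 4, 5, 6, 7, 8]))
```

Quartile classes make the symbols relative to each element's own distribution, which suits a statewide overview but, as the atlas itself cautions, says little about absolute levels at a given site.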
The U.S. Geological Survey mapping and cartographic database activities, 2006-2010

USGS Publications Warehouse

Craun, Kari J.; Donnelly, John P.; Allord, Gregory J.

2011-01-01

The U.S. Geological Survey (USGS) began systematic topographic mapping of the United States in the 1880s, beginning with scales of 1:250,000 and 1:125,000 in support of geological mapping. Responding to the need for higher resolution and more detail, the USGS began the 1:62,500-scale, 15-minute topographic map series early in the 20th century. Finally, in the 1950s the USGS adopted the 1:24,000-scale, 7.5-minute topographic map series to portray even more detail, completing the coverage of the conterminous 48 states of the United States with this series in 1992. In 2001, the USGS developed the vision and concept of The National Map, a topographic database for the 21st century and the source for a new generation of topographic maps (http://nationalmap.gov/). In 2008, the initial production of those maps began with a 1:24,000-scale digital product. In a separate but related project, the USGS began scanning the existing inventory of historical topographic maps at all scales to accompany the new topographic maps. The USGS had also developed a digital database of The National Atlas of the United States. The digital version of the Atlas is now available on the Web and supports a mapping engine for small-scale maps of the United States and North America. These three efforts define the topographic mapping activities of the USGS during the last few years and are discussed below.
A web-based solution to visualize operational monitoring data in the Trigger and Data Acquisition system of the ATLAS experiment at the LHC

NASA Astrophysics Data System (ADS)

Avolio, G.; D'Ascanio, M.; Lehmann-Miotto, G.; Soloviev, I.

2017-10-01

The Trigger and Data Acquisition (TDAQ) system of the ATLAS detector at the Large Hadron Collider at CERN is composed of a large number of distributed hardware and software components (about 3000 computers and more than 25000 applications) which, in a coordinated manner, provide the data-taking functionality of the overall system. During data taking runs, a huge flow of operational data is produced in order to constantly monitor the system and allow proper detection of anomalies or misbehaviours. In the ATLAS trigger and data acquisition system, operational data are archived and made available to applications by the P-BEAST (Persistent Back-End for the Atlas Information System of TDAQ) service, implementing a custom time-series database. The possibility to efficiently visualize both realtime and historical operational data is a great asset facilitating both online identification of problems and post-mortem analysis. This paper will present a web-based solution developed to achieve such a goal: the solution leverages the flexibility of the P-BEAST archiver to retrieve data, and exploits the versatility of the Grafana dashboard builder to offer a very rich user experience. Additionally, particular attention will be given to the way some technical challenges (like the efficient visualization of a huge amount of data and the integration of the P-BEAST data source in Grafana) have been faced and solved.
Increased-resolution OCT thickness mapping of the human macula: a statistically based registration.

PubMed

Bernardes, Rui; Santos, Torcato; Cunha-Vaz, José

2008-05-01

To describe the development of a technique that enhances spatial resolution of retinal thickness maps of the Stratus OCT (Carl Zeiss Meditec, Inc., Dublin, CA). A retinal thickness atlas (RT-atlas) template was calculated, and a macular coordinate system was established, to pursue this objective. The RT-atlas was developed from principal component analysis of retinal thickness analyzer (RTA) maps acquired from healthy volunteers. The Stratus OCT radial thickness measurements were registered on the RT-atlas, from which an improved macular thickness map was calculated. Thereafter, Stratus OCT circular scans were registered on the previously calculated map to enhance spatial resolution. The developed technique was applied to Stratus OCT thickness data from healthy volunteers and from patients with diabetic retinopathy (DR) or age-related macular degeneration (AMD). Results showed that for normal, or close to normal, macular thickness maps from healthy volunteers and patients with DR, this technique can be an important aid in determining retinal thickness. Efforts are under way to improve the registration of retinal thickness data in patients with AMD. The developed technique enhances the evaluation of data acquired by the Stratus OCT, helping the detection of early retinal thickness abnormalities. Moreover, a normative database of retinal thickness measurements gained from this technique, as referenced to the macular coordinate system, can be created without errors induced by missed fixation and eye tilt.
The Halophile protein database.

PubMed

Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

2014-01-01

Halophilic archaea/bacteria adapt to different salt concentrations, namely extreme, moderate and low. These types of adaptation may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria that may be involved in the adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is a comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains properties of 59 897 proteins extracted from 21 different strains of halophilic archaea/bacteria. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.
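Two of the listed properties have simple closed-form definitions that can be sketched directly: GRAVY is the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index (Ikai, 1980) weights the mole percentages of Ala, Val, Ile and Leu. This sketch follows those standard definitions as used by ProtParam-style tools; whether HProtDB computed its values exactly this way is an assumption, and the example sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue.

    Raises KeyError for non-standard residue codes (e.g. X, B, Z).
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X(aa) is the mole percent of that residue in the sequence."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


print(round(gravy("MKVLAAGIE"), 3), round(aliphatic_index("MKVLAAGIE"), 1))
```

Halophilic proteins are often reported as acidic and less hydrophobic than their non-halophilic counterparts, which is one reason per-protein summaries such as GRAVY are useful for comparison across a database like this.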
FACETS: multi-faceted functional decomposition of protein interaction networks

PubMed Central

Seah, Boon-Siew; Bhowmick, Sourav S.; Forbes Dewey, C.

2012-01-01

Motivation: The availability of large-scale curated protein interaction datasets has given rise to the opportunity to investigate higher-level organization and modularity within the protein–protein interaction (PPI) network using graph-theoretic analysis. Despite recent progress, systems-level analysis of high-throughput PPIs remains a daunting task because of the amount of data they present. In this article, we propose a novel PPI network decomposition algorithm called FACETS that uses Gene Ontology (GO) annotations to make sense of the deluge of interaction data. FACETS finds not just a single functional decomposition of the PPI network, but a multi-faceted atlas of functional decompositions that portray alternative perspectives of the functional landscape of the underlying PPI network. Each facet in the atlas represents a distinct interpretation of how the network can be functionally decomposed and organized. Our algorithm maximizes the interpretative value of the atlas by optimizing inter-facet orthogonality and intra-facet cluster modularity. Results: We tested our algorithm on the global networks from IntAct and compared it with gold-standard datasets from MIPS and KEGG, demonstrating the performance of FACETS. We also performed a case study that illustrates the utility of our approach. Contact: seah0097@ntu.edu.sg or assourav@ntu.edu.sg Supplementary information: Supplementary data are available at Bioinformatics online. Availability: Our software is available freely for non-commercial purposes from: http://www.cais.ntu.edu.sg/∼assourav/Facets/ PMID:22908217
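FACETS scores each facet partly by intra-facet cluster modularity. As a hedged illustration of that ingredient only, the sketch below computes the standard Newman-Girvan modularity Q of one partition of an undirected graph. Whether FACETS uses exactly this formula, and how it is combined with inter-facet orthogonality, is an assumption not confirmed by the abstract.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / (2m)) ** 2], where
    L_c is the number of edges inside c, d_c the total degree of its
    nodes, and m the total number of edges.
    """
    m = len(edges)
    intra = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        # Each edge adds one to the degree of both endpoints.
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))  # 5/14, about 0.357
```

Placing every node in one community always gives Q = 0, which is why higher Q is read as stronger modular structure.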
Role of miR-452-5p in the tumorigenesis of prostate cancer: A study based on the Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), and bioinformatics analysis.

PubMed

Gao, Li; Zhang, Li-Jie; Li, Sheng-Hua; Wei, Li-Li; Luo, Bin; He, Rong-Quan; Xia, Shuang

2018-03-06

MiR-452-5p has been reported to be down-regulated in prostate cancer, affecting the development of this type of cancer. However, the molecular mechanism of miR-452-5p in prostate cancer remains unclear. We therefore investigated the network of target genes of miR-452-5p in prostate cancer using bioinformatics analyses. We first analyzed the expression profiles and prognostic value of miR-452-5p in prostate cancer tissues from a public database. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), PANTHER pathway, and disease ontology (DG) analyses were performed to identify the molecular functions of the target genes from GSE datasets and miRWalk. Finally, we validated hub genes from the protein-protein interaction (PPI) networks of the target genes in the Human Protein Atlas (HPA) database and Gene Expression Profiling Interactive Analysis (GEPIA). The optimal target genes were narrowed down by taking the intersection of up-regulated genes from GEPIA, down-regulated genes from GSE datasets, and predicted genes in miRWalk. Based on mining of GEO and ArrayExpress microarray chips and miRNA-Seq data in the TCGA database, which includes 1007 prostate cancer samples and 387 non-cancer samples, miR-452-5p was shown to be down-regulated in prostate cancer. GO, KEGG, and PANTHER pathway analyses suggested that the target genes might participate in important biological processes, such as transforming growth factor beta signaling, the positive regulation of brown fat cell differentiation and mesenchymal cell differentiation, the Ras signaling pathway, and pathways regulating the pluripotency of stem cells and arrhythmogenic right ventricular cardiomyopathy (ARVC). Nine genes (GABBR, PNISR, NTSR1, DOCK1, EREG, SFRP1, PTGS2, LEF1, and BMP2) were defined as hub genes in the PPI network. Three genes (FAM174B, SLC30A4, and SLIT1) were jointly shared by GEPIA, the GSE datasets, and miRWalk.
Down-regulated miR-452-5p might play an essential role in the tumorigenesis of prostate cancer. Copyright © 2018. Published by Elsevier GmbH.
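The target-narrowing step described in this abstract amounts to a three-way set intersection. A minimal sketch, assuming each source has already been reduced to a list of gene symbols; the names in the usage line are placeholders, not data from the study.

```python
def candidate_targets(gepia_up, gse_down, mirwalk_predicted):
    """Genes present in all three source lists, sorted for stable output."""
    return sorted(set(gepia_up) & set(gse_down) & set(mirwalk_predicted))


# Placeholder gene symbols for illustration only.
print(candidate_targets(["GENE_A", "GENE_B", "GENE_C"],
                        ["GENE_B", "GENE_C", "GENE_D"],
                        ["GENE_C", "GENE_B", "GENE_E"]))
```

Converting to sets first removes duplicate entries within a single source before intersecting.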
NASA MEaSUREs Combined ASTER and MODIS Emissivity over Land (CAMEL)

NASA Astrophysics Data System (ADS)

Borbas, E. E.; Hulley, G. C.; Feltz, M.; Knuteson, R. O.; Hook, S. J.

2016-12-01

A land surface emissivity product of the NASA MEaSUREs project, called Combined ASTER and MODIS Emissivity over Land (CAMEL), is being made available as part of the Unified and Coherent Land Surface Temperature and Emissivity (LST&E) Earth System Data Record (ESDR). The CAMEL database was created by merging the UW MODIS-based baseline-fit emissivity database (UWIREMIS), developed at the University of Wisconsin-Madison, with the ASTER Global Emissivity Database (ASTER GED V4) produced at JPL. This poster introduces the beta version of the database, which is available globally for the period 2003 through 2015 at 5 km resolution as monthly means, for 13 bands from 3.6 to 14.3 µm. An algorithm to create high-spectral-resolution emissivity at 417 wavenumbers is also provided for high-spectral-resolution IR applications. The poster also evaluates the CAMEL database against the IASI Emissivity Atlas (Zhou et al., 2010) and laboratory measurements, and through simulation of IASI brightness temperatures (BTs) with the RTTOV forward model.
A Combined Omics Approach to Generate the Surface Atlas of Human Naive CD4+ T Cells during Early T-Cell Receptor Activation

PubMed Central

Graessel, Anke; Hauck, Stefanie M.; von Toerne, Christine; Kloppmann, Edda; Goldberg, Tatyana; Koppensteiner, Herwig; Schindler, Michael; Knapp, Bettina; Krause, Linda; Dietz, Katharina; Schmidt-Weber, Carsten B.; Suttner, Kathrin

2015-01-01

Naive CD4+ T cells are the common precursors of multiple effector and memory T-cell subsets and possess high plasticity in terms of differentiation potential. This stem-cell-like character is important for cell therapies aiming at the regeneration of specific immunity. Cell surface proteins are crucial for recognizing and responding to signals mediated by other cells or by environmental changes. Knowledge of the cell surface proteins of human naive CD4+ T cells and their changes during the early phase of T-cell activation is urgently needed for guided differentiation of naive T cells and may support the selection of pluripotent cells for cell therapy. Periodate oxidation and aniline-catalyzed oxime ligation technology was applied, followed by quantitative liquid chromatography-tandem MS, to generate a data set describing the surface proteome of primary human naive CD4+ T cells and to monitor dynamic changes during the early phase of activation. This led to the identification of 173 N-glycosylated surface proteins. To independently confirm the proteomic data set and to analyze the cell surface by an alternative technique, a systematic phenotypic expression analysis of surface antigens via flow cytometry was performed. This screening expanded the previous data set to 229 surface proteins expressed on naive unstimulated and activated CD4+ T cells. Furthermore, we generated a surface expression atlas based on transcriptome data, experimental annotation, and predicted subcellular localization, and correlated the proteomics results with this transcriptional data set. This extensive surface atlas provides a comprehensive resource on the naive CD4+ T cell surface and will enable future studies aimed at a deeper understanding of T-cell biology, allowing the identification of novel immune targets for the development of therapeutic treatments. PMID:25991687
MIPS: analysis and annotation of proteins from whole genomes

PubMed Central

Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

2004-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354
P38 delta MAPK promotes breast cancer progression and lung metastasis by enhancing cell proliferation and cell detachment.

PubMed

Wada, M; Canals, D; Adada, M; Coant, N; Salama, M F; Helke, K L; Arthur, J S; Shroyer, K R; Kitatani, K; Obeid, L M; Hannun, Y A

2017-11-23

The protein p38 mitogen-activated protein kinase (MAPK) delta isoform (p38δ) is a poorly studied member of the MAPK family. Data analysis from The Cancer Genome Atlas database revealed that p38δ is highly expressed in all types of human breast cancers. Using a human breast cancer tissue array, we confirmed its elevation in cancer tissue. The breast cancer mouse model MMTV-PyMT (PyMT) developed breast tumors with lung metastasis; however, mice deleted in p38δ (PyMT/p38δ-/-) exhibited delayed primary tumor formation and a highly reduced lung metastatic burden. At the cellular level, we demonstrate that targeting p38δ in the breast cancer cell lines MCF-7 and MDA-MB-231 reduced the rate of cell proliferation. In addition, cells lacking p38δ displayed increased cell-matrix adhesion and reduced cell detachment. This effect on cell adhesion was supported at the molecular level by the regulation of focal adhesion kinase by p38δ in the human breast cell lines. These studies define a previously unappreciated role for p38δ in breast cancer development and evolution by regulating tumor growth and altering metastatic properties. This study proposes p38δ MAPK as a key factor in breast cancer. Lack of p38δ resulted in reduced primary tumor size and blocked the metastatic potential to the lungs.
Estimation of the proteomic cancer co-expression sub networks by using association estimators.

PubMed

Erdoğan, Cihat; Kurt, Zeyneb; Diri, Banu

2017-01-01

In this study, association estimators, which strongly influence gene network inference methods and are used to determine molecular interactions, were examined within the co-expression network inference framework. Using proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub-networks were identified. Proteomic data from the various cancer types were collected from The Cancer Proteome Atlas (TCPA). Nine correlation- and mutual information (MI)-based association estimators commonly used in the literature were compared. As the gold standard for measuring the estimators' performance, a multi-layer data integration platform on gene-disease associations (DisGeNET) and the Molecular Signatures Database (MSigDB) were used. Fisher's exact test was used to evaluate the performance of the association estimators by comparing the created co-expression networks with the disease-associated pathways. The MI-based estimators provided more successful results than the Pearson and Spearman correlation approaches, which are used to estimate biological networks in the weighted correlation network analysis (WGCNA) package. Among correlation-based methods, the best average success rate across the five cancer types was 60%, whereas among MI-based methods the average success rate was 71% for the James-Stein shrinkage (Shrink) estimator and 64% for the Schurmann-Grassberger (SG) estimator. Moreover, the hub genes and the inferred sub-networks are presented for the consideration of researchers and experimentalists.
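Why MI-based estimators can outperform correlation is easy to see on a nonlinear relationship. The sketch below compares Pearson correlation with a plug-in (maximum-likelihood) MI estimate on already-discretized values. Note that the study's best performer was the James-Stein shrinkage estimator; the plug-in estimator here is a simpler, swapped-in technique used only for illustration, and the toy data are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def entropy(counts, n):
    """Plug-in Shannon entropy (nats) from a Counter of observed values."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(x, y):
    """Plug-in MI for discretized variables: H(X) + H(Y) - H(X, Y)."""
    n = len(x)
    return (entropy(Counter(x), n) + entropy(Counter(y), n)
            - entropy(Counter(zip(x, y)), n))


# y depends deterministically on x, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(pearson(x, y), round(mutual_information(x, y), 3))  # 0.0 1.055
```

Correlation is blind to the symmetric quadratic dependence, while MI recovers it; plug-in MI is biased upward on small samples, which is one motivation for shrinkage estimators.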
Estimation of the proteomic cancer co-expression sub networks by using association estimators

PubMed Central

Kurt, Zeyneb; Diri, Banu

2017-01-01

In this study, association estimators, which strongly influence gene network inference methods and are used to determine molecular interactions, were examined within the co-expression network inference framework. Using proteomic data from five different cancer types, the hub genes/proteins within the disease-associated gene-gene/protein-protein interaction sub-networks were identified. Proteomic data from the various cancer types were collected from The Cancer Proteome Atlas (TCPA). Nine correlation- and mutual information (MI)-based association estimators commonly used in the literature were compared. As the gold standard for measuring the estimators’ performance, a multi-layer data integration platform on gene-disease associations (DisGeNET) and the Molecular Signatures Database (MSigDB) were used. Fisher's exact test was used to evaluate the performance of the association estimators by comparing the created co-expression networks with the disease-associated pathways. The MI-based estimators provided more successful results than the Pearson and Spearman correlation approaches, which are used to estimate biological networks in the weighted correlation network analysis (WGCNA) package. Among correlation-based methods, the best average success rate across the five cancer types was 60%, whereas among MI-based methods the average success rate was 71% for the James-Stein shrinkage (Shrink) estimator and 64% for the Schurmann-Grassberger (SG) estimator. Moreover, the hub genes and the inferred sub-networks are presented for the consideration of researchers and experimentalists. PMID:29145449
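The abstract evaluates networks against disease-associated pathways with Fisher's exact test. A common way to frame this is a one-sided over-representation test, i.e. the upper tail of the hypergeometric distribution. The sketch below assumes that framing; the study's exact contingency setup is not described in the abstract, and the argument names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, network_size, pathway_size, universe):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `overlap` pathway genes when
    `network_size` genes are drawn from a `universe` that contains
    `pathway_size` pathway genes.
    """
    total = comb(universe, network_size)
    upper = min(network_size, pathway_size)
    hits = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, network_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# All 5 network genes fall in a 5-gene pathway, out of 20 genes total.
print(fisher_enrichment_p(5, 5, 5, 20))  # 1/15504
```

Using exact integer binomial coefficients avoids rounding until the final division, which matters for very small p-values.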
Large-scale extraction of brain connectivity from the neuroscientific literature

PubMed Central

Richardet, Renaud; Chappelier, Jean-Cédric; Telefont, Martin; Hill, Sean

2015-01-01

Motivation: In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles. One challenge for modern neuroinformatics is finding methods to make the knowledge from the tremendous backlog of publications accessible for search, analysis and the integration of such data into computational models. A key example of this is metascale brain connectivity, where results are not reported in a normalized repository. Instead, these experimental results are published in natural language, scattered among individual scientific publications. This lack of normalization and centralization hinders the large-scale integration of brain connectivity results. In this article, we present text-mining models to extract and aggregate brain connectivity results from 13.2 million PubMed abstracts and 630 216 full-text publications related to neuroscience. The brain regions are identified with three different named entity recognizers (NERs) and then normalized against two atlases: the Allen Brain Atlas (ABA) and the atlas from the Brain Architecture Management System (BAMS). We then use three different extractors to assess inter-region connectivity. Results: NERs and connectivity extractors are evaluated against a manually annotated corpus. The complete in litero extraction models are also evaluated against in vivo connectivity data from ABA with an estimated precision of 78%. The resulting database contains over 4 million brain region mentions and over 100 000 (ABA) and 122 000 (BAMS) potential brain region connections. This database drastically accelerates connectivity literature review, by providing a centralized repository of connectivity data to neuroscientists. Availability and implementation: The resulting models are publicly available at github.com/BlueBrain/bluima. Contact: renaud.richardet@epfl.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25609795
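The simplest kind of connectivity extractor is sentence-level co-occurrence of normalized region mentions. The sketch below pairs a hypothetical dictionary-based recognizer with such a co-occurrence extractor; it is a baseline illustration, not the NER or extraction models in the bluima repository, and the lexicon entries and region identifiers are illustrative assumptions.

```python
import re
from itertools import combinations


def cooccurrence_pairs(text, lexicon):
    """Return sorted (region_a, region_b) pairs co-mentioned in a sentence.

    lexicon maps lowercase surface forms to normalized region identifiers,
    so synonyms collapse onto one atlas entry.
    """
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        found = {
            region_id
            for surface, region_id in lexicon.items()
            if re.search(r"\b" + re.escape(surface) + r"\b", lowered)
        }
        for a, b in combinations(sorted(found), 2):
            pairs.add((a, b))
    return sorted(pairs)


# Hypothetical lexicon; identifiers stand in for atlas region IDs.
lexicon = {"thalamus": "TH", "cerebral cortex": "CTX", "cortex": "CTX", "striatum": "STR"}
text = "The thalamus projects to the cerebral cortex. The striatum was not labeled."
print(cooccurrence_pairs(text, lexicon))
```

Co-occurrence yields high recall but low precision, which is why dedicated extractors and evaluation against a manually annotated corpus are needed.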

A high resolution atlas of gene expression in the domestic sheep (Ovis aries)

PubMed Central

Farquhar, Iseabail L.; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G.; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C. Bruce; Freeman, Tom C.; Archibald, Alan L.; Hume, David A.

2017-01-01

Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of ‘guilt by association’ was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages. PMID:28915238
A high resolution atlas of gene expression in the domestic sheep (Ovis aries).

PubMed

Clark, Emily L; Bush, Stephen J; McCulloch, Mary E B; Farquhar, Iseabail L; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G; Wu, Chunlei; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C Bruce; Freeman, Tom C; Summers, Kim M; Archibald, Alan L; Hume, David A

2017-09-01

Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of 'guilt by association' was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages.
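The "guilt by association" principle can be sketched as a vote among annotated genes whose expression profiles correlate strongly with an uncharacterised gene. This is a simplified stand-in for the network-based clustering used in the study; the correlation threshold, gene names and profiles below are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def infer_function(unknown_profile, annotated, threshold=0.9):
    """Return the most common function label among annotated genes whose
    profile correlates with the unknown gene at r >= threshold, else None.

    annotated: dict gene -> (expression profile across tissues, label).
    """
    votes = Counter(
        label
        for profile, label in annotated.values()
        if pearson(unknown_profile, profile) >= threshold
    )
    return votes.most_common(1)[0][0] if votes else None


annotated = {
    "GENE_A": ([1, 2, 3, 4], "immune"),
    "GENE_B": ([4, 3, 2, 1], "ribosome"),
}
print(infer_function([2, 4, 6, 8], annotated))
```

A strict threshold trades coverage for confidence: genes without sufficiently co-expressed annotated neighbours stay unassigned rather than receiving a weak guess.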
The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome

PubMed Central

Dellaire, G.; Farrall, R.; Bickmore, W.A.

2003-01-01

The Nuclear Protein Database (NPD) is a curated database that contains information on more than 1300 vertebrate proteins that are thought, or are known, to localise to the cell nucleus. Each entry is annotated with information on predicted protein size and isoelectric point, as well as any repeats, motifs or domains within the protein sequence. In addition, information on the sub-nuclear localisation of each protein is provided and the biological and molecular functions are described using Gene Ontology (GO) terms. The database is searchable by keyword, protein name, sub-nuclear compartment and protein domain/motif. Links to other databases are provided (e.g. Entrez, SWISS-PROT, OMIM, PubMed, PubMed Central). Thus, NPD provides a gateway through which the nuclear proteome may be explored. The database can be accessed at http://npd.hgu.mrc.ac.uk and is updated monthly. PMID:12520015
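Each NPD entry includes a predicted isoelectric point. One common way to predict pI is to find, by bisection, the pH at which the Henderson-Hasselbalch net charge of the sequence is zero. The pKa values below are an assumed EMBOSS-style set; published sets differ, so pI estimates vary somewhat between tools, and this is not necessarily how NPD computed its values.

```python
# Assumed pKa values; other published sets give different pI estimates.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at the given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisect for the zero-charge pH; net charge decreases as pH rises."""
    seq = seq.upper()
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("MKTAYIAKQR"), 2))
```

Bisection is safe here because the net charge is monotonic in pH, so exactly one root lies between 0 and 14.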
The Atlas of human African trypanosomiasis: a contribution to global mapping of neglected tropical diseases

PubMed Central

2010-01-01

Background: Following World Health Assembly resolutions 50.36 in 1997 and 56.7 in 2003, the World Health Organization (WHO) committed itself to supporting human African trypanosomiasis (HAT)-endemic countries in their efforts to eliminate the disease as a public health problem. Mapping the distribution of HAT in time and space plays a pivotal role in meeting this objective. For this reason, WHO launched the HAT Atlas initiative, jointly implemented with the Food and Agriculture Organization of the United Nations in the framework of the Programme Against African Trypanosomosis. Results: The distribution of HAT is presented for 23 of the 25 sub-Saharan countries that reported on the status of sleeping sickness in the period 2000-2009. For the two remaining countries, Angola and the Democratic Republic of the Congo, data processing is ongoing. Reports by National Sleeping Sickness Control Programmes (NSSCPs), Non-Governmental Organizations (NGOs) and Research Institutes were collated and the relevant epidemiological data were entered in a database, thus incorporating (i) the results of active screening of over 2.2 million people, and (ii) cases detected in health care facilities engaged in passive surveillance. Over 42 000 cases of HAT and 6 000 different localities were included in the database. Various sources of geographic coordinates were used to locate the villages of epidemiological interest. The resulting average mapping accuracy is estimated at 900 m. Conclusions: Full involvement of NSSCPs, NGOs and Research Institutes in building the Atlas of HAT contributes to the efficiency of the mapping process and ensures both the quality of the collated information and the accuracy of the outputs.
Although efforts are still needed to reduce the number of undetected and unreported cases, the comprehensive, village-level mapping of HAT control activities over a ten-year period ensures a detailed and reliable representation of the known geographic distribution of the disease. Not only does the Atlas serve research and advocacy, but, more importantly, it provides crucial evidence and a valuable tool for making informed decisions to plan and monitor the control of sleeping sickness. PMID:21040555
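The abstract reports an average mapping accuracy of about 900 m but does not say how it was estimated. One way to quantify positional discrepancy between two coordinate sources for the same villages is the great-circle (haversine) distance. The sketch below assumes a spherical Earth and hypothetical matched coordinate pairs; it is not the Atlas team's method.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mean_offset_m(pairs):
    """Mean distance over matched ((lat, lon), (lat, lon)) coordinate pairs."""
    return sum(haversine_m(*p, *q) for p, q in pairs) / len(pairs)


# Hypothetical: the same village located by two different sources.
print(round(mean_offset_m([((-4.30, 15.30), (-4.305, 15.305))])))
```

At the scale of village locations the spherical approximation is adequate; an ellipsoidal formula changes distances by well under one percent.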
Abasy Atlas: a comprehensive inventory of systems, global network properties and systems-level elements across bacteria.

PubMed

Ibarra-Arellano, Miguel A; Campos-González, Adrián I; Treviño-Quintanilla, Luis G; Tauch, Andreas; Freyre-González, Julio A

2016-01-01

The availability of databases electronically encoding curated regulatory networks, and of high-throughput technologies and methods to discover regulatory interactions, provides an invaluable source of data for understanding the principles underpinning the organization and evolution of these networks responsible for cellular regulation. Nevertheless, the data in these sources never go beyond the regulon level, even though regulatory networks are complex hierarchical-modular structures that still challenge our understanding. This creates the need for an inventory of systems across a large range of organisms, a key step toward making comparative systems biology approaches feasible. In this work, we take the first step towards a global understanding of regulatory network organization by mapping the functional architectures of diverse bacteria. Abasy (Across-bacteria systems) Atlas provides a comprehensive inventory of annotated functional systems, global network properties and systems-level elements (global regulators, modular genes shaping functional systems, basal machinery genes and intermodular genes) predicted by the natural decomposition approach for reconstructed and meta-curated regulatory networks across a large range of bacteria, including pathogenically and biotechnologically relevant organisms. The meta-curation of regulatory datasets provides the most complete and reliable set of regulatory interactions currently available, which can even be projected into subsets by considering the force or weight of evidence supporting them or the systems to which they belong. In addition, Abasy Atlas provides data enabling large-scale comparative systems biology studies aimed at understanding the common principles and particular lifestyle adaptations of systems across bacteria.
Abasy Atlas contains systems and system-level elements for 50 regulatory networks comprising 78 649 regulatory interactions covering 42 bacteria in nine taxa, containing 3708 regulons and 1776 systems. All this brings together a large corpus of data that will surely inspire studies generating hypotheses regarding the principles governing the evolution and organization of systems and the functional architectures controlling them. Database URL: http://abasy.ccg.unam.mx. © The Author(s) 2016. Published by Oxford University Press.
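The regulon counts in Abasy Atlas start from grouping regulator-target interactions. The sketch below builds regulons from an interaction list and ranks regulators by regulon size. Abasy identifies global regulators with the natural decomposition approach, not by size alone, so the ranking here is only a crude first look; the identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def build_regulons(interactions):
    """Group (regulator, target) pairs into regulons.

    Duplicate interactions are collapsed; returns regulator -> sorted targets.
    """
    regulons = defaultdict(set)
    for regulator, target in interactions:
        regulons[regulator].add(target)
    return {tf: sorted(targets) for tf, targets in regulons.items()}


def rank_by_regulon_size(regulons):
    """Regulators ordered by number of targets (largest first, ties by name)."""
    return sorted(regulons, key=lambda tf: (-len(regulons[tf]), tf))


# Hypothetical interactions, including one duplicate.
interactions = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g3"), ("tfA", "g1")]
regulons = build_regulons(interactions)
print(regulons, rank_by_regulon_size(regulons))
```

Collapsing duplicates matters when merging curated sources, since the same interaction is often reported by more than one database.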
Big Data Tools as Applied to ATLAS Event Data

NASA Astrophysics Data System (ADS)

Vukotic, I.; Gardner, R. W.; Bryant, L. A.

2017-10-01

Big Data technologies have proven very useful for the storage, processing and visualization of derived metrics associated with ATLAS distributed computing (ADC) services. Logfiles, database records, and metadata from a diversity of systems have been aggregated and indexed to create an analytics platform for ATLAS ADC operations analysis. Dashboards, wide area data access cost metrics, user analysis patterns, and resource utilization efficiency charts are produced flexibly through queries against a powerful analytics cluster. Here we explore whether these techniques and the associated analytics ecosystem can be applied to provide new modes of open, quick, and pervasive access to ATLAS event data. Such modes would simplify access and broaden the reach of ATLAS public data to new communities of users. An ability to efficiently store, filter, search and deliver ATLAS data at the event and/or sub-event level in a widely supported format would enable or significantly simplify the use of machine learning environments and tools like Spark, Jupyter, R, SciPy, Caffe, TensorFlow, etc. Machine learning challenges such as the Higgs Boson Machine Learning Challenge and the Tracking challenge, event viewers (VP1, ATLANTIS, ATLASrift), and educational and outreach tools still to be developed would be able to access the data through a simple REST API. In this preliminary investigation we focus on derived xAOD data sets. These are much smaller than the primary xAODs, containing only the containers, variables, and events of interest to a particular analysis. Encouraged by the performance of Elasticsearch for the ADC analytics platform, we developed an algorithm for indexing derived xAOD event data. We created an appropriate document mapping and imported a full set of standard model W/Z datasets.
We compare the disk space efficiency of this approach to that of standard ROOT files and its performance in a simple cut-flow type of data analysis, and present preliminary results on its scaling characteristics with different numbers of clients, query complexity, and size of the data retrieved.
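Indexing event data into Elasticsearch involves an index mapping and, for throughput, the newline-delimited JSON body accepted by the `_bulk` endpoint (an action line followed by the document, with a trailing newline). The sketch below builds both as plain data without contacting a server. The mappings/properties layout follows Elasticsearch 7.x and later; the index name and event fields are hypothetical, since the abstract does not give the authors' document mapping.

```python
import json


def event_mapping(field_types):
    """Index mapping body, e.g. {"met": "float"}; 7.x-style, no mapping types."""
    return {"mappings": {"properties": {f: {"type": t} for f, t in field_types.items()}}}


def bulk_body(index, events):
    """NDJSON body for POST /_bulk: one action line plus one source line
    per event, terminated by a final newline as the API requires."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


# Hypothetical flattened events from a derived data set.
events = [{"run": 1, "met": 25.3}, {"run": 1, "met": 41.0}]
print(event_mapping({"run": "long", "met": "float"}))
print(bulk_body("xaod-events", events), end="")
```

The body would be sent with `Content-Type: application/x-ndjson`; flattening each event into one document keeps simple cut-flow selections expressible as ordinary range queries.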
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

PubMed

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

2015-11-19

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
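The database summarizes binding affinities as position weight matrices (PWMs). The authors derive theirs from quantitative binding data; the sketch below shows a different but common construction, a count-based log-odds PWM built from aligned binding sites, followed by scoring a candidate site. The pseudocount, uniform background and example sites are assumptions for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds PWM from equal-length aligned DNA binding sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds scores for a motif-length sequence."""
    return sum(col[base] for col, base in zip(pwm, seq))


# Hypothetical aligned sites for one helix-turn-helix protein.
sites = ["TGACA", "TGACA", "TGACT"]
pwm = build_pwm(sites)
print(round(score(pwm, "TGACA"), 2), round(score(pwm, "CCCCC"), 2))
```

The pseudocount keeps unseen bases from producing log(0); positive scores mean the site looks more like the motif than like the background.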
Monitoring Direct Effects of Delta, Atlas, and Titan Launches from Cape Canaveral Air Station

NASA Technical Reports Server (NTRS)

Schmalzer, Paul A.; Boyle, Shannon R.; Hall, Patrice; Oddy, Donna M.; Hensley, Melissa A.; Stolen, Eric D.; Duncan, Brean W.

1998-01-01

Launches of Delta, Atlas, and Titan rockets from Cape Canaveral Air Station (CCAS) have potential environmental effects that could arise from direct impacts of the launch exhaust (e.g., blast, heat), deposition of exhaust products of the solid rocket motors (hydrogen chloride, aluminum oxide), or other effects such as noise. Here we: 1) review previous reports, environmental assessments, and environmental impact statements for Delta, Atlas, and Titan vehicles and pad areas to clarify the magnitude of potential impacts; 2) summarize the observed effects of 15 Delta, 22 Atlas, and 8 Titan launches; and 3) develop a spatial database of the distribution of effects from individual launches and the cumulative effects of launches. The review of previous studies indicated that impacts from these launches can occur from the launch exhaust heat, deposition of exhaust products from the solid rocket motors, and noise. The principal effluents from solid rocket motors are hydrogen chloride (HCl), aluminum oxide (Al2O3), water (H2O), hydrogen (H2), carbon monoxide (CO), and carbon dioxide (CO2). The exhaust plume interacts with the launch complex structure and water deluge system to generate a launch cloud. Fallout or rainout of material from this cloud can produce localized effects from acid or particulate deposition. Delta, Atlas, and Titan launch vehicles differ in the number and size of solid rocket boosters and in the amount of deluge water used. All are smaller and use less deluge water than the Space Shuttle. Acid deposition can damage plants and animals exposed to it, acidify surface water and soil, and cause long-term changes in community composition and structure through repeated exposure. The magnitude of these effects depends on the intensity and frequency of acid deposition.
MIPS: a database for genomes and protein sequences

PubMed Central

Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

2002-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
MIPS: a database for genomes and protein sequences.

PubMed

Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

2002-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).
Integration of gel-based and gel-free proteomic data for functional analysis of proteins through Soybean Proteome Database.

PubMed

Komatsu, Setsuko; Wang, Xin; Yin, Xiaojian; Nanjo, Yohei; Ohyanagi, Hajime; Sakata, Katsumi

2017-06-23

The Soybean Proteome Database (SPD) stores data on soybean proteins obtained with gel-based and gel-free proteomic techniques. The database was constructed to provide information on proteins for functional analyses. The majority of the data is focused on soybean (Glycine max 'Enrei'). The growth and yield of soybean are strongly affected by environmental stresses such as flooding. The database was originally constructed using data on soybean proteins separated by two-dimensional polyacrylamide gel electrophoresis, which is a gel-based proteomic technique. Since 2015, the database has been expanded to incorporate data obtained by label-free mass spectrometry-based quantitative proteomics, which is a gel-free proteomic technique. Here, the portions of the database consisting of gel-free proteomic data are described. The gel-free proteomic database contains 39,212 proteins identified in 63 sample sets, such as temporal and organ-specific samples of soybean plants grown under flooding stress or non-stressed conditions. In addition, data on organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored. Furthermore, the database integrates multiple omics data such as genomics, transcriptomics, metabolomics, and proteomics. The SPD database is accessible at http://proteome.dc.affrc.go.jp/Soybean/. The Soybean Proteome Database stores data obtained from both gel-based and gel-free proteomic techniques. The gel-free proteomic database comprises 39,212 proteins identified in 63 sample sets, such as different organs of soybean plants grown under flooding stress or non-stressed conditions in a time-dependent manner. In addition, organellar proteins identified in mitochondria, nuclei, and endoplasmic reticulum are stored in the gel-free proteomics database. A total of 44,704 proteins, including 5490 proteins identified using a gel-based proteomic technique, are stored in the SPD. 
It accounts for approximately 80% of all proteins predicted from the genome sequence, although some of these proteins overlap. Based on the demonstrated application of data stored in the database for functional analyses, it is suggested that these data will be useful for analyses of biological mechanisms in soybean. Furthermore, coupled with recent advances in information and communication technology, the database should become increasingly useful for such analyses. Copyright © 2017 Elsevier B.V. All rights reserved.
Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

PubMed

Radauer, Christian

2017-01-01

The growing volume of data on allergenic proteins has created a need for structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.
Detection of alternative splice variants at the proteome level in Aspergillus flavus.

PubMed

Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C

2010-03-05

Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to the target proteins. However, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS) is the mechanism by which exons of pre-mRNAs can be spliced and rearranged to generate distinct mRNA, and therefore protein, variants. It enables higher eukaryotic organisms, with only a limited number of genes, to achieve the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequence. However, many protein databases include only a limited number of isoforms to minimize redundancy. As a result, a database search might not identify a target protein even with high-quality tandem MS data and an accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags to investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. Searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, the AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and reveals more information about a proteome.
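The claim that a search can miss a protein when its isoform is absent from the database can be shown on a toy example. This is a minimal sketch that assumes simple trypsin rules: cleave after K or R unless the next residue is P, with no missed cleavages. The two sequences are hypothetical; the "isoform" is the "reference" with its middle exon skipped.

```python
def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def isoform_specific_peptides(reference: str, isoform: str) -> set[str]:
    """Peptides produced by the isoform but absent from the reference digest."""
    return set(tryptic_digest(isoform)) - set(tryptic_digest(reference))


# Hypothetical toy sequences built from "exons" MAEKSA | GGGK | DLLR;
# the isoform skips the middle exon.
reference = "MAEKSAGGGKDLLR"
isoform = "MAEKSADLLR"
```

Here `isoform_specific_peptides(reference, isoform)` returns `{"SADLLR"}`. This exon-junction peptide can only be matched if the isoform entry is in the search database.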
ATLAS, CMS and new challenges for public communication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Taylor, Lucas; Barney, David; Goldfarb, Steven

On 30 March 2010 the first high-energy collisions brought the LHC experiments into the era of research and discovery. Millions of viewers worldwide tuned in to the webcasts and followed the news via Web 2.0 tools, such as blogs, Twitter, and Facebook, with 205,000 unique visitors to CERN's Web site. Media coverage at the experiments and in institutes all over the world yielded more than 2,200 news items including 800 TV broadcasts. We describe the new multimedia communications challenges, due to the massive public interest in the LHC programme, and the corresponding responses of the ATLAS and CMS experiments, in the areas of Web 2.0 tools, multimedia, webcasting, videoconferencing, and collaborative tools. We discuss the strategic convergence of the two experiments' communications services, information systems and public database of outreach material.
ATLAS, CMS and New Challenges for Public Communication

NASA Astrophysics Data System (ADS)

Taylor, Lucas; Barney, David; Goldfarb, Steven

2011-12-01

On 30 March 2010 the first high-energy collisions brought the LHC experiments into the era of research and discovery. Millions of viewers worldwide tuned in to the webcasts and followed the news via Web 2.0 tools, such as blogs, Twitter, and Facebook, with 205,000 unique visitors to CERN's Web site. Media coverage at the experiments and in institutes all over the world yielded more than 2,200 news items including 800 TV broadcasts. We describe the new multimedia communications challenges, due to the massive public interest in the LHC programme, and the corresponding responses of the ATLAS and CMS experiments, in the areas of Web 2.0 tools, multimedia, webcasting, videoconferencing, and collaborative tools. We discuss the strategic convergence of the two experiments' communications services, information systems and public database of outreach material.
The Imaging Spectrometric Observatory for the ATLAS 1 mission

NASA Technical Reports Server (NTRS)

Torr, Douglas G.

1995-01-01

The Imaging Spectrometric Observatory (ISO) was flown on the ATLAS 1 mission and was enormously successful, providing a baseline database on the coupled stratospheric, mesospheric, thermospheric, and ionospheric regions. Specific ISO accomplishments include measurements of the hydroxyl radical, studies of the global ionosphere, retrieval of the concentrations of neutral species from the ISO data, studies of mesospheric oxygen emissions, retrieval of mesospheric O from oxygen emissions, studies of the OH Meinel bands and the search for the Herzberg III bands, search for metallic species, studies of thermospheric nitric oxide, auroral study of molecular nitrogen emissions, and studies of thermospheric species. Apart from participation in the data analysis, the primary post-flight responsibility of Marshall Space Flight Center was the delivery of the final post mission dataset. Support provided by the University of Alabama in Huntsville is described.
The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases.

PubMed

Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

2007-10-18

Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. 
The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.
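PICR's core idea is that identifiers whose sequences are 100% identical refer to the same protein. The sketch below illustrates that idea only. It assumes an in-memory index keyed by a SHA-256 digest of the normalized sequence; UniParc's actual storage and checksums are not described in the abstract. The database names and identifiers are hypothetical.

```python
import hashlib
from collections import defaultdict


def sequence_key(sequence: str) -> str:
    """Digest of the normalized sequence; identical sequences share a key."""
    return hashlib.sha256(sequence.strip().upper().encode("ascii")).hexdigest()


class IdentifierIndex:
    """Hypothetical sequence-identity index, not the PICR API."""

    def __init__(self) -> None:
        # digest -> set of (source database, identifier) pairs
        self._by_sequence: dict[str, set[tuple[str, str]]] = defaultdict(set)
        # (source database, identifier) -> digest
        self._by_identifier: dict[tuple[str, str], str] = {}

    def add(self, source: str, identifier: str, sequence: str) -> None:
        key = sequence_key(sequence)
        self._by_sequence[key].add((source, identifier))
        self._by_identifier[(source, identifier)] = key

    def map_identifier(self, source, identifier, target_sources=None):
        """Cross-references with an identical sequence, optionally restricted
        to some target databases (cf. PICR's source-database filter)."""
        key = self._by_identifier.get((source, identifier))
        if key is None:
            return set()
        hits = self._by_sequence[key] - {(source, identifier)}
        if target_sources is not None:
            hits = {hit for hit in hits if hit[0] in target_sources}
        return hits


index = IdentifierIndex()
index.add("DB_A", "A1", "MKTAYIAK")
index.add("DB_B", "B7", "mktayiak\n")  # same sequence, different formatting
index.add("DB_C", "C3", "MKTAYIAR")    # one residue differs, so no mapping
```

With this data, `index.map_identifier("DB_A", "A1")` returns `{("DB_B", "B7")}`. Exact identity is a deliberate choice: it gives unambiguous mappings but will not link sequences that differ by even one residue.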
The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

PubMed Central

Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

2007-01-01

Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. 
The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr. PMID:17945017
UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein-DNA interactions.

PubMed

Robasky, Kimberly; Bulyk, Martha L

2011-01-01

The Universal PBM Resource for Oligonucleotide-Binding Evaluation (UniPROBE) database is a centralized repository of information on the DNA-binding preferences of proteins as determined by universal protein-binding microarray (PBM) technology. Each entry for a protein (or protein complex) in UniPROBE provides the quantitative preferences for all possible nucleotide sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In this update, we describe >130% expansion of the database content, incorporation of a protein BLAST (blastp) tool for finding protein sequence matches in UniPROBE, the introduction of UniPROBE accession numbers and additional database enhancements. The UniPROBE database is available at http://uniprobe.org.
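UniPROBE summarizes binding preferences both as scores over all k-mers and as position weight matrices (PWMs). The sketch below shows how a PWM is typically used to score candidate sites. It assumes a probability matrix, a uniform 0.25 background, log-odds scoring and a forward-strand-only scan. The matrix is a hypothetical toy, not a UniPROBE entry.

```python
import math
from itertools import product

BASES = "ACGT"


def all_kmers(k: int) -> list[str]:
    """Every nucleotide word of length k; there are 4**k of them."""
    return ["".join(letters) for letters in product(BASES, repeat=k)]


def log_odds_score(window: str, pwm: list[dict[str, float]], background: float = 0.25) -> float:
    """Sum over positions of log2(p(base) / background probability)."""
    if len(window) != len(pwm):
        raise ValueError("window and PWM must have the same width")
    return sum(math.log2(column[base] / background) for column, base in zip(pwm, window))


def best_site(sequence: str, pwm: list[dict[str, float]]) -> tuple[float, int]:
    """Highest-scoring (score, start) among all forward-strand windows."""
    width = len(pwm)
    return max(
        (log_odds_score(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical 3-column PWM that prefers the word "GAT".
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
```

Here `best_site("CCGATCC", toy_pwm)` picks the window starting at index 2 ("GAT"). A fuller scan would also score the reverse complement of each window.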
ATLAS EventIndex general dataflow and monitoring infrastructure

NASA Astrophysics Data System (ADS)

Fernández Casaní, Á.; Barberis, D.; Favareto, A.; García Montoro, C.; González de la Hoz, S.; Hřivnáč, J.; Prokoshin, F.; Salt, J.; Sánchez, J.; Többicke, R.; Yuan, R.; ATLAS Collaboration

2017-10-01

The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast dataset discovery, event picking, cross-checks with other ATLAS systems and checks for event duplication. The system design and its optimizations serve event-picking requests ranging from a few events up to tens of thousands of events; in addition, data consistency checks are performed for large production campaigns. Detecting duplicate events within the scope of physics collections has recently emerged as an important use case. This paper describes the general architecture of the project and its data flow and operational issues, which are addressed by recent developments to improve the throughput of the overall system. To this end, the data collection system is reducing its use of the messaging infrastructure to overcome the performance shortcomings detected during production peaks. Instead, an object storage approach is used to convey the event index information, with messages signalling its location and status. Recent changes in the Producer/Consumer architecture are also presented in detail, as well as the monitoring infrastructure.
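Duplicate detection within a collection can be sketched as grouping event identifiers and flagging any that occur more than once. The sketch assumes each event is keyed by its (run number, event number) pair and that records arrive as simple tuples tagged with a hypothetical dataset name. The production EventIndex runs such checks over its Hadoop and Oracle back ends, not in-memory Python.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class EventRecord(NamedTuple):
    dataset: str        # hypothetical dataset/collection name
    run_number: int
    event_number: int


def find_duplicates(records: Iterable[EventRecord]) -> dict[tuple[int, int], list[str]]:
    """Map each (run, event) pair seen more than once to the datasets that hold it."""
    holders: dict[tuple[int, int], list[str]] = defaultdict(list)
    for record in records:
        holders[(record.run_number, record.event_number)].append(record.dataset)
    return {key: datasets for key, datasets in holders.items() if len(datasets) > 1}


records = [
    EventRecord("collection.A", 1001, 1),
    EventRecord("collection.A", 1001, 2),
    EventRecord("collection.B", 1001, 2),  # same event written twice
]
```

Here `find_duplicates(records)` reports `{(1001, 2): ["collection.A", "collection.B"]}`, listing every dataset that holds the repeated event.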

Distributed analysis functional testing using GangaRobot in the ATLAS experiment

NASA Astrophysics Data System (ADS)

Legger, Federica; ATLAS Collaboration

2011-12-01

Automated distributed analysis tests are necessary to ensure smooth operation of the ATLAS grid resources. The HammerCloud framework allows for easy definition, submission and monitoring of grid test applications. Both functional and stress test applications can be defined in HammerCloud. Stress tests are large-scale tests meant to verify the behaviour of sites under heavy load. Functional tests are light user applications running at each site with high frequency, to ensure that the site functionalities are available at all times. Success or failure rates of these test jobs are individually monitored. Test definitions and results are stored in a database and made available to users and site administrators through a web interface. In this work we present the recent developments of the GangaRobot framework. GangaRobot monitors the outcome of functional tests, creates a blacklist of sites failing the tests, and exports the results to the ATLAS Site Status Board (SSB) and to the Service Availability Monitor (SAM). This provides, on the one hand, a fast way to identify systematic or temporary site failures and, on the other hand, allows for an effective distribution of the workload on the available resources.
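The blacklisting step can be sketched as a threshold on each site's recent functional-test outcomes. The success-rate threshold and minimum job count are hypothetical parameters; the abstract does not state GangaRobot's actual criteria. Site names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteResults:
    site: str
    succeeded: int
    failed: int

    @property
    def success_rate(self) -> float:
        total = self.succeeded + self.failed
        return self.succeeded / total if total else 0.0


def blacklist(results, min_success_rate=0.8, min_jobs=5):
    """Sites with enough test jobs and a success rate below the (hypothetical) threshold."""
    return sorted(
        r.site
        for r in results
        if r.succeeded + r.failed >= min_jobs and r.success_rate < min_success_rate
    )


results = [
    SiteResults("SITE_A", 19, 1),  # 95% success
    SiteResults("SITE_B", 3, 7),   # 30% success over 10 jobs
    SiteResults("SITE_C", 0, 2),   # too few jobs to judge
]
```

Here `blacklist(results)` returns `["SITE_B"]`. The minimum job count keeps a site from being blacklisted on a single transient failure, which is one way to separate temporary from systematic problems.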
Analysis of single nucleotide variants of HFE gene and association to survival in The Cancer Genome Atlas GBM data.

PubMed

Lee, Sang Y; Zhu, Junjia; Salzberg, Anna C; Zhang, Bo; Liu, Dajiang J; Muscat, Joshua E; Langan, Sara T; Connor, James R

2017-01-01

Human hemochromatosis protein (HFE) is involved in iron metabolism. Two major HFE polymorphisms, H63D and C282Y, have been associated with an increased risk of cancers. Previously, we reported decreased gender effects in overall survival based on H63D or C282Y HFE polymorphisms in patients with glioblastoma multiforme (GBM). However, the effect of other single nucleotide variants (SNVs) in the HFE gene on cancer development and progression has not been systematically studied. To extend our findings to a larger sample and to identify other HFE SNVs, we analyzed the frequency of somatic SNVs in the HFE gene and their relationship to survival in GBM patients using The Cancer Genome Atlas (TCGA) GBM (Caucasian only) database. We found 9 SNVs with increased frequency in normal blood samples of TCGA GBM patients compared with the 1000 Genomes data. Of these 9 SNVs, 7 were located in introns and 2 (H63D and C282Y) in exons of the HFE gene. Statistical analysis demonstrated that normal blood samples of TCGA GBM patients carry more H63D (p = 0.0002, 95% confidence interval (CI): 0.2119-0.3223) or C282Y (p = 0.0129, 95% CI: 0.0474-0.1159) HFE polymorphisms than 1000 Genomes samples. Kaplan-Meier survival curves for the 264 GBM samples revealed no difference between patients with wild-type (WT) HFE and H63D, or between WT HFE and C282Y. In addition, there was no difference in the survival of male or female GBM patients based on HFE genotype. There was no correlation between HFE expression and survival. In conclusion, the current results suggest that somatic HFE polymorphisms do not impact GBM patients' survival in the TCGA data set of GBM.
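The survival comparison rests on the Kaplan-Meier estimator. At each distinct death time t_i, the survival probability is multiplied by (1 - d_i/n_i), where d_i is the number of deaths at t_i and n_i is the number of patients still at risk. Censored patients leave the risk set without counting as deaths. A minimal pure-Python sketch with hypothetical follow-up data:

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if censored.
    Patients censored at a death time are still counted at risk at that time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical cohort: five patients, two censored (event = 0).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
```

For this cohort the curve steps to 0.8 at t = 2, 0.6 at t = 3 and 0.3 at t = 5. Comparing two genotype groups would mean computing one curve per group and then applying a test such as the log-rank test.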
Unraveling the oral cancer lncRNAome: Identification of novel lncRNAs associated with malignant progression and HPV infection.

PubMed

Nohata, Nijiro; Abba, Martin C; Gutkind, J Silvio

2016-08-01

The role of long non-coding RNA (lncRNA) expression in human head and neck squamous cell carcinoma (HNSCC) is still poorly understood. In this study, we aimed to establish the onco-lncRNAome profile of HNSCC and to identify lncRNAs correlating with prognosis and patient survival. The Atlas of Noncoding RNAs in Cancer (TANRIC) database was employed to retrieve the lncRNA expression information generated from The Cancer Genome Atlas (TCGA) HNSCC RNA-sequencing data. RNA-sequencing data from HNSCC cell lines were also considered for this study. Bioinformatics approaches, such as differential gene expression analysis, survival analysis, principal component analysis, and Co-LncRNA enrichment analysis, were performed. Using TCGA HNSCC RNA-sequencing data from 426 HNSCC and 42 adjacent normal tissues, we found 728 lncRNA transcripts significantly and differentially expressed in HNSCC. Among the 728 lncRNAs, 55 lncRNAs were significantly associated with poor prognosis, such as overall survival and/or disease-free survival. Next, we found 140 lncRNA transcripts significantly and differentially expressed between Human Papilloma Virus (HPV) positive tumors and HPV negative tumors. Thirty lncRNA transcripts were differentially expressed between TP53 mutated and TP53 wild type tumors. Co-LncRNA analysis suggested that protein-coding genes that are co-expressed with these deregulated lncRNAs might be involved in cancer-associated molecular events. Taking into account the differential expression of lncRNAs in a panel of HNSCC cell lines (n=22), we found several lncRNAs that may represent potential targets for diagnosis, therapy and prevention of HNSCC. LncRNA profiling could provide novel insights into the potential mechanisms of HNSCC oncogenesis. Copyright © 2016 Elsevier Ltd. All rights reserved.
Unraveling the Oral Cancer lncRNAome: Identification of Novel lncRNAs Associated with Malignant Progression and HPV Infection

PubMed Central

Nohata, Nijiro; Abba, Martin C.; Gutkind, J. Silvio

2017-01-01

Objectives The role of long non-coding RNA (lncRNA) expression in human head and neck squamous cell carcinoma (HNSCC) is still poorly understood. In this study, we aimed to establish the onco-lncRNAome profile of HNSCC and to identify lncRNAs correlating with prognosis and patient survival. Materials and Methods The Atlas of Noncoding RNAs in Cancer (TANRIC) database was employed to retrieve the lncRNA expression information generated from The Cancer Genome Atlas (TCGA) HNSCC RNA-sequencing data. RNA-sequencing data from HNSCC cell lines were also considered for this study. Bioinformatics approaches, such as differential gene expression analysis, survival analysis, principal component analysis, and Co-LncRNA enrichment analysis, were performed. Results Using TCGA HNSCC RNA-sequencing data from 426 HNSCC and 42 adjacent normal tissues, we found 728 lncRNA transcripts significantly and differentially expressed in HNSCC. Among the 728 lncRNAs, 55 lncRNAs were significantly associated with poor prognosis, such as overall survival and/or disease-free survival. Next, we found 140 lncRNA transcripts significantly and differentially expressed between Human Papilloma Virus (HPV) positive tumors and HPV negative tumors. Thirty lncRNA transcripts were differentially expressed between TP53 mutated and TP53 wild type tumors. Co-LncRNA analysis suggested that protein-coding genes that are co-expressed with these deregulated lncRNAs might be involved in cancer-associated molecular events. Taking into account the differential expression of lncRNAs in a panel of HNSCC cell lines (n=22), we found several lncRNAs that may represent potential targets for diagnosis, therapy and prevention of HNSCC. Conclusion LncRNA profiling could provide novel insights into the potential mechanisms of HNSCC oncogenesis. PMID:27424183
Dentalmaps: Automatic Dental Delineation for Radiotherapy Planning in Head-and-Neck Cancer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thariat, Juliette, E-mail: jthariat@hotmail.com; Ramus, Liliane; INRIA

Purpose: To propose an automatic atlas-based segmentation framework of the dental structures, called Dentalmaps, and to assess its accuracy and relevance to guide dental care in the context of intensity-modulated radiotherapy. Methods and Materials: A multi-atlas-based segmentation, less sensitive to artifacts than previously published head-and-neck segmentation methods, was used. The manual segmentations of a 21-patient database were first deformed onto the query using nonlinear registrations with the training images and then fused to estimate the consensus segmentation of the query. Results: The framework was evaluated with a leave-one-out protocol. The maximum doses estimated using manual contours were considered as ground truth and compared with the maximum doses estimated using automatic contours. The dose estimation error was within 2-Gy accuracy in 75% of cases (with a median of 0.9 Gy), whereas it was within 2-Gy accuracy in only 30% of cases with the visual estimation method without any contour, which is the routine practice procedure. Conclusions: Dose estimates using this framework were more accurate than visual estimates without dental contour. Dentalmaps represents a useful documentation and communication tool between radiation oncologists and dentists in routine practice. Prospective multicenter assessment is underway on patients extrinsic to the database.
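The fusion step combines several atlas segmentations, already deformed onto the query, into one consensus. The sketch below uses per-voxel majority voting, a common simple fusion rule swapped in here because the abstract does not name the fusion method used by Dentalmaps. Labels are hypothetical integer codes on a flattened voxel grid.

```python
from collections import Counter


def majority_vote(segmentations: list[list[int]]) -> list[int]:
    """Per-voxel most frequent label across registered atlas segmentations.

    Ties are broken by the smallest label value so the result is deterministic.
    """
    if not segmentations:
        raise ValueError("need at least one segmentation")
    n_voxels = len(segmentations[0])
    if any(len(s) != n_voxels for s in segmentations):
        raise ValueError("segmentations must share the query grid")
    fused = []
    for voxel_labels in zip(*segmentations):
        counts = Counter(voxel_labels)
        top = max(counts.values())
        fused.append(min(label for label, c in counts.items() if c == top))
    return fused


# Three hypothetical atlases already registered to a 4-voxel query.
atlases = [
    [0, 1, 1, 2],
    [0, 1, 2, 2],
    [1, 1, 2, 0],
]
```

Here `majority_vote(atlases)` yields `[0, 1, 2, 2]`. In a leave-one-out evaluation, each patient in turn serves as the query and the remaining patients serve as atlases.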
Bone Age Assessment of Children using a Digital Hand Atlas

PubMed Central

Gertych, Arkadiusz; Zhang, Aifeng; Sayre, James; Pospiech-Kurkowska, Sylwia; Huang, H.K

2007-01-01

We have developed an automated method to assess the bone age of children using a digital hand atlas. The hand atlas consists of two components. The first component is a database comprising 1,400 digitized left-hand radiographs from normally developed children, evenly distributed by Caucasian (CA), Asian (AS), African-American (AA) and Hispanic (HI) origin and by male (M) and female (F) gender, aged 1 to 18 years. It also holds relevant patient demographic data and pediatric radiologists' readings of each radiograph. These data are separated into eight categories: CAM, CAF, AAM, AAF, HIM, HIF, ASM, and ASF. In addition, CAM, AAM, HIM, and ASM are combined into one male category, and CAF, AAF, HIF, and ASF into one female category. The male and female categories are further combined into the F & M category. The second component is a computer-assisted diagnosis (CAD) module that assesses a child's bone age based on the collected data. The CAD method is derived from features extracted from seven regions of interest (ROIs): the carpal bone ROI and six phalangeal PROIs. The six PROIs include the distal and middle regions of the three middle fingers. These features were used to train eleven category-specific fuzzy classifiers (one for each race and gender, one for females, one for males, and one for F & M) to assess the bone age of a child. The digital hand atlas is being integrated with a PACS for validation of clinical use. PMID:17387000
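The count of eleven classifiers follows from the grouping described above. Four races times two genders gives eight groups; the pooled male, pooled female and pooled F & M groups bring the total to eleven. This small sketch derives the category codes. The code strings follow the abstract; the function itself is only illustrative.

```python
RACES = ["CA", "AA", "HI", "AS"]
GENDERS = ["M", "F"]


def classifier_categories() -> dict[str, list[str]]:
    """Map each classifier category to the race/gender groups it pools."""
    categories = {race + gender: [race + gender] for race in RACES for gender in GENDERS}
    for gender in GENDERS:
        categories[gender] = [race + gender for race in RACES]  # pooled by gender
    categories["F & M"] = [race + gender for race in RACES for gender in GENDERS]
    return categories


categories = classifier_categories()
```

`len(categories)` is 11, and `categories["M"]` pools `["CAM", "AAM", "HIM", "ASM"]`.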
MIPS: a database for protein sequences, homology data and yeast genome information.

PubMed Central

Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

1997-01-01

The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database. MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/) MIPS permits internet access to sequence databases, homology data and yeast genome information. (i) Sequence similarity results from the FASTA program are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome, the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser'. A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498
NPIDB: Nucleic acid-Protein Interaction DataBase.

PubMed

Kirsanov, Dmitry D; Zanegina, Olga N; Aksianov, Evgeniy A; Spirin, Sergei A; Karyagina, Anna S; Alexeevski, Andrei V

2013-01-01

The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculating intermolecular interactions, a classification of SCOP families containing DNA-binding protein domains, and data on conserved water molecules at the DNA-protein interface.
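NPIDB's own interaction-calculation tools are not described in this abstract. As a rough illustration of what calculating intermolecular interactions can involve, the sketch below lists protein-nucleic acid atom pairs that lie within a distance cutoff, by brute force. The 3.5 Å cutoff, the atom labels and the coordinates are hypothetical; a real pipeline would parse PDB files and would typically use a spatial index rather than comparing every pair.

```python
import math

def find_contacts(protein_atoms, nucleic_atoms, cutoff=3.5):
    """Return (protein atom, nucleic acid atom, distance in Å) for pairs within cutoff."""
    contacts = []
    for p_name, p_xyz in protein_atoms:
        for n_name, n_xyz in nucleic_atoms:
            d = math.dist(p_xyz, n_xyz)  # Euclidean distance between coordinates
            if d <= cutoff:
                contacts.append((p_name, n_name, round(d, 2)))
    return contacts

# Hypothetical atoms: (label, (x, y, z) in Å).
protein = [("ARG12:NH1", (0.0, 0.0, 0.0)), ("LEU15:CD1", (10.0, 0.0, 0.0))]
dna = [("DG5:O6", (0.0, 3.0, 0.0)), ("DT6:O4", (0.0, 8.0, 0.0))]
print(find_contacts(protein, dna))  # [('ARG12:NH1', 'DG5:O6', 3.0)]
```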
The PMDB Protein Model Database

PubMed Central

Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

2006-01-01

The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873
PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.

PubMed

Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István

2005-01-01

PDB_TM is a database of transmembrane proteins with known structures. It aims to collect all transmembrane proteins deposited in the Protein Data Bank (PDB) and to determine their membrane-spanning regions. These assignments are based on the TMDET algorithm, which uses only structural information to locate the most likely position of the lipid bilayer and to distinguish between transmembrane and globular proteins. This algorithm was applied to all PDB entries, and the results were collected in the PDB_TM database. Because it relies on the TMDET algorithm, the PDB_TM database can be updated automatically every week, keeping it synchronized with the latest PDB updates. The PDB_TM database is available at http://www.enzim.hu/PDB_TM.
Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

PubMed Central

Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

2003-01-01

Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database covering the major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data on more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355
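A minimal sketch of dictionary-plus-pattern extraction, assuming a tiny hypothetical name dictionary standing in for GENA and two hypothetical phrase patterns; the actual patterns used for the Kinase Pathway Database are not given in the abstract. A relation is kept only when both names are found in the dictionary.

```python
import re

# Hypothetical stand-in for a protein/gene/compound name dictionary such as GENA.
NAME_DICT = {"MEK1": "protein", "ERK2": "protein", "U0126": "compound"}

# Hypothetical phrase patterns of the form "<agent> <verb> <target>".
PATTERNS = [
    ("phosphorylates", re.compile(r"(\w+) phosphorylates (\w+)")),
    ("inhibits", re.compile(r"(\w+) inhibits (\w+)")),
]

def extract_interactions(sentence):
    """Return (agent, relation, target) triples where both names are in the dictionary."""
    triples = []
    for relation, pattern in PATTERNS:
        for agent, target in pattern.findall(sentence):
            if agent in NAME_DICT and target in NAME_DICT:
                triples.append((agent, relation, target))
    return triples

text = "U0126 inhibits MEK1, and MEK1 phosphorylates ERK2."
print(extract_interactions(text))
# [('MEK1', 'phosphorylates', 'ERK2'), ('U0126', 'inhibits', 'MEK1')]
```

Requiring dictionary hits on both sides trades recall for precision: phrases matching a pattern but naming unknown entities are discarded.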
The Vigna unguiculata Gene Expression Atlas (VuGEA) from de novo assembly and quantification of RNA-seq data provides insights into seed maturation mechanisms.

PubMed

Yao, Shaolun; Jiang, Chuan; Huang, Ziyue; Torres-Jerez, Ivone; Chang, Junil; Zhang, Heng; Udvardi, Michael; Liu, Renyi; Verdier, Jerome

2016-10-01

Legume research and cultivar development are important for sustainable food production, especially of high-protein seed. Thanks to the development of deep-sequencing technologies, crop species have been brought to the forefront of research, even without completed genome sequences. Black-eyed pea (Vigna unguiculata) is a legume species widely grown in semi-arid regions, which has high potential to provide stable seed protein production in a broad range of environments, including drought conditions. The black-eyed pea reference genotype has been used to generate a gene expression atlas of the major plant tissues (i.e. leaf, root, stem, flower, pod and seed), with a developmental time series for pods and seeds. From these various organs, 27 cDNA libraries were generated and sequenced, resulting in more than one billion reads. Following filtering, these reads were de novo assembled into 36 529 transcript sequences that were annotated and quantified across the different tissues. A set of 24 866 unique transcript sequences, called Unigenes, was identified. All the information related to transcript identification, annotation and quantification was stored in a gene expression atlas webserver (http://vugea.noble.org), providing a user-friendly interface and the necessary tools to analyse transcript expression in black-eyed pea organs and to compare data with other legume species. Using this gene expression atlas, we inferred details of molecular processes that are active during seed development and identified key putative regulators of seed maturation. Additionally, we found evidence for conservation of regulatory mechanisms involving miRNA in plant tissues subjected to drought and seeds undergoing desiccation. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

DOE Office of Scientific and Technical Information (OSTI.GOV)

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
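The database summarizes binding affinities as position weight matrices. How its matrices are derived is not stated here, so the sketch below uses a standard log-odds construction under assumed settings (pseudocount 1, uniform 0.25 background) and slides the matrix along a sequence to find the best-scoring site. The count matrix and the sequence are hypothetical.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def best_site(pwm, sequence):
    """Score every window of the sequence; return (best score, start offset)."""
    width = len(pwm)
    scores = [
        (sum(pwm[j][sequence[i + j]] for j in range(width)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)

# Hypothetical 3-position count matrix favouring the site "TGA".
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(counts)
score, start = best_site(pwm, "CCTGACC")
print(start, round(score, 2))  # 2 4.75
```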
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

DOE PAGES

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

2015-11-19

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even for proteins present in the database, complete sequence coverage is rarely achieved, even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, no method has been available for high-throughput, full-length protein sequencing independent of a reference database. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
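The internals of the "Peptide Tag Assembler" are not described in the abstract. Greedy suffix-prefix overlap merging, shown below, is a swapped-in, generic way to assemble short tags into a consensus; it ignores scoring, sequencing errors and the leucine/isoleucine ambiguity the authors mention. The tags and the minimum overlap are hypothetical.

```python
def merge_overlap(a, b, min_overlap):
    """Merge b onto a if a suffix of a equals a prefix of b (longest overlap first)."""
    for size in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:size]):
            return a + b[size:]
    return None

def greedy_assemble(tags, min_overlap=3):
    """Repeatedly merge the pair of contigs with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(tags))  # drop exact duplicates, keep order
    # Drop tags fully contained in another tag; they add no new sequence.
    contigs = [t for t in contigs if not any(t != o and t in o for o in contigs)]
    while True:
        best = None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                merged = merge_overlap(a, b, min_overlap)
                if merged is not None:
                    overlap = len(a) + len(b) - len(merged)
                    if best is None or overlap > best[0]:
                        best = (overlap, i, j, merged)
        if best is None:
            return contigs
        _, i, j, merged = best
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

tags = ["EVQLVES", "LVESGGG", "SGGGLVQ", "GGGLV"]
print(greedy_assemble(tags))  # ['EVQLVESGGGLVQ']
```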
Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.

PubMed

Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing

2018-04-06

Metaproteomics provides a direct measure of functional information by investigating all proteins expressed by a microbiota. However, because of the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about twice as many peptides were detected as with the metagenomic database alone. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as with a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.
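A minimal sketch of the merged-database idea, assuming reference proteins are already labelled with a genus and that the genera detected in the metagenome are known. The genus filter stands in for the taxonomy-guided step, and removing identical sequences during the merge is an added design choice, not something the abstract specifies. Accessions and sequences are hypothetical.

```python
def build_mt_database(reference, detected_genera, metagenomic):
    """Merge taxonomy-filtered reference proteins with metagenome-derived proteins.

    reference:   {accession: (genus, sequence)}
    metagenomic: {accession: sequence}
    Returns {accession: sequence} with identical sequences kept only once.
    """
    # Taxonomy-guided step: keep reference proteins only from detected genera.
    selected = {acc: seq for acc, (genus, seq) in reference.items()
                if genus in detected_genera}
    merged, seen = {}, set()
    # Union of both sources; reference entries win when sequences are identical.
    for source in (selected, metagenomic):
        for acc, seq in source.items():
            if seq not in seen:
                seen.add(seq)
                merged[acc] = seq
    return merged

reference = {
    "ref|A1": ("Bacteroides", "MKVLA"),
    "ref|B2": ("Escherichia", "MSTNP"),
    "ref|C3": ("Thermus", "MGGHW"),
}
metagenomic = {"contig7_1": "MKVLA", "contig9_2": "MQQRT"}
db = build_mt_database(reference, {"Bacteroides", "Escherichia"}, metagenomic)
print(sorted(db))  # ['contig9_2', 'ref|A1', 'ref|B2']
```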
Expression of the receptor for hyaluronic acid mediated motility (RHAMM) is associated with poor prognosis and metastasis in non-small cell lung carcinoma

PubMed Central

Azzopardi, Stephanie; Smith, Roger S.; Nasar, Abu; Altorki, Nasser K.; Mittal, Vivek; Somwar, Romel; Stiles, Brendon M.; Du, Yi-Chieh Nancy

2016-01-01

The receptor for hyaluronic acid-mediated motility (RHAMM) is upregulated in various cancers, but its role in primary and metastatic non-small cell lung carcinoma (NSCLC) remains to be determined. Here, we investigate the clinical relevance of RHAMM expression in NSCLC. RHAMM protein expression correlates with histological differentiation stages and extent of the primary tumor (T stages) in 156 patients with primary NSCLC. Importantly, while a focal RHAMM staining pattern is present in 57% of primary NSCLC, intense RHAMM protein expression is present in 96% of metastatic NSCLC cases. In a publicly available database, The Cancer Genome Atlas (TCGA), RHAMM mRNA expression is 12- and 10-fold higher in lung adenocarcinoma and squamous lung carcinoma, respectively, than in matched normal lung tissues. RHAMM mRNA expression correlates with stages of differentiation and inferior survival in more than 400 cases of lung adenocarcinoma in the Director's Challenge cohort. Of the four RHAMM splice variants, RHAMMv3 (also known as RHAMMB) is the dominant variant in NSCLC. Moreover, shRNA-mediated knockdown of RHAMM reduced the migratory ability of two lung adenocarcinoma cell lines, H1975 and H3255. Taken together, RHAMM, most likely RHAMMv3 (RHAMMB), can serve as a prognostic factor for lung adenocarcinomas and a potential therapeutic target in NSCLC to inhibit tumor migration. PMID:27220886
Access and use of the GUDMAP database of genitourinary development.

PubMed

Davies, Jamie A; Little, Melissa H; Aronow, Bruce; Armstrong, Jane; Brennan, Jane; Lloyd-MacGilp, Sue; Armit, Chris; Harding, Simon; Piu, Xinjun; Roochun, Yogmatee; Haggarty, Bernard; Houghton, Derek; Davidson, Duncan; Baldock, Richard

2012-01-01

The Genitourinary Development Molecular Atlas Project (GUDMAP) aims to document gene expression across time and space in the developing urogenital system of the mouse, and to provide access to a variety of relevant practical and educational resources. Data come from microarray gene expression profiling (from laser-dissected and FACS-sorted samples) and in situ hybridization at both low (whole-mount) and high (section) resolutions. Data are annotated to a published, high-resolution anatomical ontology and can be accessed using a variety of search interfaces. Here, we explain how to run typical queries on the database, by gene or anatomical location, how to view data, how to perform complex queries, and how to submit data.
Merging in-silico and in vitro salivary protein complex partners using the STRING database: A tutorial.

PubMed

Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz

2018-01-16

Protein-protein interactions are a common physiological mechanism through which proteins in an organism are protected and exert their actions. Identifying and characterizing protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1: six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a newly constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. The STRING database collects and integrates data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.
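Merging in-silico predictions with in vitro findings amounts to taking the union of two edge sets while remembering where each edge came from. The sketch below assumes both sources are already reduced to pairs of identifiers (for instance, exported from STRING beforehand); it does not call the STRING web service. HTN1 is used as a label for histatin 1, and the partner names are hypothetical.

```python
def merge_networks(predicted, experimental):
    """Combine two edge lists into {unordered edge: set of evidence sources}."""
    network = {}
    for source, edges in (("STRING", predicted), ("in vitro", experimental)):
        for a, b in edges:
            # frozenset makes (A, B) and (B, A) the same undirected edge.
            network.setdefault(frozenset((a, b)), set()).add(source)
    return network

def summarize(network):
    """Group edges by whether they are supported by both sources or only one."""
    summary = {"both": [], "STRING only": [], "in vitro only": []}
    for edge, sources in network.items():
        pair = tuple(sorted(edge))
        if len(sources) == 2:
            summary["both"].append(pair)
        elif "STRING" in sources:
            summary["STRING only"].append(pair)
        else:
            summary["in vitro only"].append(pair)
    return {key: sorted(pairs) for key, pairs in summary.items()}

predicted = [("HTN1", "PARTNER_A"), ("HTN1", "PARTNER_B")]
experimental = [("PARTNER_B", "HTN1"), ("HTN1", "PARTNER_C")]
print(summarize(merge_networks(predicted, experimental)))
# {'both': [('HTN1', 'PARTNER_B')], 'STRING only': [('HTN1', 'PARTNER_A')], 'in vitro only': [('HTN1', 'PARTNER_C')]}
```

Edges found only in vitro correspond to candidate novel interactors; edges in both sources are independently corroborated.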
Immunohistochemistry in the Diagnosis of Mucinous Neoplasms Involving the Ovary: The Added Value of SATB2 and Biomarker Discovery Through Protein Expression Database Mining.

PubMed

Strickland, Sarah; Wasserman, Jason K; Giassi, Ana; Djordjevic, Bojana; Parra-Herran, Carlos

2016-05-01

Immunohistochemistry is frequently used to identify ovarian mucinous neoplasms as primary or metastatic; however, there is significant overlap in expression patterns. We compared traditional markers (CK7, CK20, CDX2, PAX8, estrogen receptor, β-catenin, MUC1, MUC2, and MUC5AC) to 2 novel proteins identified through mining of the Human Protein Atlas expression database: SATB2 and POF1B. The study cohort included 49 primary gastrointestinal (GI) mucinous adenocarcinomas (19 colorectal, 15 gastric, 15 pancreatobiliary), 60 primary ovarian mucinous neoplasms (19 cystadenomas, 21 borderline tumors, 20 adenocarcinomas), and 19 metastatic carcinomas to the ovary (14 lower and 5 upper GI primaries). Immunohistochemistry was performed on tissue microarrays, scored and interpreted as negative (absent or focal/weak) or positive. Metastatic tumors were frequently unilateral (42.8% of tumors from lower and 40% of tumors from upper tract) and ≥10 cm (85.7% of tumors from lower and 80% of tumors from upper tract). CK7 was positive in 88.5% upper GI and 88.3% primary ovarian compared with 24.3% lower GI neoplasms. CK20 and CDX2 were positive in 84.8% and 100% of lower GI tumors, respectively; however, expression was also common in upper GI (CK20 42.8%, CDX2 50%) and primary ovarian neoplasms (CK20 65.7%, CDX2 38.3%). Conversely, SATB2 was more specific for lower GI origin, being positive in 78.8% lower GI but only 11.5% upper GI and 1.7% primary ovarian neoplasms. PAX8 expression was common in primary ovarian neoplasms (75% of all neoplasms, 65% of carcinomas); only 1 (1.5%) GI tumor was positive. MUC2 and β-catenin were frequently positive in lower GI tumors (96.9% and 51.5%, respectively). Estrogen receptor expression was only seen in primary ovarian neoplasms (13.3%). Nuclear premature ovarian failure 1B (POF1B) expression was seen in malignant tumors regardless of their origin. 
A panel including CK7, SATB2, and PAX8 separated primary from secondary GI neoplasms with up to 77.1% sensitivity and 99% specificity, outperforming tumor laterality and size. Second-line markers such as CDX2, MUC2, estrogen receptor, MUC1, and β-catenin increased the sensitivity of immunohistochemistry in excluding lower GI origin. Biomarker searches using proteomic databases have value in diagnostic pathology, as shown with SATB2; however, as seen with POF1B, expression profiles in these databases are not always reproduced in larger cohorts.
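Sensitivity and specificity, as quoted for the CK7/SATB2/PAX8 panel, are TP/(TP+FN) and TN/(TN+FP). The sketch computes both from paired reference-standard labels and panel calls; which class counts as "positive" and the example calls are assumptions for illustration, not the study's data.

```python
def sensitivity_specificity(truth, predicted):
    """truth, predicted: parallel lists of booleans (True = positive class)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    # Sensitivity: share of true positives detected; specificity: share of negatives cleared.
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical calls for five positive and five negative cases.
truth = [True] * 5 + [False] * 5
predicted = [True, True, True, True, False, False, False, False, False, True]
print(sensitivity_specificity(truth, predicted))  # (0.8, 0.8)
```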

The Muon Conditions Data Management: Database Architecture and Software Infrastructure

NASA Astrophysics Data System (ADS)

Verducci, Monica

2010-04-01

The management of the Muon Conditions Database will be one of the most challenging applications for the Muon System, not only in terms of data volumes and rates but also in terms of the variety of data stored and their analysis. The Muon conditions database is responsible for storing almost all of the 'non-event' data and detector quality flags needed for debugging detector operations and for performing reconstruction and analysis. In particular, for the early data, knowledge of the detector performance and of the efficiency and calibration corrections will be extremely important for correct event reconstruction. In this work, an overview of the entire Muon conditions database architecture is given, in particular the different sources of the data and the storage model used, including the associated database technology. Particular emphasis is given to the Data Quality chain: the flow of the data, the analysis and the final results are described. In addition, the software interfaces used to access the conditions data are described, in particular within the ATHENA environment of the ATLAS Offline Reconstruction framework.
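The abstract does not detail the storage model. A pattern commonly used for detector conditions data is to store each payload together with the run from which it becomes valid, so that a lookup returns the most recent payload at or before a given run. The class, its method names and the calibration payloads below are hypothetical.

```python
import bisect

class ConditionsFolder:
    """Payloads valid from a start run until the next stored start run."""

    def __init__(self):
        self._starts = []    # sorted start-of-validity run numbers
        self._payloads = []  # payload for each start run, same order

    def store(self, since_run, payload):
        # Keep entries sorted; storing again at the same run replaces the payload.
        i = bisect.bisect_left(self._starts, since_run)
        if i < len(self._starts) and self._starts[i] == since_run:
            self._payloads[i] = payload
        else:
            self._starts.insert(i, since_run)
            self._payloads.insert(i, payload)

    def lookup(self, run):
        # Rightmost start run that is <= run.
        i = bisect.bisect_right(self._starts, run) - 1
        if i < 0:
            raise KeyError(f"no conditions valid for run {run}")
        return self._payloads[i]

folder = ConditionsFolder()
folder.store(100, {"t0_ns": 512.0})
folder.store(150, {"t0_ns": 509.5})
print(folder.lookup(120))  # {'t0_ns': 512.0}
print(folder.lookup(150))  # {'t0_ns': 509.5}
```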
A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE).

PubMed

Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja

2014-01-01

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows the identification of novel or common nsSNVs that can be prioritized in validation studies. 
Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu.
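Scanning nsSNVs for overlap with curated functional sites can be sketched as an interval check per protein. The identifiers, site coordinates and variants below are hypothetical; a production pipeline would draw site annotations from sources such as UniProtKB and process variant calls at scale.

```python
def flag_functional_nssnvs(variants, sites):
    """Return nsSNVs that fall inside an annotated functional site.

    variants: list of (protein_id, position, ref_aa, alt_aa), 1-based positions.
    sites: {protein_id: [(start, end, label), ...]}, inclusive ranges.
    """
    flagged = []
    for protein, pos, ref, alt in variants:
        if ref == alt:  # no amino-acid change, so not non-synonymous
            continue
        for start, end, label in sites.get(protein, []):
            if start <= pos <= end:
                flagged.append((protein, f"{ref}{pos}{alt}", label))
    return flagged

sites = {"PROT_X": [(100, 200, "hypothetical binding site")]}
variants = [
    ("PROT_X", 150, "R", "H"),  # inside the site
    ("PROT_X", 50, "P", "R"),   # outside the site
    ("PROT_X", 160, "N", "N"),  # synonymous, skipped
]
print(flag_functional_nssnvs(variants, sites))
# [('PROT_X', 'R150H', 'hypothetical binding site')]
```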
EU-Norsewind Using Envisat ASAR And Other Data For Offshore Wind Atlas

NASA Astrophysics Data System (ADS)

Hasager, Charlotte B.; Mouche, Alexis; Badger, Merete

2010-04-01

The EU project NORSEWIND (short for Northern Seas Wind Index Database, www.norsewind.eu) aims to produce a state-of-the-art wind atlas for the Baltic, Irish and North Seas using ground-based lidar, meteorological masts, satellite data and mesoscale modelling. So far, CLS and Risø DTU have collected Envisat ASAR images for the area of interest, and the first results (maps of wind statistics, Weibull scale and shape parameters, mean wind speed and energy density) are presented. The results will be compared with a distributed network of high-quality in-situ observations and with mesoscale model results during 2009-2011 as the in-situ data and model results become available. Wind energy is proportional to the cube of the wind speed, so even small improvements in wind speed mapping are important in this project. One challenge is to arrive at hub-height winds ~100 m above sea level.
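For Weibull-distributed wind speeds with scale A and shape k, the mean speed is A·Γ(1 + 1/k) and the mean power density is ½ρA³Γ(1 + 3/k). The sketch evaluates both, assuming a standard air density of 1.225 kg/m³ and hypothetical values of A and k, and shows why small speed errors matter: a 1% bias in the scale parameter changes the power density by about 3%.

```python
import math

def weibull_stats(A, k, rho=1.225):
    """Mean wind speed (m/s) and mean power density (W/m^2) for Weibull scale A, shape k.

    rho is the air density in kg/m^3 (1.225 assumed, a standard sea-level value).
    """
    mean_speed = A * math.gamma(1 + 1 / k)
    power_density = 0.5 * rho * A ** 3 * math.gamma(1 + 3 / k)
    return mean_speed, power_density

# Hypothetical site parameters.
mean_u, e = weibull_stats(A=9.0, k=2.0)
print(round(mean_u, 2), round(e))  # 7.98 594

# Cubic sensitivity: inflate the scale parameter by 1%.
_, e_biased = weibull_stats(A=9.0 * 1.01, k=2.0)
print(round(e_biased / e, 4))  # 1.0303
```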
Building quantitative, three-dimensional atlases of gene expression and morphology at cellular resolution.

PubMed

Knowles, David W; Biggin, Mark D

2013-01-01

Animals comprise dynamic three-dimensional arrays of cells that express gene products in intricate spatial and temporal patterns that determine cellular differentiation and morphogenesis. A rigorous understanding of these developmental processes requires automated methods that quantitatively record and analyze complex morphologies and their associated patterns of gene expression at cellular resolution. Here we summarize light microscopy-based approaches to establish permanent, quantitative datasets (atlases) that record this information. We focus on experiments that capture data for whole embryos or large areas of tissue in three dimensions, often at multiple time points. We compare and contrast the advantages and limitations of different methods and highlight some of the discoveries made. We emphasize the need for interdisciplinary collaborations and integrated experimental pipelines that link sample preparation, image acquisition, image analysis, database design, visualization, and quantitative analysis. Copyright © 2013 Wiley Periodicals, Inc.
ATLAS offline data quality monitoring

NASA Astrophysics Data System (ADS)

Adelman, J.; Baak, M.; Boelaert, N.; D'Onofrio, M.; Frost, J. A.; Guyot, C.; Hauschild, M.; Hoecker, A.; Leney, K. J. C.; Lytken, E.; Martinez-Perez, M.; Masik, J.; Nairz, A. M.; Onyisi, P. U. E.; Roe, S.; Schaetzel, S.; Wilson, M. G.

2010-04-01

The ATLAS experiment at the Large Hadron Collider reads out 100 million electronic channels at a rate of 200 Hz. Before the data are shipped to storage and analysis centres across the world, they have to be checked for irregularities that would render them scientifically useless. Offline data quality monitoring provides prompt feedback from full first-pass event reconstruction at the Tier-0 computing centre and can reveal problems in the detector hardware and in the data processing chain. Detector information and reconstructed proton-proton collision event characteristics are distilled into a few key histograms and numbers, which are automatically compared with a reference. The results of the comparisons are saved as status flags in a database and are published together with the histograms on a web server. They are inspected by a 24/7 shift crew, who can notify on-call experts in case of problems and, in extreme cases, signal that data taking should be aborted.
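Automatic comparison with a reference can be sketched as a histogram test whose score is mapped onto a status flag. The chi-square-per-bin score and the Green/Yellow/Red thresholds below are hypothetical choices for illustration, not the algorithms or limits actually used by ATLAS.

```python
def chi2_per_bin(hist, ref):
    """Pearson chi-square of hist against ref rescaled to hist's total, per compared bin.

    Bins where the scaled reference is zero are skipped in this simple sketch.
    """
    scale = sum(hist) / sum(ref)
    chi2, nbins = 0.0, 0
    for observed, r in zip(hist, ref):
        expected = r * scale
        if expected > 0:
            chi2 += (observed - expected) ** 2 / expected
            nbins += 1
    return chi2 / nbins if nbins else 0.0

def dq_flag(hist, ref, yellow=2.0, red=5.0):
    """Map the comparison score onto a traffic-light status flag (thresholds hypothetical)."""
    score = chi2_per_bin(hist, ref)
    if score >= red:
        return "Red"
    if score >= yellow:
        return "Yellow"
    return "Green"

reference = [100, 200, 100]
print(dq_flag([102, 198, 100], reference))  # Green
print(dq_flag([300, 0, 100], reference))    # Red
```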
Profiling RNA editing in human tissues: towards the inosinome Atlas

PubMed Central

Picardi, Ernesto; Manzari, Caterina; Mastropasqua, Francesca; Aiello, Italia; D’Erchia, Anna Maria; Pesole, Graziano

2015-01-01

Adenine-to-inosine (A-to-I) RNA editing is a widespread co- and post-transcriptional mechanism mediated by ADAR enzymes acting on double-stranded RNA. It has a plethora of biological effects, appears to be particularly pervasive in humans compared with other mammals, and is implicated in a number of diverse human pathologies. Here we present the first human inosinome atlas, comprising 3,041,422 A-to-I events identified in six tissues from three healthy individuals. Matched directional total-RNA-Seq and whole-genome sequence datasets were generated and analysed within a dedicated computational framework that is also capable of detecting hyper-edited reads. Inosinome profiles are tissue specific, and edited gene sets consistently show enrichment of genes involved in neurological disorders and cancer. The overall frequency of editing also varies but is strongly correlated with ADAR expression levels. The inosinome database is available at: http://srv00.ibbe.cnr.it/editing/. PMID:26449202
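A-to-I editing appears as A-to-G mismatches in RNA-seq reads because inosine is read as guanosine. The sketch estimates per-site editing levels as G/(A+G) and skips positions flagged as genomic variants, echoing the use of matched whole-genome data. It assumes forward-strand reads only, a hypothetical coverage threshold and made-up data; it is not the authors' computational framework.

```python
from collections import Counter

def editing_levels(genome, read_bases, known_variants=frozenset(), min_coverage=10):
    """Estimate A-to-I editing levels from aligned RNA-seq bases.

    genome: reference sequence (forward strand only, for simplicity).
    read_bases: {0-based position: list of read bases observed there}.
    known_variants: positions where matched DNA shows a genomic difference;
        skipped so that SNPs are not mistaken for editing.
    Returns {position: G / (A + G)} for covered genomic A sites with G reads.
    """
    levels = {}
    for pos, bases in read_bases.items():
        if genome[pos] != "A" or pos in known_variants:
            continue
        counts = Counter(bases)
        a, g = counts["A"], counts["G"]
        if a + g >= min_coverage and g > 0:
            levels[pos] = g / (a + g)
    return levels

genome = "CTAAGT"
reads = {2: ["A"] * 6 + ["G"] * 4, 3: ["A"] * 12, 4: ["G"] * 10}
print(editing_levels(genome, reads))  # {2: 0.4}
```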
Development of noSQL data storage for the ATLAS PanDA Monitoring System

NASA Astrophysics Data System (ADS)

Potekhin, M.; ATLAS Collaboration

2012-06-01

For several years the PanDA Workload Management System has been the basis for distributed production and analysis for the ATLAS experiment at the LHC. Since the start of data taking, PanDA usage has ramped up steadily, typically exceeding 500k completed jobs/day by June 2011. The associated monitoring data volume has been rising as well, to levels that present a new set of challenges in the areas of database scalability and monitoring system performance and efficiency. These challenges are being met with an R&D effort aimed at implementing a scalable and efficient monitoring data storage based on a noSQL solution (Cassandra). We present our motivations for using this technology, as well as the data design and the techniques used for efficient indexing of the data. We also discuss the hardware requirements as they were determined by testing with actual data and realistic loads.
Far infrared supplement. Third edition: Catalog of infrared observations (lambda greater than or equal to 4.6 micrometers)

NASA Technical Reports Server (NTRS)

Gezari, Daniel Y.; Schmitz, Marion; Pitts, Patricia S.; Mead, Jaylee M.

1993-01-01

The Far Infrared Supplement contains a subset of the data in the full Catalog of Infrared Observations (all observations at wavelengths greater than 4.6 microns). The Catalog of Infrared Observations (CIO), NASA RP-1294, is a compilation of infrared astronomical observational data obtained from an extensive literature search of scientific journals and major astronomical catalogs and surveys. The literature search is complete for years 1965 through 1990 in this third edition. The catalog contains about 210,000 observations of roughly 20,000 individual sources, and supporting appendices. The expanded third edition contains coded IRAS 4-band data for all CIO sources detected by IRAS. The appendices include an atlas of infrared source positions (also included in this volume), two bibliographies of catalog listings, and an atlas of infrared spectral ranges. The complete CIO database is available to qualified users in printed, microfiche, and magnetic tape formats.
Catalog of Infrared Observations, Third Edition

NASA Technical Reports Server (NTRS)

Gezari, Daniel Y.; Schmitz, Marion; Pitts, Patricia S.; Mead, Jaylee M.

1993-01-01

The Far Infrared Supplement contains a subset of the data in the full Catalog of Infrared Observations (all observations at wavelengths greater than 4.6 microns). The Catalog of Infrared Observations (CIO), NASA RP-1294, is a compilation of infrared astronomical observational data obtained from an extensive literature search of scientific journals and major astronomical catalogs and surveys. The literature search is complete for years 1965 through 1990 in this Third Edition. The Catalog contains about 210,000 observations of roughly 20,000 individual sources and supporting appendices. The expanded Third Edition contains coded IRAS 4-band data for all CIO sources detected by IRAS. The appendices include an atlas of infrared source positions (also included in this volume), two bibliographies of Catalog listings, and an atlas of infrared spectral ranges. The complete CIO database is available to qualified users in printed, microfiche, and magnetic-tape formats.
Human cell structure-driven model construction for predicting protein subcellular location from biological images.

PubMed

Shao, Wei; Liu, Mingxia; Zhang, Daoqiang

2016-01-01

The systematic study of subcellular location patterns is essential for fully characterizing the human proteome. With the great advances in automated microscopic imaging, accurate bioimage-based classification methods for predicting protein subcellular locations are highly desirable. All existing models were constructed under the independent parallel hypothesis, in which the cellular component classes are treated as independent of one another in a multi-class classification engine; as a result, the important structural information of cellular compartments is missed. To address this problem and develop more accurate models, we propose a novel cell structure-driven classifier construction approach (SC-PSorter) that incorporates prior biological structural information into the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error-correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error-correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we build a classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. The dataset and code can be downloaded from https://github.com/shaoweinuaa/. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
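The sketch below makes the error-correcting output coding (ECOC) step concrete. It decodes a vector of binary classifier outputs to the class with the nearest codeword by Hamming distance, then takes a majority vote over several ensemble members. The class names and the 4 x 5 codeword matrix are hypothetical. In SC-PSorter the matrix encodes cell-structure relationships and each column is a multi-kernel SVM; neither is reproduced here.

```python
from collections import Counter

CLASSES = ["nucleus", "cytoplasm", "mitochondria", "ER"]  # hypothetical labels
# Rows: classes; columns: binary classifiers (1 = positive side of the split).
CODEWORDS = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

def decode(bits):
    """Return the class whose codeword is nearest to `bits` in Hamming distance."""
    distances = [sum(b != c for b, c in zip(bits, row)) for row in CODEWORDS]
    return CLASSES[distances.index(min(distances))]

def majority_vote(bit_vectors):
    """Decode each ensemble member's output bits, then pick the most common label."""
    votes = Counter(decode(bits) for bits in bit_vectors)
    return votes.most_common(1)[0][0]
```

Every pair of codewords above differs in at least three positions, so a single flipped classifier output still decodes to the correct class.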
TRIM65 negatively regulates p53 through ubiquitination

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Yang; Ma, Chengyuan; Zhou, Tong

2016-04-22

Tripartite-motif protein family member 65 (TRIM65) is an important protein involved in white matter lesions. However, the role of TRIM65 in human cancer remains poorly understood. Using The Cancer Genome Atlas (TCGA) gene alteration database, we found that TRIM65 is upregulated in a significant portion of non-small cell lung carcinoma (NSCLC) patients. Our cell growth assay revealed that TRIM65 overexpression promotes cell proliferation, while knockdown of TRIM65 has the opposite effect. Mechanistically, TRIM65 binds to p53, one of the most critical tumor suppressors, and serves as an E3 ligase toward p53. Consequently, TRIM65 inactivates p53 by facilitating p53 poly-ubiquitination and proteasome-mediated degradation. Notably, induction of p53 by the chemotherapeutic reagent cisplatin is markedly attenuated by ectopic expression of TRIM65. Cell growth inhibition by TRIM65 knockdown is more pronounced in p53-positive H460 cells than in p53-negative H1299 cells, and knockdown of p53 in H460 cells also compromises the growth inhibition caused by TRIM65 knockdown, indicating that p53 is required, at least in part, for TRIM65 function. Our findings identify TRIM65 as a potential oncogenic protein, most likely acting through p53 inactivation, and provide insight into the development of novel approaches targeting TRIM65 for NSCLC treatment and for overcoming chemotherapy resistance.
Highlights:
• TRIM65 expression is elevated in NSCLC.
• TRIM65 inactivates p53 by mediating p53 ubiquitination and degradation.
• TRIM65 attenuates the response of NSCLC cells to cisplatin.
The Protein Disease Database of human body fluids: II. Computer methods and data issues.

PubMed

Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R

1995-01-01

The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. The underlying assays may include high-resolution electrophoretic gels, which offer the potential to quantitate many proteins in a single test, as well as enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that support easy-to-use data retrieval through query specification forms or images. The details of these interactions are completely transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, the mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and the synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.
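The abstract does not describe the PDD schema. Assume a hypothetical layout with `protein`, `disease` and `measurement` tables, where each measurement row holds a concentration and a normal range. Under that assumption, a screen for quantitative abnormalities can be a single SQL join, sketched here with Python's built-in `sqlite3` module. Table names, column names and the toy rows are placeholders, not PDD data.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical PDD-like schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE measurement (
            protein_id INTEGER REFERENCES protein(id),
            disease_id INTEGER REFERENCES disease(id),
            fluid TEXT, conc REAL, normal_low REAL, normal_high REAL);
    """)
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [(1, "protein_A"), (2, "protein_B")])
    conn.execute("INSERT INTO disease VALUES (1, 'disease_X')")
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, "serum", 5.0, 1.0, 3.0),   # placeholder values, not real data
        (2, 1, "serum", 2.0, 1.0, 3.0),
        (2, 1, "CSF", 0.5, 1.0, 3.0),
    ])
    return conn

def abnormal_findings(conn, fluid):
    """List (protein, disease, conc, direction) where conc is outside normal range."""
    sql = """
        SELECT p.name, d.name, m.conc,
               CASE WHEN m.conc > m.normal_high THEN 'elevated' ELSE 'decreased' END
        FROM measurement AS m
        JOIN protein AS p ON p.id = m.protein_id
        JOIN disease AS d ON d.id = m.disease_id
        WHERE m.fluid = ?
          AND (m.conc > m.normal_high OR m.conc < m.normal_low)
        ORDER BY p.name, d.name
    """
    return conn.execute(sql, (fluid,)).fetchall()
```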
Domain fusion analysis by applying relational algebra to protein sequence and domain databases

PubMed Central

Truong, Kevin; Ikura, Mitsuhiko

2003-01-01

Background: Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. Results: This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was demonstrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From the scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at . Conclusion: As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
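The relational-algebra idea can be sketched as an SQL self-join. Assume a hypothetical table `domain_hit(protein, organism, domain)`. The query first finds domain pairs that occur together in a single ("fusion") protein. It then reports pairs of distinct proteins in a target organism where one carries only the first domain and the other only the second. This is a minimal sketch run on SQLite, not the authors' published query, and the identifiers in the test data are placeholders.

```python
import sqlite3

FUSION_SQL = """
WITH fusion AS (                -- domain pairs found together in one protein
    SELECT DISTINCT h1.domain AS dx, h2.domain AS dy
    FROM domain_hit AS h1
    JOIN domain_hit AS h2
      ON h1.protein = h2.protein AND h1.domain < h2.domain
)
SELECT DISTINCT a.protein, b.protein
FROM fusion AS f
JOIN domain_hit AS a ON a.domain = f.dx AND a.organism = :org
JOIN domain_hit AS b ON b.domain = f.dy AND b.organism = :org
WHERE a.protein <> b.protein
  -- a must lack the partner domain, and b must lack a's domain
  AND NOT EXISTS (SELECT 1 FROM domain_hit AS x
                  WHERE x.protein = a.protein AND x.domain = f.dy)
  AND NOT EXISTS (SELECT 1 FROM domain_hit AS y
                  WHERE y.protein = b.protein AND y.domain = f.dx)
ORDER BY a.protein, b.protein
"""

def make_db(rows):
    """rows: iterable of (protein, organism, domain) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domain_hit (protein TEXT, organism TEXT, domain TEXT)")
    conn.executemany("INSERT INTO domain_hit VALUES (?, ?, ?)", rows)
    return conn

def fusion_partners(conn, organism):
    """Return (protein_a, protein_b) pairs in `organism` linked by a domain fusion."""
    return conn.execute(FUSION_SQL, {"org": organism}).fetchall()
```

Because the query is plain SQL, it can simply be re-run whenever the sequence or domain tables are reloaded, which is the property the Conclusion highlights.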
TOPDOM: database of conservatively located domains and motifs in proteins.

PubMed

Varga, Julia; Dobson, László; Tusnády, Gábor E

2016-09-01

The TOPDOM database, originally created as a collection of domains and motifs located consistently on the same side of the membrane in α-helical transmembrane proteins, has been updated and extended to include consistently localized domains and motifs in globular proteins as well. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and, for transmembrane proteins, to predict its topology, and by applying a thorough search for domains and motifs using the most up-to-date versions of all source databases, we achieved a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. The TOPDOM database is available at http://topdom.enzim.hu. The webpage uses the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high-performance computer. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
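As a simplified illustration of what "consistently located" means, the sketch below keeps only domains that were observed several times and always on the same side of the membrane. The input format, the `min_count` threshold and the side labels are assumptions. In TOPDOM the side information comes from topology prediction (CCTOP), and the database's real selection criteria are likely more involved.

```python
from collections import defaultdict

def consistent_domains(occurrences, min_count=2):
    """occurrences: iterable of (domain_id, side), side being e.g. 'in' or 'out'.

    Return {domain_id: side} for domains seen at least `min_count` times
    and never on more than one side.
    """
    sides = defaultdict(list)
    for domain, side in occurrences:
        sides[domain].append(side)
    return {d: s[0] for d, s in sides.items()
            if len(s) >= min_count and len(set(s)) == 1}
```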
COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

PubMed

Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

2016-01-04

The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since its initial description in the 2011 NAR Database Issue. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
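The abstract does not describe the recommendation system in detail. One plausible heuristic, shown here purely as a hypothetical sketch, ranks protein families that have no experimentally characterized member by size. The reasoning is that characterizing one member could inform predictions for all the others. The tuple format and the choice of the alphabetically first member as representative are arbitrary assumptions.

```python
from collections import defaultdict

def prioritize(proteins):
    """proteins: iterable of (protein_id, family_id, has_experimental_function).

    Return (family, representative, other_members_that_would_benefit) for
    every family lacking experimental evidence, largest families first.
    """
    families = defaultdict(list)
    characterized = set()
    for pid, fam, known in proteins:
        families[fam].append(pid)
        if known:
            characterized.add(fam)
    ranked = sorted((fam for fam in families if fam not in characterized),
                    key=lambda fam: (-len(families[fam]), fam))
    return [(fam, sorted(families[fam])[0], len(families[fam]) - 1)
            for fam in ranked]
```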
An Atlas of ShakeMaps and population exposure catalog for earthquake loss modeling

USGS Publications Warehouse

Allen, T.I.; Wald, D.J.; Earle, P.S.; Marano, K.D.; Hotovec, A.J.; Lin, K.; Hearne, M.G.

2009-01-01

We present an Atlas of ShakeMaps and a catalog of human population exposures to moderate-to-strong ground shaking (EXPO-CAT) for recent historical earthquakes (1973-2007). The common purpose of the Atlas and exposure catalog is to calibrate earthquake loss models to be used in the US Geological Survey's Prompt Assessment of Global Earthquakes for Response (PAGER). The full ShakeMap Atlas currently comprises over 5,600 earthquakes from January 1973 through December 2007, with almost 500 of these maps constrained, to varying degrees, by instrumental ground motions, macroseismic intensity data, community internet intensity observations, and published earthquake rupture models. The catalog of human exposures is derived using current PAGER methodologies. Exposure to discrete levels of shaking intensity is obtained by correlating Atlas ShakeMaps with a global population database. Combining this population exposure dataset with historical earthquake loss data, such as PAGER-CAT, provides a useful resource for calibrating loss methodologies against a systematically derived set of ShakeMap hazard outputs. We illustrate two example uses of EXPO-CAT: (1) a simple, objective ranking of country vulnerability to earthquakes and (2) the influence of time of day on earthquake mortality. In general, we observe that countries in similar geographic regions with similar construction practices tend to cluster spatially in terms of relative vulnerability. We also find little quantitative evidence to suggest that time of day is a significant factor in earthquake mortality. Rather, earthquake mortality appears to be more systematically linked to the population exposed to severe ground shaking (Modified Mercalli Intensity VIII+). Finally, equipped with the full Atlas of ShakeMaps, we merge all of these maps and find the maximum estimated peak ground acceleration at any grid point in the world for the past 35 years.
We subsequently compare this "composite ShakeMap" with existing global hazard models, calculating the spatial area of the existing hazard maps exceeded by the combined ShakeMap ground motions. In general, these analyses suggest that existing global and regional hazard maps tend to overestimate hazard. Both the Atlas of ShakeMaps and EXPO-CAT have many potential uses for examining earthquake risk and epidemiology. All of the datasets discussed herein are available for download on the PAGER Web page (http://earthquake.usgs.gov/eqcenter/pager/prodandref/). © 2009 Springer Science+Business Media B.V.
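Two of the computations described above can be sketched with NumPy: population exposure per intensity level, and the cell-wise maximum ("composite") ground motion compared against a hazard map. The sketch assumes every ShakeMap, the population database and the hazard map have already been resampled onto one common grid. It also rounds MMI to the nearest integer and counts grid cells instead of true area (cells on a latitude-longitude grid differ in size). These are simplifications of this sketch, not PAGER methodology.

```python
import numpy as np

def exposure_by_intensity(mmi_grid, population_grid, levels=range(1, 11)):
    """Sum population in cells whose MMI, rounded to an integer, equals each level."""
    mmi = np.rint(mmi_grid).astype(int)
    return {lvl: float(population_grid[mmi == lvl].sum()) for lvl in levels}

def composite_max(pga_grids):
    """Cell-wise maximum PGA over many ShakeMaps sharing one grid."""
    return np.max(np.stack(pga_grids), axis=0)

def fraction_exceeding(composite, hazard):
    """Unweighted fraction of grid cells where observed motion exceeds the hazard map."""
    return float(np.mean(composite > hazard))
```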
Brain templates and atlases.

PubMed

Evans, Alan C; Janke, Andrew L; Collins, D Louis; Baillet, Sylvain

2012-08-15

The core concept within the field of brain mapping is the use of a standardized, or "stereotaxic", 3D coordinate frame for data analysis and reporting of findings from neuroimaging experiments. This simple construct allows brain researchers to combine data from many subjects such that group-averaged signals, be they structural or functional, can be detected above the background noise that would swamp subtle signals from any single subject. Where the signal is robust enough to be detected in individuals, it allows for the exploration of inter-individual variance in the location of that signal. From a larger perspective, it provides a powerful medium for comparison and/or combination of brain mapping findings from different imaging modalities and laboratories around the world. Finally, it provides a framework for the creation of large-scale neuroimaging databases or "atlases" that capture the population mean and variance in anatomical or physiological metrics as a function of age or disease. However, while the above benefits are not in question at first order, there are a number of conceptual and practical challenges that introduce second-order incompatibilities among experimental data. Stereotaxic mapping requires two basic components: (i) the specification of the 3D stereotaxic coordinate space, and (ii) a mapping function that transforms a 3D brain image from "native" space, i.e. the coordinate frame of the scanner at data acquisition, to that stereotaxic space. The first component is usually expressed by the choice of a representative 3D MR image that serves as target "template" or atlas. The native image is re-sampled from native to stereotaxic space under the mapping function that may have few or many degrees of freedom, depending upon the experimental design. 
The optimal choice of atlas template and mapping function depends on considerations of age, gender, hemispheric asymmetry, anatomical correspondence, spatial normalization methodology and disease specificity. Accounting, or not, for these various factors in defining stereotaxic space has created the specter of an ever-expanding set of mutually incompatible atlases, each customized for a particular experiment. These difficulties continue to plague the brain mapping field. This review article summarizes the evolution of stereotaxic space in terms of the basic principles and associated conceptual challenges, the creation of population atlases, and the future trends that can be expected in atlas evolution. Copyright © 2012 Elsevier Inc. All rights reserved.
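The simplest mapping function with few degrees of freedom is a linear (affine) transform, which has up to 12 parameters in 3D. The minimal sketch below assumes coordinates are given as an N x 3 array and the transform as a 4 x 4 homogeneous matrix. The scale and translation values in the test are illustrative and do not correspond to any particular published template. Nonlinear warps with many more degrees of freedom are outside this sketch.

```python
import numpy as np

def to_stereotaxic(native_coords, affine):
    """Map N x 3 native-space coordinates through a 4 x 4 affine matrix."""
    pts = np.asarray(native_coords, dtype=float)
    # Append a column of ones so translation is part of the matrix product.
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (homog @ affine.T)[:, :3]
```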
The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.

PubMed

Hermjakob, Henning; Montecchi-Palazzi, Luisa; Bader, Gary; Wojcik, Jérôme; Salwinski, Lukasz; Ceol, Arnaud; Moore, Susan; Orchard, Sandra; Sarkans, Ugis; von Mering, Christian; Roechert, Bernd; Poux, Sylvain; Jung, Eva; Mersch, Henning; Kersey, Paul; Lappe, Michael; Li, Yixue; Zeng, Rong; Rana, Debashis; Nikolski, Macha; Husi, Holger; Brun, Christine; Shanker, K; Grant, Seth G N; Sander, Chris; Bork, Peer; Zhu, Weimin; Pandey, Akhilesh; Brazma, Alvis; Jacq, Bernard; Vidal, Marc; Sherman, David; Legrain, Pierre; Cesareni, Gianni; Xenarios, Ioannis; Eisenberg, David; Steipe, Boris; Hogue, Chris; Apweiler, Rolf

2004-02-01

A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small-scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exist in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).
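A toy merge illustrates why a shared data model helps. Once records from different providers use the same representation, duplicates reported in either orientation collapse onto one interaction that lists every supporting source. The `Interaction` class is a hypothetical stand-in, not the PSI-MI schema. The sketch also assumes protein identifiers have already been mapped to one namespace, which in practice is the harder part.

```python
from dataclasses import dataclass, field

@dataclass
class Interaction:
    a: str                                    # lexically smaller protein ID
    b: str                                    # lexically larger protein ID
    sources: set = field(default_factory=set)  # databases reporting the pair

def merge(records):
    """records: iterable of (source_db, protein_1, protein_2), any orientation.

    Collapse them into one Interaction per unordered protein pair.
    """
    merged = {}
    for source, p1, p2 in records:
        a, b = sorted((p1, p2))
        merged.setdefault((a, b), Interaction(a, b)).sources.add(source)
    return sorted(merged.values(), key=lambda i: (i.a, i.b))
```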
Teaching resources for dermatology on the WWW--quiz system and dynamic lecture scripts using a HTTP-database demon.

PubMed Central

Bittorf, A.; Diepgen, T. L.

1996-01-01

The World Wide Web (WWW) is becoming the major way of acquiring information in all scientific disciplines as well as in business. It is well suited to the fast distribution and exchange of up-to-date teaching resources. However, to date most teaching applications on the Web do not use its full power, since they lack interactive components. We have set up a computer-based training (CBT) framework for dermatology, which consists of dynamic lecture scripts, case reports, an atlas and a quiz system. All these components rely heavily on an underlying image database that permits the creation of dynamic documents. To achieve better performance and avoid the overhead of starting CGI processes, we used a demon process that keeps the database open and can be accessed via HTTP. The result of our evaluation was very encouraging. PMID:8947625
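The performance idea is a long-lived process that opens the database once, instead of a CGI program that starts and connects on every request. It can be sketched with Python's standard `http.server` and `sqlite3` modules. The table layout, the `dx` query parameter and the placeholder rows are hypothetical; the original system predates these libraries.

```python
import html
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Opened once at start-up and shared by every request; a CGI script
# would instead pay the start-up and connect cost on each request.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, diagnosis TEXT, path TEXT)")
DB.executemany("INSERT INTO image (diagnosis, path) VALUES (?, ?)",
               [("diagnosis_A", "img/001.jpg"), ("diagnosis_B", "img/002.jpg")])

def images_for(diagnosis):
    """Return image paths for one diagnosis from the already-open connection."""
    rows = DB.execute("SELECT path FROM image WHERE diagnosis = ? ORDER BY id",
                      (diagnosis,))
    return [path for (path,) in rows]

class ImageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Build a dynamic page listing the images for ?dx=<diagnosis>.
        params = parse_qs(urlparse(self.path).query)
        paths = images_for(params.get("dx", [""])[0])
        body = "\n".join(f'<img src="{html.escape(p)}">' for p in paths).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    """Run the long-lived server process (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), ImageHandler).serve_forever()
```

Calling `serve()` starts the server on port 8080. A request such as `/?dx=diagnosis_A` is then answered from the connection opened at start-up.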
Failure Atlas for Rolling Bearings in Wind Turbines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tallian, T. E.

2006-01-01

This Atlas is structured as a supplement to the book: T.E. Tallian: Failure Atlas for Hertz Contact Machine Elements, 2nd edition, ASME Press New York, (1999). The content of the atlas comprises plate pages from the book that contain bearing failure images, application data, and descriptions of failure mode, image, and suspected failure causes. Rolling bearings are a critical component of the mainshaft system, gearbox and generator in the rapidly developing technology of power-generating wind turbines. The demands for long service life are stringent; the design load, speed and temperature regimes are demanding; and the environmental conditions, including weather, contamination, and impediments to monitoring and maintenance, are often unfavorable. As a result, experience has shown that rolling bearings are prone to a variety of failure modes that may prevent them from achieving their design lives. Morphological failure diagnosis is extensively used in the failure analysis and improvement of bearing operation. Accumulated experience shows that the failure appearance and mode of failure causation in wind turbine bearings have many distinguishing features. The present Atlas is a first effort to collect an interpreted database of rolling bearing failures specific to wind turbines and to make it widely available. The main body of the book is a comprehensive collection of self-contained pages called Plates, containing failure images, bearing and application data, and three descriptions: failure mode, image and suspected failure causes. The Plates are sorted by main failure mode into chapters. Each chapter is preceded by a general technical discussion of the failure mode, its appearance and causes.
The Plates part is supplemented by an introductory part, describing the appearance classification and failure classification systems used, and by several indexes. The present Atlas has the same structure as the book but contains only Plate pages, arranged in chapters, each with a chapter heading page giving a short definition of the failure mode illustrated. Each Plate page is self-contained, with images, bearing and application data, and descriptions of the failure mode, the images and the suspected causes. Images are provided in two resolutions: the text page includes 6 by 9 cm images, and high-resolution image files are attached, to be retrieved by clicking on their 'push pin' icon. While the material in the present Atlas is self-contained, it is nonetheless a supplement to the book, and the complete interpretation of the terse image descriptions and of the system underlying the failure code presupposes familiarity with the book. Since this Atlas is a supplement to the book, its chapter numbering follows that of the book. Not all failure modes covered in the book have been found among the observed wind turbines. For that reason, and because of the omission of introductory matter, the chapter numbers in this Atlas do not form a continuous sequence.

A Circular Dichroism Reference Database for Membrane Proteins

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wallace,B.; Wien, F.; Stone, T.

2006-01-01

Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence, for the most part, they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy, which seek to define secondary structure, have not been fully exploited because of technical limitations to their interpretation for membrane-embedded proteins. Empirical analyses of CD spectra are valuable for providing information on the secondary structures of proteins. However, the accuracy of the results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane-embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).
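The sketch below treats empirical secondary-structure analysis as a linear decomposition. The measured spectrum is fitted as a weighted sum of basis spectra, assumed here to be pure secondary-structure components derived from a reference set, and the weights are read as fractions. This plain least-squares fit is a deliberately simple stand-in; the programs behind servers such as DICHROWEB use more elaborate methods. It still shows why the reference set matters: if the basis spectra come from soluble proteins, a membrane protein's spectrum is fitted with the wrong components.

```python
import numpy as np

def fit_fractions(spectrum, basis):
    """Fit `spectrum` (length W) as a combination of K basis spectra (K x W).

    Returns K non-negative weights rescaled to sum to 1, read as
    secondary-structure fractions. Assumes at least one positive weight.
    """
    coef, *_ = np.linalg.lstsq(basis.T, spectrum, rcond=None)
    coef = np.clip(coef, 0.0, None)  # negative fractions are not physical
    return coef / coef.sum()
```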
Tandem mass spectrometry for the detection of plant pathogenic fungi and the effects of database composition on protein inferences.

PubMed

Padliya, Neerav D; Garrett, Wesley M; Campbell, Kimberly B; Tabb, David L; Cooper, Bret

2007-11-01

LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate how database composition limits pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences from related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving the illusion that proteins from nontarget organisms had been identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby the additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.
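A toy "best match" search reproduces the paradox. Real search engines score spectra against theoretical fragments, but any search that returns the top-scoring candidate shares the key property: if the true sequence is absent, a homologous sequence from a related organism wins. Here residue identity stands in for the spectrum score, and the sequences and organism labels are invented for illustration.

```python
def best_match(peptide, database):
    """database: dict organism -> list of peptide sequences.

    Return (organism, sequence, identities) for the equal-length candidate
    with the most identical residues; a toy stand-in for a spectrum score.
    """
    best = None
    for organism, seqs in database.items():
        for seq in seqs:
            if len(seq) != len(peptide):
                continue
            score = sum(a == b for a, b in zip(peptide, seq))
            if best is None or score > best[2]:
                best = (organism, seq, score)
    return best
```

When the target organism's sequences are missing, the search still returns a confident-looking hit from the related organism; a score threshold alone does not help when the homolog differs by only one residue.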
PROFESS: a PROtein Function, Evolution, Structure and Sequence database

PubMed Central

Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

2010-01-01

The proliferation of biological databases and the easy access enabled by the Internet are having a beneficial impact on the biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the large number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and does not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718
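The sketch below shows a query system with interchangeable similarity functions. The annotation sets, the two similarity functions (Jaccard index and overlap coefficient) and the threshold are assumptions chosen for illustration, not PROFESS's actual functions.

```python
def jaccard(a, b):
    """Shared terms divided by all terms in either set."""
    return len(a & b) / len(a | b) if a | b else 0.0

def overlap(a, b):
    """Shared terms divided by the size of the smaller set."""
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def similar_to(query_id, annotations, similarity=jaccard, threshold=0.5):
    """annotations: dict protein_id -> set of terms.

    Return (protein_id, score) pairs meeting `threshold`, best first.
    The relationship is computed at query time from the chosen function.
    """
    q = annotations[query_id]
    hits = [(pid, similarity(q, terms))
            for pid, terms in annotations.items() if pid != query_id]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: (-h[1], h[0]))
```

Passing a different `similarity` function yields a new relationship between proteins without changing the stored data, which mirrors the flexibility the abstract describes.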
SynechoNET: integrated protein-protein interaction database of a model cyanobacterium Synechocystis sp. PCC 6803.

PubMed

Kim, Woo-Yeon; Kang, Sungsoo; Kim, Byoung-Chul; Oh, Jeehyun; Cho, Seongwoong; Bhak, Jong; Choi, Jong-Soon

2008-01-01

Cyanobacteria are model organisms for studying photosynthesis, carbon and nitrogen assimilation, evolution of plant plastids, and adaptability to environmental stresses. Despite many studies on cyanobacteria, there is no web-based database of their regulatory and signaling protein-protein interaction networks to date. We report a database and website SynechoNET that provides predicted protein-protein interactions. SynechoNET shows cyanobacterial domain-domain interactions as well as their protein-level interactions using the model cyanobacterium, Synechocystis sp. PCC 6803. It predicts the protein-protein interactions using public interaction databases that contain mutually complementary and redundant data. Furthermore, SynechoNET provides information on transmembrane topology, signal peptide, and domain structure in order to support the analysis of regulatory membrane proteins. Such biological information can be queried and visualized in user-friendly web interfaces that include the interactive network viewer and search pages by keyword and functional category. SynechoNET is an integrated protein-protein interaction database designed to analyze regulatory membrane proteins in cyanobacteria. It provides a platform for biologists to extend the genomic data of cyanobacteria by predicting interaction partners, membrane association, and membrane topology of Synechocystis proteins. SynechoNET is freely available at http://synechocystis.org/ or directly at http://bioportal.kobic.kr/SynechoNET/.
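The abstract does not give SynechoNET's prediction rule. A common domain-based approach, shown here as a hypothetical sketch, predicts that two proteins interact if one contains a domain and the other contains a domain known (from public data) to interact with it. The protein identifiers are placeholders, and the domain names serve only as labels in the test.

```python
from itertools import combinations

def predict_ppis(protein_domains, ddi_pairs):
    """protein_domains: dict protein -> set of domains.
    ddi_pairs: iterable of (domain, domain) pairs known to interact.

    Return (protein_1, protein_2, supporting_domain_pairs) for every
    protein pair backed by at least one known domain-domain interaction.
    """
    known = {frozenset(p) for p in ddi_pairs}
    predictions = []
    for p1, p2 in combinations(sorted(protein_domains), 2):
        support = sorted({
            tuple(sorted((d1, d2)))
            for d1 in protein_domains[p1]
            for d2 in protein_domains[p2]
            if frozenset((d1, d2)) in known
        })
        if support:
            predictions.append((p1, p2, support))
    return predictions
```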
Advanced Technology Lifecycle Analysis System (ATLAS)

NASA Technical Reports Server (NTRS)

O'Neil, Daniel A.; Mankins, John C.

2004-01-01

Developing credible mass and cost estimates for space exploration and development architectures requires multidisciplinary analysis based on physics calculations and parametric estimates derived from historical systems. Within the National Aeronautics and Space Administration (NASA), concurrent engineering environment (CEE) activities integrate discipline-oriented analysis tools through a computer network and accumulate the results of a multidisciplinary analysis team via a centralized database or spreadsheet. Each minute of a design and analysis study within a concurrent engineering environment is expensive due to the size of the team and supporting equipment. The Advanced Technology Lifecycle Analysis System (ATLAS) reduces the cost of architecture analysis by capturing the knowledge of discipline experts in system-oriented spreadsheet models. A framework with a user interface presents a library of system models to an architecture analyst. The analyst selects models of launchers, in-space transportation systems, and excursion vehicles, as well as space and surface infrastructure such as propellant depots, habitats, and solar power satellites. After assembling the architecture from the selected models, the analyst can create a campaign composed of missions spanning several years. The ATLAS controller passes analyst-specified parameters to the models and data among the models. An integrator workbook calls a history-based parametric analysis cost model to determine the costs. The integrator also estimates the flight rates, launched masses, and architecture benefits over the years of the campaign. An accumulator workbook presents the analytical results in a series of bar graphs. In no way does ATLAS compete with a CEE; instead, ATLAS complements a CEE by ensuring that the experts' time is well spent. Using ATLAS, an architecture analyst can perform technology sensitivity analysis, study many scenarios, and see the impact of design decisions.
When the analyst is satisfied with the system configurations, technology portfolios, and deployment strategies, he or she can present the concepts to a team, which will conduct a detailed, discipline-oriented analysis within a CEE. An analog to this approach is the music industry where a songwriter creates the lyrics and music before entering a recording studio.
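A minimal sketch of the integrator step described above (flight rates and launched mass per campaign year), assuming a single launcher type and packing each mission's elements by total mass only; real manifesting would also respect per-element mass and volume limits. The `Launcher` and `Mission` classes and all capacity and mass values are hypothetical illustrations, not ATLAS's spreadsheet models, and the parametric cost model is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Launcher:
    name: str
    payload_kg: float  # hypothetical payload capacity to the staging orbit


@dataclass
class Mission:
    year: int
    element_masses_kg: list = field(default_factory=list)  # masses from the selected system models


def integrate_campaign(missions, launcher):
    """Return per-year flight counts and launched mass for a campaign.

    Simplifying assumption: a mission's elements are packed by total mass,
    so flights = ceil(total mass / launcher payload capacity).
    """
    by_year = {}
    for mission in missions:
        total = sum(mission.element_masses_kg)
        flights = math.ceil(total / launcher.payload_kg)
        year = by_year.setdefault(mission.year, {"flights": 0, "launched_mass_kg": 0.0})
        year["flights"] += flights
        year["launched_mass_kg"] += total
    return by_year


# Hypothetical campaign; all values are illustrative only.
heavy = Launcher("heavy-lift", payload_kg=20_000)
campaign = [
    Mission(2030, [15_000, 12_000]),  # 27 t -> 2 flights
    Mission(2030, [5_000]),           # 5 t  -> 1 flight
    Mission(2031, [8_000]),           # 8 t  -> 1 flight
]
summary = integrate_campaign(campaign, heavy)
print(summary)
```

An accumulator step would then turn `summary` into per-year charts; keeping the integrator a pure function of the selected models and parameters is what lets an analyst rerun many scenarios cheaply.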
PACSY, a relational database management system for protein structure and chemical shift analysis.

PubMed

Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L

2012-10-01

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.
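PACSY's tables are linked by key identification numbers and queried through a standard RDBMS. The sketch below illustrates that idea with Python's built-in `sqlite3` standing in for MySQL or PostgreSQL; the table names, columns and toy values are hypothetical and do not reproduce PACSY's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create two toy tables sharing a key ID, mimicking linked coordinate and shift tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE coord (key_id INTEGER, atom_name TEXT, x REAL, y REAL, z REAL);
        CREATE TABLE shift (key_id INTEGER, atom_name TEXT, shift_ppm REAL);
        """
    )
    conn.executemany(
        "INSERT INTO coord VALUES (?, ?, ?, ?, ?)",
        [(1, "CA", 1.0, 2.0, 3.0), (1, "CB", 1.5, 2.5, 3.5), (2, "CA", 0.0, 0.0, 0.0)],
    )
    # Made-up shift values; key_id 2 deliberately has no shift data.
    conn.executemany("INSERT INTO shift VALUES (?, ?, ?)", [(1, "CA", 56.2), (1, "CB", 30.1)])
    return conn


def atoms_with_shifts(conn, min_shift_ppm):
    """Join coordinates and chemical shifts on (key_id, atom_name), filtered by shift."""
    return conn.execute(
        """
        SELECT c.key_id, c.atom_name, c.x, c.y, c.z, s.shift_ppm
        FROM coord AS c
        JOIN shift AS s ON c.key_id = s.key_id AND c.atom_name = s.atom_name
        WHERE s.shift_ppm >= ?
        ORDER BY c.key_id, c.atom_name
        """,
        (min_shift_ppm,),
    ).fetchall()


conn = build_demo_db()
print(atoms_with_shifts(conn, 50.0))  # [(1, 'CA', 1.0, 2.0, 3.0, 56.2)]
```

The inner join silently drops atoms without shift data (key_id 2 here); a `LEFT JOIN` would keep them with `NULL` shifts if that combination were wanted.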
MIPS: a database for protein sequences and complete genomes.

PubMed Central

Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

1998-01-01

The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information were also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795
HypoxiaDB: a database of hypoxia-regulated proteins

PubMed Central

Khurana, Pankaj; Sugadev, Ragumani; Jain, Jaspreet; Singh, Shashi Bala

2013-01-01

There has been intense interest in the cellular response to hypoxia, and a large number of differentially expressed proteins have been identified through various high-throughput experiments. These valuable data are scattered, and there have been no systematic attempts to document the various proteins regulated by hypoxia. Compilation, curation and annotation of these data are important for deciphering their role in hypoxia and hypoxia-related disorders. Therefore, we have compiled HypoxiaDB, a database of hypoxia-regulated proteins. It is a comprehensive, manually curated, non-redundant catalog of proteins whose expression has been shown experimentally to be altered at different levels and durations of hypoxia. The database currently contains 72 000 manually curated entries on 3500 proteins extracted from 73 peer-reviewed publications selected from PubMed. HypoxiaDB is distinct from other generalized databases: (i) It compiles tissue-specific protein expression changes under different levels and durations of hypoxia. It also provides manually curated literature references to support the inclusion of each protein in the database and establish its association with hypoxia. (ii) For each protein, HypoxiaDB integrates data on gene ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway, protein–protein interactions, protein family (Pfam), OMIM (Online Mendelian Inheritance in Man), PDB (Protein Data Bank) structures and homology to other sequenced genomes. (iii) It also provides pre-compiled information on hypoxia-regulated proteins, which would otherwise require tedious computational analysis. This includes information such as chromosomal location and identifiers such as Entrez, HGNC, UniGene, UniProt, Ensembl, Vega, GI numbers and GenBank accession numbers associated with the protein. These are further cross-linked to the respective public databases, connecting HypoxiaDB to external repositories.
(iv) In addition, HypoxiaDB provides an online sequence-similarity search tool with which users can compare their protein sequences against the HypoxiaDB protein database. We hope that HypoxiaDB will enrich our knowledge about hypoxia-related biology and eventually lead to the development of novel hypotheses and advancements in diagnostic and therapeutic activities. HypoxiaDB is freely accessible for academic and non-profit users via http://www.hypoxiadb.com. Database URL: http://www.hypoxiadb.com PMID:24178989
ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.

PubMed

Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala

2017-01-01

Superfamily 1 and Superfamily 2 helicases, two of the largest helicase protein families, play vital roles in many biological processes, including replication, transcription and translation. Study of helicase proteins in model archaeal microorganisms has contributed greatly to the understanding of their function, architecture and assembly. Using a large-scale phylogenomics approach, we identified and classified all SF1 and SF2 protein families in ninety-five sequenced archaeal genomes. Here we developed an online web server linked to a specialized protein database named ARCPHdb to provide access to SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using a MySQL relational database. Web interfaces were developed using NetBeans. Data were stored according to UniProt accession numbers, NCBI RefSeq IDs, PDB IDs and Entrez databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation from the literature provided by ARCPHdb. The database provides direct links to matching external databases. ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicases from archaea in one platform. This database provides an essential information resource for all researchers interested in the field. Copyright © 2016 Elsevier Ltd. All rights reserved.
Rice proteome database: a step toward functional analysis of the rice genome.

PubMed

Komatsu, Setsuko

2005-09-01

The technique of proteome analysis using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this study, the proteins of rice were cataloged, a rice proteome database was constructed, and a functional characterization of some of the identified proteins was undertaken. Proteins extracted from various tissues and subcellular compartments in rice were separated by 2D-PAGE and an image analyzer was used to construct a display of the proteins. The Rice Proteome Database contains 23 reference maps based on 2D-PAGE of proteins from various rice tissues and subcellular compartments. These reference maps comprise 13129 identified proteins, and the amino acid sequences of 5092 proteins are entered in the database. Major proteins involved in growth or stress responses were identified using the proteome approach. Some of these proteins, including a beta-tubulin, calreticulin, and ribulose-1,5-bisphosphate carboxylase/oxygenase activase in rice, have unexpected functions. The information obtained from the Rice Proteome Database will aid in cloning the genes for and predicting the function of unknown proteins.
In silico analysis of fragile histidine triad involved in regression of carcinoma.

PubMed

Rasheed, Muhammad Asif; Tariq, Fatima; Afzal, Sara; Mannanv, Shazia

2017-04-01

Hepatocellular carcinoma (HCCa) is a primary malignancy of the liver. Many different proteins are involved in HCCa, including insulin-like growth factor II (IGF-II), signal transducers and activators of transcription (STAT) 3 and STAT4, mothers against decapentaplegic homolog 4 (SMAD4), fragile histidine triad (FHIT) and selective internal radiation therapy (SIRT), etc. The present study is a bioinformatics analysis of the FHIT protein, aimed at understanding its proteomic aspects and improving diagnosis of the disease based on this protein. Information related to the protein was gathered from different databases, including the National Centre for Biotechnology Information (NCBI) Gene, Protein and Online Mendelian Inheritance in Man (OMIM) databases, the UniProt database, the STRING database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, the structure of the protein was modeled and its quality evaluated using the EasyModeller program. Hence, this analysis not only gathered information related to the protein in one place, but also analysed the structure and quality of the protein, leading to the conclusion that the protein has a role in carcinoma.
Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

PubMed Central

Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

2013-01-01

The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545
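The all-against-all comparison and clustering steps can be sketched as follows. This is a simplified stand-in, not FR3D's geometric measure or the Atlas's actual clustering procedure: it assumes each loop is already represented by an equal-length list of superposed atom coordinates, scores pairs by RMSD, and groups loops by single linkage under a hypothetical cutoff. It does not handle variable numbers of bulged bases.

```python
import math
from itertools import combinations


def rmsd(loop_a, loop_b):
    """RMSD between two equal-length lists of already-superposed (x, y, z) coordinates."""
    squared = sum(
        (p - q) ** 2 for atom_a, atom_b in zip(loop_a, loop_b) for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(loop_a))


def cluster_loops(loops, cutoff):
    """All-against-all comparison followed by single-linkage grouping (union-find)."""
    parent = list(range(len(loops)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(loops)), 2):
        if rmsd(loops[i], loops[j]) <= cutoff:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(loops)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy two-atom "loops"; loops 0 and 1 differ by a 0.1 Å shift in z.
loops = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
]
print(cluster_loops(loops, cutoff=0.5))  # [[0, 1], [2]]
```

Single linkage is the simplest choice here; it can chain dissimilar loops together through intermediates, which is one reason production pipelines use more careful grouping criteria.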
EnviroAtlas - Synthetic N fertilizer application to agricultural lands by 12-digit HUC in the Conterminous United States, 2006

EPA Pesticide Factsheets

This EnviroAtlas dataset contains data on the mean synthetic nitrogen (N) fertilizer application to cultivated crop and hay/pasture lands per 12-digit Hydrologic Unit Code (HUC) in 2006. Synthetic N fertilizer inputs in 2006 were estimated using county-level estimates of farm N fertilizer inputs. We acquired county-level data describing total farm-level inputs (kg N/yr) of synthetic N fertilizer to individual counties in 2006 from the United States Geological Survey (USGS) (http://pubs.usgs.gov/sir/2012/5207/). These data were converted to per-area rates (kg N/ha/yr) of synthetic N fertilizer application by dividing the total N input by the land area (ha) of combined cultivated crop and hay/pasture lands within a county, as determined from county-level (http://cta.ornl.gov/transnet/Boundaries.html) summarization of the 2006 National Land Cover Database (NLCD; http://www.mrlc.gov/nlcd06_data.php). We distributed county-specific, annual per-area N input rates (kg N/ha/yr) to cultivated crop and hay/pasture lands (30 × 30 m pixels) within the corresponding county using the raster calculator tool in ArcMap 10.0 (ESRI, Inc., Redlands, CA). Fertilizer data described here represent an average input to a typical agricultural land type within a county, i.e., they are not specific to individual crop types. This dataset was produced by the US EPA to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the us
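The per-area conversion and pixel distribution described in the EnviroAtlas record reduce to simple arithmetic. The sketch below uses toy county totals and areas, represents the raster as a list of `(huc_id, county_id, is_agricultural)` pixel records, and takes the HUC value as the mean rate over agricultural pixels; that final aggregation step is an assumption, since the method text here stops before describing it. All identifiers and numbers are hypothetical.

```python
def county_rates(county_totals_kg, county_ag_area_ha):
    """Per-area rate (kg N/ha/yr) = county total input (kg N/yr) / crop + hay/pasture area (ha)."""
    return {c: county_totals_kg[c] / county_ag_area_ha[c] for c in county_totals_kg}


def huc_mean_rates(pixels, rates):
    """Give each agricultural pixel its county's rate, then average the rates within each HUC."""
    sums, counts = {}, {}
    for huc_id, county_id, is_agricultural in pixels:
        if not is_agricultural:
            continue  # non-agricultural pixels receive no fertilizer input
        sums[huc_id] = sums.get(huc_id, 0.0) + rates[county_id]
        counts[huc_id] = counts.get(huc_id, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}


PIXEL_AREA_HA = 30 * 30 / 10_000  # one 30 m x 30 m pixel covers 0.09 ha

rates = county_rates({"A": 1000.0, "B": 600.0}, {"A": 10.0, "B": 20.0})  # A: 100, B: 30 kg N/ha/yr
pixels = [
    ("huc1", "A", True), ("huc1", "A", True), ("huc1", "B", True),
    ("huc1", "B", False), ("huc2", "B", True),
]
print(huc_mean_rates(pixels, rates))
print(rates["A"] * PIXEL_AREA_HA)  # annual kg N applied to one agricultural pixel in county A
```

Because every agricultural pixel in a county carries the same rate, a HUC that straddles county lines ends up with an area-weighted blend of the county rates, which matches the record's caveat that values represent a typical agricultural land type rather than specific crops.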
The Protein-DNA Interface database

PubMed Central

2010-01-01

The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798
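Two rules stated in the PDIdb description, the 2.5 Å resolution cutoff (a smaller number means a higher resolution) and the three-level Class/Type/Subtype classification with one entry per interface, can be illustrated with a short sketch. The `Interface` record, its field names and the example labels are hypothetical, not PDIdb's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    entry_id: str        # one record per protein-DNA interface
    resolution_a: float  # crystallographic resolution in Å
    klass: str           # spelled "klass" because "class" is a Python keyword
    type_: str
    subtype: str


def index_interfaces(interfaces, max_resolution_a=2.5):
    """Keep interfaces at 2.5 Å or better and nest them as Class -> Type -> Subtype -> [entry IDs]."""
    tree = {}
    for it in interfaces:
        if it.resolution_a > max_resolution_a:
            continue  # a larger number means a lower (worse) resolution
        (tree.setdefault(it.klass, {})
             .setdefault(it.type_, {})
             .setdefault(it.subtype, [])
             .append(it.entry_id))
    return tree


entries = [
    Interface("entry1", 2.0, "ClassA", "Type1", "Sub1"),
    Interface("entry2", 3.0, "ClassA", "Type1", "Sub1"),  # excluded: worse than 2.5 Å
    Interface("entry3", 2.5, "ClassA", "Type1", "Sub2"),
]
print(index_interfaces(entries))
# {'ClassA': {'Type1': {'Sub1': ['entry1'], 'Sub2': ['entry3']}}}
```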
The Protein-DNA Interface database.

PubMed

Norambuena, Tomás; Melo, Francisco

2010-05-18

The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.
The Universal Protein Resource (UniProt): an expanding universe of protein information.

PubMed

Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

2006-01-01

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
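The UniRef identity levels (100%, 90%, 50%) can be illustrated with a greedy clustering sketch: sequences are visited longest first, and each joins the first cluster representative it matches at or above the threshold; otherwise it founds a new cluster. This is a deliberately simplified stand-in. The identity measure here is an ungapped, position-by-position comparison divided by the longer length, whereas production UniRef builds rely on dedicated clustering software with alignment. Sequence names and residues are made up.

```python
def ungapped_identity(a, b):
    """Fraction of identical residues at the same positions, relative to the longer sequence."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold):
    """Map each representative name to its members (representative listed first)."""
    reps, clusters = [], {}
    # Longest sequences become representatives first; ties broken by name for determinism.
    for name, seq in sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        for rep in reps:
            if ungapped_identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no representative was similar enough
            reps.append(name)
            clusters[name] = [name]
    return clusters


seqs = {"p1": "ACDEFGHIKL", "p2": "ACDEFGHIKM", "p3": "ACDEFGHIKL", "p4": "WWWWWWWWWW"}
print(greedy_cluster(seqs, 1.0))  # {'p1': ['p1', 'p3'], 'p2': ['p2'], 'p4': ['p4']}
print(greedy_cluster(seqs, 0.9))  # {'p1': ['p1', 'p2', 'p3'], 'p4': ['p4']}
```

Lowering the threshold merges more sequences into fewer clusters, which is how the coarser UniRef levels compress sequence space and speed up similarity searches.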
Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

PubMed

Truong, Kevin; Ikura, Mitsuhiko

2003-05-06

Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From the scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
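The relational formulation can be sketched as a self-join over a single `(protein, organism, domain)` table: proteins A and B from the same organism are reported as a putative linked pair when A carries domain X but not Y, B carries Y but not X, and some other protein (the fused, "Rosetta stone" protein) carries both. The table name `domain_hit`, the use of `sqlite3` and the toy rows are assumptions for illustration; the paper's exact schema and query are not given here.

```python
import sqlite3

FUSION_QUERY = """
SELECT DISTINCT a.protein, b.protein, a.domain, b.domain
FROM domain_hit AS a
JOIN domain_hit AS b
  ON a.organism = b.organism AND a.protein < b.protein AND a.domain <> b.domain
JOIN domain_hit AS fx ON fx.domain = a.domain                              -- fused protein has X
JOIN domain_hit AS fy ON fy.protein = fx.protein AND fy.domain = b.domain  -- ... and Y
WHERE a.organism = ?
  AND NOT EXISTS (SELECT 1 FROM domain_hit z WHERE z.protein = a.protein AND z.domain = b.domain)
  AND NOT EXISTS (SELECT 1 FROM domain_hit z WHERE z.protein = b.protein AND z.domain = a.domain)
ORDER BY 1, 2, 3, 4
"""


def find_fusion_pairs(rows, organism):
    """Load (protein, organism, domain) rows and return putative linked pairs in one organism."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domain_hit (protein TEXT, organism TEXT, domain TEXT)")
    conn.executemany("INSERT INTO domain_hit VALUES (?, ?, ?)", rows)
    return conn.execute(FUSION_QUERY, (organism,)).fetchall()


rows = [
    ("P1", "org1", "D1"), ("P2", "org1", "D2"), ("P3", "org1", "D3"),
    ("F1", "org2", "D1"), ("F1", "org2", "D2"),  # fused protein in another organism
]
print(find_fusion_pairs(rows, "org1"))  # [('P1', 'P2', 'D1', 'D2')]
```

Because the whole analysis is one query over tables that mirror the sequence and domain databases, rerunning it after each database update is cheap, which is the property the abstract emphasizes.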
Protein Bioinformatics Databases and Resources

PubMed Central

Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.

2017-01-01

Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231
Renal cell tumors with clear cell histology and intact VHL and chromosome 3p: a histological review of tumors from the Cancer Genome Atlas database.

PubMed

Favazza, Laura; Chitale, Dhananjay A; Barod, Ravi; Rogers, Craig G; Kalyana-Sundaram, Shanker; Palanisamy, Nallasivam; Gupta, Nilesh S; Williamson, Sean R

2017-11-01

Clear cell renal cell carcinoma is by far the most common form of kidney cancer; however, a number of histologically similar tumors are now recognized and considered distinct entities. The Cancer Genome Atlas published data set was queried (http://cbioportal.org) for clear cell renal cell carcinoma tumors lacking VHL gene mutation and chromosome 3p loss, for which whole-slide images were reviewed. Of the 418 tumors in the published Cancer Genome Atlas clear cell renal cell carcinoma database, 387 had VHL mutation, copy number loss for chromosome 3p, or both (93%). Of the remaining 31 tumors, 27 had whole-slide images available for review. One had 3p loss based on karyotype but not sequencing, and three demonstrated VHL promoter hypermethylation. Nine could be reclassified as distinct or emerging entities: translocation renal cell carcinoma (n=3), TCEB1 mutant renal cell carcinoma (n=3), papillary renal cell carcinoma (n=2), and clear cell papillary renal cell carcinoma (n=1). Of the remaining tumors, 6 had other clear cell renal cell carcinoma-associated gene alterations (PBRM1, SMARCA4, BAP1, SETD2), leaving 11 specimens, including 2 high-grade or sarcomatoid renal cell carcinomas and 2 with prominent fibromuscular stroma (not TCEB1 mutant). One of the remaining tumors exhibited gain of chromosome 7 but lacked histological features of papillary renal cell carcinoma. Two tumors previously reported to harbor TFE3 gene fusions also exhibited VHL mutation, chromosome 3p loss, and morphology indistinguishable from clear cell renal cell carcinoma, the significance of which is uncertain. In summary, almost all clear cell renal cell carcinomas harbor VHL mutation, 3p copy number loss, or both. Of tumors with clear cell histology that lack these alterations, a subset can now be reclassified as other entities. Further study will determine whether additional entities exist, based on distinct genetic pathways that may have implications for treatment.
PACSY, a relational database management system for protein structure and chemical shift analysis

PubMed Central

Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo

2012-01-01

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu. PMID:22903636

A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

PubMed

Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

2010-08-01

The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot simply search for a conserved residue, because residue numbering varies across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated, allowing millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances, and angles have also been included, permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures in which, for example, a ligand of interest makes contact with the gatekeeper residue.
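Ring-centroid features of the kind described in this record can be computed with a few lines of vector arithmetic; a database like this would store the results as precomputed columns so that queries become simple range filters. The sketch assumes planar rings given as ordered atom coordinates, and the 5.5 Å centroid-distance and 30° interplanar-angle cutoffs for flagging a candidate π-stacking pair are illustrative assumptions, not values from the source.

```python
import math


def centroid(ring):
    """Mean position of the ring atoms."""
    n = len(ring)
    return tuple(sum(atom[k] for atom in ring) / n for k in range(3))


def normal(ring):
    """Plane normal of a planar ring: cross product of two centroid-to-atom vectors."""
    c = centroid(ring)
    u = [ring[0][k] - c[k] for k in range(3)]
    v = [ring[1][k] - c[k] for k in range(3)]
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def interplanar_angle_deg(ring_a, ring_b):
    """Angle between ring planes in degrees (0 = parallel), ignoring normal orientation."""
    na, nb = normal(ring_a), normal(ring_b)
    cos_t = abs(sum(x * y for x, y in zip(na, nb))) / (math.hypot(*na) * math.hypot(*nb))
    return math.degrees(math.acos(min(1.0, cos_t)))  # clamp guards against rounding above 1


def is_stacking_candidate(ring_a, ring_b, max_dist=5.5, max_angle_deg=30.0):
    """Hypothetical cutoffs: close centroids and near-parallel planes."""
    return (math.dist(centroid(ring_a), centroid(ring_b)) <= max_dist
            and interplanar_angle_deg(ring_a, ring_b) <= max_angle_deg)


def hexagon(z, r=1.4):
    """Toy planar six-membered ring of radius r in the plane at height z."""
    return [(r * math.cos(math.radians(60 * i)), r * math.sin(math.radians(60 * i)), z) for i in range(6)]


print(is_stacking_candidate(hexagon(0.0), hexagon(3.5)))   # True
print(is_stacking_candidate(hexagon(0.0), hexagon(10.0)))  # False
```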
THGS: a web-based database of Transmembrane Helices in Genome Sequences

PubMed Central

Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

2004-01-01

Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database developed to search for transmembrane helices in user-selected gene sequences available in the Genome Database (GDB). The database also supports searching for sequence motifs in transmembrane and globular proteins. In addition, a motif can be searched in other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, the Protein Data Bank (PDB). Further, if the queried motif occurs in a solved protein structure deposited in the Protein Data Bank, its 3D structure can be visualized using the widely used graphics package RasMol. All the sequence databases used in the present work are updated frequently, so the results produced are up to date. THGS is freely available via the World Wide Web at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375
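The record does not specify THGS's motif syntax. As an assumption, the sketch below accepts a small subset of PROSITE-style pattern syntax (`x`, `[..]`, `{..}`, and `(n)` or `(n,m)` repeats), translates it into a Python regular expression, and reports overlapping matches with 1-based start positions. It is not THGS's implementation, and the example pattern and sequence are made up.

```python
import re

# Subset of PROSITE-style elements: x, one residue letter, [..], {..}, optional (n) or (n,m)
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a dash-separated PROSITE-style pattern into a regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."                      # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            rx = core                     # a single residue or [..] alternatives
        if low:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(rx)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")  # lookahead allows overlaps
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(sequence)]


print(prosite_to_regex("G-x(2)-[ST]-{P}"))           # G.{2}[ST][^P]
print(find_motif("AGAASAGCCTP", "G-x(2)-[ST]-{P}"))  # [(2, 'GAASA')]
```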
Sys-BodyFluid: a systematical database for human body fluid proteome research

PubMed Central

Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

2009-01-01

Body fluids have recently become an important target for proteomic research, and proteomic studies have produced an increasing amount of body-fluid-related protein data. A database is needed to collect and analyze these proteome data. We therefore developed Sys-BodyFluid, a web-based body fluid proteome database. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10 000 proteins are presented in Sys-BodyFluid. Sys-BodyFluid provides detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by protein name, protein accession number and sequence similarity. In addition, users can query across body fluids to compare the proteins identified in each. As a reference database, Sys-BodyFluid can facilitate body fluid proteomics and disease proteomics research. It is available at http://www.biosino.org/bodyfluid/. PMID:18978022
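Comparing body fluids, as described for Sys-BodyFluid, amounts to set operations over the protein identifiers observed in each fluid. In this sketch the fluid names come from the list above, but the identifiers are placeholders rather than real data, and the function names are hypothetical.

```python
def compare_fluids(proteomes, fluid_a, fluid_b):
    """Proteins shared by two fluids and those seen in only one of them."""
    a, b = proteomes[fluid_a], proteomes[fluid_b]
    return {
        "shared": sorted(a & b),
        f"only_in_{fluid_a}": sorted(a - b),
        f"only_in_{fluid_b}": sorted(b - a),
    }


def unique_to_each(proteomes):
    """For every fluid, proteins not observed in any other fluid."""
    result = {}
    for name, proteins in proteomes.items():
        others = set().union(*(p for n, p in proteomes.items() if n != name))
        result[name] = sorted(proteins - others)
    return result


# Placeholder identifiers, not real observations.
proteomes = {
    "plasma/serum": {"PROT_A", "PROT_B", "PROT_C"},
    "urine": {"PROT_B", "PROT_D"},
    "saliva": {"PROT_C", "PROT_E"},
}
print(compare_fluids(proteomes, "plasma/serum", "urine"))
print(unique_to_each(proteomes))
```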
Sys-BodyFluid: a systematical database for human body fluid proteome research.

PubMed

Li, Su-Jun; Peng, Mao; Li, Hong; Liu, Bo-Shu; Wang, Chuan; Wu, Jia-Rui; Li, Yi-Xue; Zeng, Rong

2009-01-01

Body fluids have recently become an important target for proteomic research, and proteomic studies have produced an increasing amount of body-fluid-related protein data. A database is needed to collect and analyze these proteome data. We therefore developed Sys-BodyFluid, a web-based body fluid proteome database. It contains eleven kinds of body fluid proteomes, including plasma/serum, urine, cerebrospinal fluid, saliva, bronchoalveolar lavage fluid, synovial fluid, nipple aspirate fluid, tear fluid, seminal fluid, human milk and amniotic fluid. Over 10,000 proteins are presented in Sys-BodyFluid. Sys-BodyFluid provides detailed protein annotations, including protein description, Gene Ontology, domain information, protein sequence and involved pathways. These proteome data can be retrieved by protein name, protein accession number and sequence similarity. In addition, users can query across body fluids to compare the proteins identified in each. As a reference database, Sys-BodyFluid can facilitate body fluid proteomics and disease proteomics research. It is available at http://www.biosino.org/bodyfluid/.
Digital Single-Cell Analysis of Plant Organ Development Using 3DCellAtlas

PubMed Central

Montenegro-Johnson, Thomas D.; Stamm, Petra; Strauss, Soeren; Topham, Alexander T.; Tsagris, Michail; Wood, Andrew T.A.; Smith, Richard S.; Bassel, George W.

2015-01-01

Diverse molecular networks underlying plant growth and development are rapidly being uncovered. Integrating these data into the spatial and temporal context of dynamic organ growth remains a technical challenge. We developed 3DCellAtlas, an integrative computational pipeline that semiautomatically identifies cell types and quantifies both 3D cellular anisotropy and reporter abundance at single-cell resolution across whole plant organs. Cell identification is at least 97.8% accurate and does not require transgenic lineage markers or reference atlases. Cell positions within organs are defined using an internal indexing system, generating cellular-level organ atlases in which data from multiple samples can be integrated. Using this approach, we quantified the organ-wide, cell-type-specific 3D cellular anisotropy driving Arabidopsis thaliana hypocotyl elongation. Analysis of the impact of ethylene on hypocotyl 3D cell anisotropy identified preferential growth of the endodermis in response to this hormone. The spatiotemporal dynamics of the endogenous DELLA protein RGA, the expansin gene EXPA3, and cell expansion were quantified within distinct cell types of Arabidopsis roots. A significant regulatory relationship between RGA, EXPA3, and growth was present in the epidermis and endodermis. The use of single-cell analyses of plant development enables the dynamics of diverse regulatory networks to be integrated with 3D organ growth. PMID:25901089
Genome-wide analysis of endogenously expressed ZEB2 binding sites reveals inverse correlations between ZEB2 and GalNAc-transferase GALNT3 in human tumors.

PubMed

Balcik-Ercin, Pelin; Cetin, Metin; Yalim-Camci, Irem; Odabas, Gorkem; Tokay, Nurettin; Sayan, A Emre; Yagci, Tamer

2018-03-07

ZEB2 is a transcriptional repressor that regulates epithelial-to-mesenchymal transition (EMT) through binding to bipartite E-box motifs in gene regulatory regions. Despite the abundant presence of E-boxes within the human genome and the multiplicity of pathophysiological processes regulated during ZEB2-induced EMT, only a small fraction of ZEB2 targets has been identified so far. Hence, we explored genome-wide ZEB2 binding by chromatin immunoprecipitation-sequencing (ChIP-seq) under endogenous ZEB2 expression conditions. For ChIP-Seq we used an anti-ZEB2 monoclonal antibody, clone 6E5, in SNU398 hepatocellular carcinoma cells exhibiting a high endogenous ZEB2 expression. The ChIP-Seq targets were validated using ChIP-qPCR, whereas ZEB2-dependent expression of target genes was assessed by RT-qPCR and Western blotting in shRNA-mediated ZEB2 silenced SNU398 cells and doxycycline-induced ZEB2 overexpressing colorectal carcinoma DLD1 cells. Changes in target gene expression were also assessed using primary human tumor cDNA arrays in conjunction with RT-qPCR. Additional differential expression and correlation analyses were performed using expO and Human Protein Atlas datasets. Over 500 ChIP-Seq positive genes were annotated, and intervals related to these genes were found to include the ZEB2 binding motif CACCTG according to TOMTOM motif analysis in the MEME Suite database. Assessment of ZEB2-dependent expression of target genes in ZEB2-silenced SNU398 cells and ZEB2-induced DLD1 cells revealed that the GALNT3 gene serves as a ZEB2 target with the highest, but inversely correlated, expression level. Remarkably, GALNT3 also exhibited the highest enrichment in the ChIP-qPCR validation assays. Through the analyses of primary tumor cDNA arrays and expO datasets a significant differential expression and a significant inverse correlation between ZEB2 and GALNT3 expression were detected in most of the tumors. 
We also explored ZEB2 and GALNT3 protein expression using the Human Protein Atlas dataset and again observed an inverse correlation in all analyzed tumor types except malignant melanoma. In contrast to ZEB2 expression, which was generally negative or weak, most tumor tissues exhibited strong or moderate GALNT3 expression. Our observation that ZEB2 negatively regulates GALNT3, a GalNAc-transferase involved in O-glycosylation, adds another layer of complexity to the role of ZEB2 in cancer progression and metastasis. Proteins glycosylated by GALNT3 may be exploited as novel diagnostic and/or therapeutic targets.
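As a rough illustration of what "intervals include the ZEB2 binding motif CACCTG" means at the sequence level, the sketch below scans a DNA string for exact occurrences of CACCTG on both strands (a minus-strand hit appears as CAGGTG on the plus strand). This is a naive exact-match scan, not the TOMTOM motif comparison used in the study; the function names are hypothetical.

```python
import re

# Complement table for uppercase DNA; N (unknown base) maps to itself.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif(seq: str, motif: str = "CACCTG") -> list:
    """Return sorted (0-based start, strand) pairs for an exact motif.

    A hit on the "-" strand is reported at the plus-strand coordinate where
    the reverse complement of the motif (CAGGTG for CACCTG) begins.
    """
    seq = seq.upper()
    motif = motif.upper()
    hits = []
    # The zero-width lookahead lets finditer report overlapping matches.
    for m in re.finditer(f"(?={motif})", seq):
        hits.append((m.start(), "+"))
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic motif would otherwise be counted twice
        for m in re.finditer(f"(?={rc})", seq):
            hits.append((m.start(), "-"))
    return sorted(hits)


print(find_motif("AACACCTGTTCAGGTGA"))  # [(2, '+'), (10, '-')]
```

Scanning both strands matters because a ChIP-seq peak reports a genomic interval, not an orientation; the same binding site reads as CACCTG or CAGGTG depending on the strand.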
PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

PubMed Central

Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G

2007-01-01

Background: In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins, and the rapid advancement of this technique, in combination with other proteomics methods, has produced a growing volume of proteome data. These data must be archived and analysed using specialized bioinformatics tools. Description: We describe here the "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, and data mining of the identified peptides and proteins. Conclusion: Using this pipeline, we have successfully identified several biologically significant interactions between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328
The 2015 Nucleic Acids Research Database Issue and molecular biology database collection.

PubMed

Galperin, Michael Y; Rigden, Daniel J; Fernández-Suárez, Xosé M

2015-01-01

The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification introduced last year to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of 'moonlighting' proteins; and two new databases of protein-protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to the genomic basis of disease and to potential drugs and drug targets. The size of the NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/). Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
The Histone Database: an integrated resource for histones and histone fold-containing proteins

PubMed Central

Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David

2011-01-01

Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671
The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.

PubMed

Takashima, S; Yamaoka, K

1999-08-30

Electric birefringence measurements indicated the presence of a large permanent dipole moment in the HU protein-DNA complex. To substantiate this observation, the dipole moment of the HU protein homodimer was computed numerically using NMR protein structure data. The dipole moments of globular proteins had previously been calculated only from X-ray structures; NMR data had never been used for this purpose. NMR structures have two advantages: (a) unlike X-ray data, NMR data are obtained from proteins in solution. This eliminates the troublesome question of whether the protein structure changes in the transition from the crystalline state to the solution state, a question that is particularly important for proteins such as HU, which has some internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms can be determined with sufficient resolution, which allows the N-H as well as the C=O bond moments to be calculated. Because the NMR entry for HU protein from Bacillus stearothermophilus consists of 25 models, both the surface-charge and the core dipole moments were computed for each of these structures. The calculations show that the net permanent dipole moment of the HU protein homodimer is approximately 500-530 D (1 D = 3.33 × 10^-30 C·m) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a protein as small as 19.5 kDa. Nevertheless, the numerical results are compatible with the electro-optical observations, confirming a very large dipole moment in this protein.
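The core quantity in such a calculation is the point-charge dipole μ = Σ q_i (r_i − r_cm), evaluated for each NMR model and then converted to debye. The sketch below shows only that arithmetic. It assumes per-atom point charges in units of e and coordinates in ångström, and takes the centre of mass as origin (a charged protein's dipole depends on the chosen origin). The charge assignment, including how bond moments are represented, is not specified here and is left to the caller; this is not the authors' code.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
ANGSTROM = 1e-10            # metres per angstrom
DEBYE = 3.33564e-30         # 1 D in C*m (the abstract rounds this to 3.33e-30)


def dipole_moment_debye(charges, coords, masses):
    """|sum_i q_i (r_i - r_cm)| in debye.

    charges: point charges in units of e; coords: (x, y, z) in angstrom;
    masses: atomic masses used only to locate the centre of mass, which
    serves as the origin because a charged protein's dipole depends on it.
    """
    total_mass = sum(masses)
    centre = [sum(m * r[k] for m, r in zip(masses, coords)) / total_mass
              for k in range(3)]
    mu = [sum(q * (r[k] - centre[k]) for q, r in zip(charges, coords))
          for k in range(3)]  # components in e * angstrom
    return math.sqrt(sum(c * c for c in mu)) * E_CHARGE * ANGSTROM / DEBYE


def summarize_models(charges, models, masses):
    """Mean, min and max dipole over an ensemble such as 25 NMR models."""
    values = [dipole_moment_debye(charges, coords, masses) for coords in models]
    return sum(values) / len(values), min(values), max(values)


# Two opposite unit charges 1 angstrom apart: about 4.80 D.
print(round(dipole_moment_debye([1, -1], [(0, 0, 0), (1, 0, 0)], [1, 1]), 2))
```

Reporting the spread across models, as `summarize_models` does, is one way to arrive at a range such as "500-530 D" rather than a single value.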
MoonProt: a database for proteins that are known to moonlight

PubMed Central

Mani, Mathew; Chen, Chang; Amblee, Vaishak; Liu, Haipeng; Mathur, Tanu; Zwicke, Grant; Zabad, Shadi; Patel, Bansi; Thakkar, Jagravi; Jeffery, Constance J.

2015-01-01

Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple biochemical functions that are not due to gene fusions, multiple RNA splice variants or pleiotropic effects. The known moonlighting proteins perform a variety of diverse functions in many different cell types and species, and information about their structures and functions is scattered across many publications. We have constructed the manually curated, searchable, internet-based MoonProt Database (http://www.moonlightingproteins.org) with information on more than 200 proteins that have been experimentally verified to be moonlighting proteins. The availability of this organized information provides a more complete picture of what is currently known about moonlighting proteins. The database will also aid researchers in other fields, including in determining the functions of genes identified in genome sequencing projects, interpreting data from proteomics projects and annotating protein sequence and structural databases. In addition, information about the structures and functions of moonlighting proteins can be helpful in understanding how novel protein functional sites evolved on an ancient protein scaffold, which can also help in the design of proteins with novel functions. PMID:25324305
A comprehensive and scalable database search system for metaproteomics.

PubMed

Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

2016-08-16

Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcoming this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. In a proof-of-principle analysis of human HEK293 lysate, a ComPIL database derived from high-quality genomic libraries detected nearly all of the peptides found by a search against a human database (which contains ~500-fold fewer peptides), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified.
The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
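One common way to make a very large peptide library fast to query is to digest the proteins in silico and sort the peptides by mass. Each spectrum's precursor mass then selects its candidates with a binary-search range lookup instead of a full scan. The sketch below illustrates that general idea with an in-memory sorted list. The abstract does not describe ComPIL's storage backend, so `MassIndex` and its methods are hypothetical, the digest rule is plain trypsin (cleave after K/R, not before P, no missed cleavages), and the masses are neutral peptide masses rather than m/z.

```python
import bisect

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini


def tryptic_digest(protein, min_len=2):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # C-terminal remainder, possibly empty
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass; assumes only the 20 standard residues."""
    return sum(MONO[aa] for aa in peptide) + WATER


class MassIndex:
    """Peptides sorted by neutral mass so a precursor query is a range scan."""

    def __init__(self, proteins):
        entries = [(peptide_mass(p), p, pid)
                   for pid, seq in proteins.items()
                   for p in tryptic_digest(seq)]
        entries.sort()
        self.entries = entries
        self.masses = [e[0] for e in entries]

    def query(self, mass, ppm=10.0):
        """Return (mass, peptide, protein_id) entries within +/- ppm of mass."""
        tol = mass * ppm * 1e-6
        lo = bisect.bisect_left(self.masses, mass - tol)
        hi = bisect.bisect_right(self.masses, mass + tol)
        return self.entries[lo:hi]


index = MassIndex({"P1": "GKAAR", "P2": "MSTPK"})
print([pep for _, pep, _ in index.query(203.127)])  # ['GK']
```

The query cost grows with the logarithm of the library size plus the number of candidates in the window. That is why mass-sorted layouts scale to libraries far larger than a single proteome.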
Distributed data collection for a database of radiological image interpretations

NASA Astrophysics Data System (ADS)

Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.

1997-01-01

The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering 5- and 10-megabyte x-ray images across the Internet to Sun workstations equipped with X Window-based 2048 × 2560 image displays, so that these images can be interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application that integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high-resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
A comprehensive collection of systems biology data characterizing the host response to viral infection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aevermann, Brian D.; Pickett, Brett E.; Kumar, Sanjeev

The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies that investigate in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection.
A comprehensive collection of systems biology data characterizing the host response to viral infection

DOE PAGES

Aevermann, Brian D.; Pickett, Brett E.; Kumar, Sanjeev; ...

2014-10-14

The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies that investigate in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection.
Gene expression and mutation-guided synthetic lethality eradicates proliferating and quiescent leukemia cells

PubMed Central

Nieborowska-Skorska, Margaret; Sullivan, Katherine; Dasgupta, Yashodhara; Podszywalow-Bartnicka, Paulina; Maifrede, Silvia; Di Marcantonio, Daniela; Bolton-Gillespie, Elisabeth; Cramer-Morales, Kimberly; Lee, Jaewong; Li, Min; Slupianek, Artur; Gritsyuk, Daniel; Cerny-Reiterer, Sabine; Seferynska, Ilona; Bullinger, Lars; Gorbunova, Vera; Piwocka, Katarzyna; Valent, Peter; Civin, Curt I.; Muschen, Markus; Dick, John E.; Wang, Jean C.Y.; Bhatia, Smita; Bhatia, Ravi; Eppert, Kolja; Minden, Mark D.; Sykes, Stephen M.

2017-01-01

Quiescent and proliferating leukemia cells accumulate highly lethal DNA double-strand breaks that are repaired by two major mechanisms: BRCA-dependent homologous recombination and DNA-dependent protein kinase–mediated (DNA-PK–mediated) nonhomologous end-joining, whereas DNA repair pathways mediated by poly(ADP-ribose) polymerase 1 (PARP1) serve as backups. Here we have designed a personalized medicine approach called gene expression and mutation analysis (GEMA) to identify BRCA- and DNA-PK–deficient leukemias either directly, using reverse transcription-quantitative PCR, microarrays, and flow cytometry, or indirectly, by the presence of oncogenes such as BCR-ABL1. DNA-PK–deficient quiescent leukemia cells and BRCA/DNA-PK–deficient proliferating leukemia cells were sensitive to PARP1 inhibitors that were administered alone or in combination with current antileukemic drugs. In conclusion, GEMA-guided targeting of PARP1 resulted in dual cellular synthetic lethality in quiescent and proliferating immature leukemia cells, and is thus a potential approach to eradicate leukemia stem and progenitor cells that are responsible for initiation and manifestation of the disease. Further, an analysis of The Cancer Genome Atlas database indicated that this personalized medicine approach could also be applied to treat numerous solid tumors from individual patients. PMID:28481221
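The synthetic-lethality reasoning can be written as a small decision rule, read directly from the abstract. Quiescent cells cannot use homologous recombination (it needs a sister chromatid), so DNA-PK deficiency alone should leave them dependent on PARP1-mediated backup repair. Proliferating cells have both pathways, so the sketch requires both BRCA and DNA-PK to be deficient. The sketch is hypothetical in several places. The `LeukemiaSample` fields and the 0.5 cut-off are not from the study, "BRCA" is treated as a single expression value, and the indirect route via oncogenes such as BCR-ABL1 is not modeled.

```python
from dataclasses import dataclass

# Hypothetical cut-off: expression below half of a normal reference
# (1.0 = normal level) is treated as deficient. Not taken from the study.
DEFICIENCY_CUTOFF = 0.5


@dataclass
class LeukemiaSample:
    brca_expression: float   # relative BRCA level vs. normal reference
    dnapk_expression: float  # relative DNA-PK level vs. normal reference
    quiescent: bool          # True for quiescent (G0) cells


def is_deficient(level: float) -> bool:
    return level < DEFICIENCY_CUTOFF


def predicts_parp1_sensitivity(s: LeukemiaSample) -> bool:
    """Synthetic-lethality rule as read from the abstract.

    Quiescent cells cannot use homologous recombination (no sister
    chromatid), so losing DNA-PK-mediated NHEJ leaves PARP1-mediated
    backup repair. Proliferating cells have both main pathways, so both
    must be deficient before PARP1 inhibition is predicted to be lethal.
    """
    if s.quiescent:
        return is_deficient(s.dnapk_expression)
    return is_deficient(s.brca_expression) and is_deficient(s.dnapk_expression)


print(predicts_parp1_sensitivity(LeukemiaSample(0.9, 0.2, quiescent=True)))   # True
print(predicts_parp1_sensitivity(LeukemiaSample(0.3, 0.9, quiescent=False)))  # False
```

The point of the rule is that the same drug can be predicted to kill two different cell populations, but for different genetic reasons. This is the "dual" synthetic lethality the abstract describes.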
IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

DOE PAGES

Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...

2016-11-29

Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). To advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
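A search for "clusters with similar Pfam composition" needs some set-similarity score. The abstract does not say which one IMG-ABC uses. The sketch below assumes Jaccard similarity between the sets of Pfam accessions in each cluster and ranks clusters above a threshold. The cluster IDs, the 0.5 default threshold and the function names are hypothetical. The Pfam accessions are real families used only as labels.

```python
def jaccard(a, b):
    """|A & B| / |A | B| for two sets of Pfam accessions (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_clusters(query, clusters, min_score=0.5):
    """Rank clusters by Pfam-composition similarity to the query set.

    clusters maps a cluster ID to the set of Pfam accessions it contains.
    Returns (cluster_id, score) pairs, best first, ties broken by ID.
    """
    scored = [(cid, jaccard(query, pfams)) for cid, pfams in clusters.items()]
    kept = [(cid, s) for cid, s in scored if s >= min_score]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical clusters; the Pfam accessions are real families used only as labels.
clusters = {
    "BC1": {"PF00109", "PF02801", "PF00550"},
    "BC2": {"PF00109", "PF00550"},
    "BC3": {"PF00501"},
}
print(similar_clusters({"PF00109", "PF02801", "PF00550"}, clusters))
# [('BC1', 1.0), ('BC2', 0.6666666666666666)]
```

Set-based scoring ignores domain order and copy number. An order-aware or count-aware measure would be a reasonable alternative if those features matter for a given search.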
A comprehensive collection of systems biology data characterizing the host response to viral infection.

PubMed

Aevermann, Brian D; Pickett, Brett E; Kumar, Sanjeev; Klem, Edward B; Agnihothram, Sudhakar; Askovich, Peter S; Bankhead, Armand; Bolles, Meagen; Carter, Victoria; Chang, Jean; Clauss, Therese R W; Dash, Pradyot; Diercks, Alan H; Eisfeld, Amie J; Ellis, Amy; Fan, Shufang; Ferris, Martin T; Gralinski, Lisa E; Green, Richard R; Gritsenko, Marina A; Hatta, Masato; Heegel, Robert A; Jacobs, Jon M; Jeng, Sophia; Josset, Laurence; Kaiser, Shari M; Kelly, Sara; Law, G Lynn; Li, Chengjun; Li, Jiangning; Long, Casey; Luna, Maria L; Matzke, Melissa; McDermott, Jason; Menachery, Vineet; Metz, Thomas O; Mitchell, Hugh; Monroe, Matthew E; Navarro, Garnet; Neumann, Gabriele; Podyminogin, Rebecca L; Purvine, Samuel O; Rosenberger, Carrie M; Sanders, Catherine J; Schepmoes, Athena A; Shukla, Anil K; Sims, Amy; Sova, Pavel; Tam, Vincent C; Tchitchek, Nicolas; Thomas, Paul G; Tilton, Susan C; Totura, Allison; Wang, Jing; Webb-Robertson, Bobbie-Jo; Wen, Ji; Weiss, Jeffrey M; Yang, Feng; Yount, Boyd; Zhang, Qibin; McWeeney, Shannon; Smith, Richard D; Waters, Katrina M; Kawaoka, Yoshihiro; Baric, Ralph; Aderem, Alan; Katze, Michael G; Scheuermann, Richard H

2014-01-01

The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies that investigate in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection.
IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken

Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). To advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
Computation and application of tissue-specific gene set weights.

PubMed

Frost, H Robert

2018-04-06

Gene set testing, or pathway analysis, has become a critical tool for the analysis of high-dimensional genomic data. Although the function and activity of many genes and higher-level processes are tissue-specific, gene set testing is typically performed in a tissue-agnostic fashion, which affects statistical power and the interpretation and replication of results. To address this challenge, we have developed a bioinformatics approach to compute tissue-specific weights for individual gene sets using information on tissue-specific gene activity from the Human Protein Atlas (HPA). We used this approach to create a public repository of tissue-specific gene set weights for 37 different human tissue types from the HPA and all collections in the Molecular Signatures Database (MSigDB). To demonstrate the validity and utility of these weights, we explored three different applications: the functional characterization of human tissues, multi-tissue analysis for systemic diseases and tissue-specific gene set testing. All data used in the reported analyses are publicly available. An R implementation of the method and tissue-specific weights for MSigDB gene set collections can be downloaded at http://www.dartmouth.edu/~hrfrost/TissueSpecificGeneSets. Contact: rob.frost@dartmouth.edu.
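To make the idea of a "tissue-specific weight for a gene set" concrete, here is a deliberately simplified stand-in. It is not the published method, which is implemented in R at the URL above. Each gene is assumed to carry a per-tissue specificity score in [0, 1] (for example, 1.0 if HPA flags it as tissue-enriched there). A set's weight in a tissue is then the mean score of its members that have a score. The function name and the toy scores are hypothetical.

```python
def gene_set_weights(gene_set, tissue_scores):
    """Average member specificity per tissue.

    tissue_scores: {tissue: {gene: score in [0, 1]}}, e.g. 1.0 for genes
    flagged as tissue-enriched in that tissue and 0.0 otherwise.
    Genes absent from a tissue's table are ignored; a set with no scored
    members gets weight 0.0.
    """
    weights = {}
    for tissue, scores in tissue_scores.items():
        values = [scores[g] for g in gene_set if g in scores]
        weights[tissue] = sum(values) / len(values) if values else 0.0
    return weights


# Toy scores, not HPA data.
scores = {
    "liver": {"ALB": 1.0, "APOA1": 1.0, "GAPDH": 0.0},
    "brain": {"ALB": 0.0, "APOA1": 0.0, "GAPDH": 0.0},
}
print(gene_set_weights({"ALB", "APOA1", "GAPDH"}, scores))
# {'liver': 0.6666666666666666, 'brain': 0.0}
```

Weights like these could, for instance, be used to down-weight or filter gene sets that are inactive in the tissue under study before multiple-testing correction. That is the kind of power gain the abstract refers to.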

A comprehensive collection of systems biology data characterizing the host response to viral infection

PubMed Central

Aevermann, Brian D.; Pickett, Brett E.; Kumar, Sanjeev; Klem, Edward B.; Agnihothram, Sudhakar; Askovich, Peter S.; Bankhead, Armand; Bolles, Meagen; Carter, Victoria; Chang, Jean; Clauss, Therese R.W.; Dash, Pradyot; Diercks, Alan H.; Eisfeld, Amie J.; Ellis, Amy; Fan, Shufang; Ferris, Martin T.; Gralinski, Lisa E.; Green, Richard R.; Gritsenko, Marina A.; Hatta, Masato; Heegel, Robert A.; Jacobs, Jon M.; Jeng, Sophia; Josset, Laurence; Kaiser, Shari M.; Kelly, Sara; Law, G. Lynn; Li, Chengjun; Li, Jiangning; Long, Casey; Luna, Maria L.; Matzke, Melissa; McDermott, Jason; Menachery, Vineet; Metz, Thomas O.; Mitchell, Hugh; Monroe, Matthew E.; Navarro, Garnet; Neumann, Gabriele; Podyminogin, Rebecca L.; Purvine, Samuel O.; Rosenberger, Carrie M.; Sanders, Catherine J.; Schepmoes, Athena A.; Shukla, Anil K.; Sims, Amy; Sova, Pavel; Tam, Vincent C.; Tchitchek, Nicolas; Thomas, Paul G.; Tilton, Susan C.; Totura, Allison; Wang, Jing; Webb-Robertson, Bobbie-Jo; Wen, Ji; Weiss, Jeffrey M.; Yang, Feng; Yount, Boyd; Zhang, Qibin; McWeeney, Shannon; Smith, Richard D.; Waters, Katrina M.; Kawaoka, Yoshihiro; Baric, Ralph; Aderem, Alan; Katze, Michael G.; Scheuermann, Richard H.

2014-01-01

The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies that investigate in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection. PMID:25977790
hPDI: a database of experimental human protein-DNA interactions.

PubMed

Xie, Zhi; Hu, Shaohui; Blackshaw, Seth; Zhu, Heng; Qian, Jiang

2010-01-15

The human Protein-DNA Interactome (hPDI) database holds experimental human protein-DNA interaction data identified by protein microarray assays. A unique feature of hPDI is that it contains consensus DNA-binding sequences not only for nearly 500 human transcription factors but also for >500 unconventional DNA-binding proteins, which had previously been completely uncharacterized. Users can browse, search and download either a subset of the data or the entire dataset via a web interface. The database is freely accessible for academic purposes at http://bioinfo.wilmer.jhu.edu/PDI/.
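Consensus DNA-binding sequences such as those in hPDI are typically written with IUPAC degenerate nucleotide codes. The sketch below shows one simple way to use such a consensus: expand each IUPAC letter into a regular-expression character class and scan a sequence for overlapping matches. The function names and the example consensus strings are illustrative assumptions, not entries taken from hPDI, and only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus (e.g. WGATAR) into a regex string."""
    return "".join(IUPAC[c] for c in consensus.upper())


def scan_consensus(sequence, consensus):
    """0-based start positions of (possibly overlapping) consensus matches
    on the given strand only."""
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(consensus_to_regex("WGATAR"))                # [AT]GATA[AG]
print(scan_consensus("AGATAAGGATTA", "WGATAR"))  # [0]
```

Exact consensus matching is the crudest use of such data. A position weight matrix built from the same binding data would score near-matches as well.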
MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.

PubMed

Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia

2002-01-01

Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic cellular processes and to contribute to the pathogenesis of many degenerative diseases. All mitochondrial functions depend on the interaction of the nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed, and the data have been collected in several specialised databases. To collect information on nuclear-encoded mitochondrial proteins, we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome, where other mitochondrial databases developed by our group, the complete list of sequenced mitochondrial genomes, links to other mitochondrial sites and related information are available. The MitoAln database, which was linked to MitoNuc in the previous release and reported multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. To keep the links among MitoNuc entries for homologous proteins, a new field has been defined in the database: the cluster identifier, an alphanumeric code that identifies each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has also been introduced; it reports clinical data related to dysfunction of the protein. The logical schema of the MitoNuc database has been implemented in the ORACLE DBMS, which will allow end-users to retrieve data through a user-friendly interface that will soon be implemented.
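A single cluster-identifier column is enough to keep homologous entries linked after a separate alignment database is dropped. Finding an entry's homologs becomes a self-join on that column. The sketch below illustrates this with SQLite from Python's standard library rather than ORACLE. The table layout, the column names and the accessions are hypothetical, modelled only on the fields the abstract mentions.

```python
import sqlite3

# Hypothetical table modelled on the fields the abstract describes;
# MitoNuc itself used ORACLE, and its real schema is not shown here.
SCHEMA = """
CREATE TABLE entry (
    accession        TEXT PRIMARY KEY,
    species          TEXT NOT NULL,
    gene             TEXT NOT NULL,
    cluster_id       TEXT NOT NULL,  -- alphanumeric homology-cluster code
    clinical_comment TEXT            -- text derived from SWISS-PROT, if any
);
CREATE INDEX entry_cluster ON entry (cluster_id);
"""


def open_db():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def homologs(con, accession):
    """Other entries that share the given entry's cluster identifier."""
    rows = con.execute(
        """SELECT h.accession, h.species
           FROM entry AS q
           JOIN entry AS h ON h.cluster_id = q.cluster_id
           WHERE q.accession = ? AND h.accession <> q.accession
           ORDER BY h.accession""",
        (accession,),
    )
    return rows.fetchall()


con = open_db()
con.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?, ?)",
    [("HS001", "Homo sapiens", "COX5A", "CL0001", None),
     ("MM001", "Mus musculus", "Cox5a", "CL0001", None),
     ("HS002", "Homo sapiens", "ATP5F1A", "CL0002", None)],
)
print(homologs(con, "HS001"))  # [('MM001', 'Mus musculus')]
```

The index on `cluster_id` keeps the self-join cheap as the number of entries grows.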
Using SQL Databases for Sequence Similarity Searching and Analysis.

PubMed

Pearson, William R; Mackey, Aaron J

2017-09-13

Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
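Once search results are in tables, a question like "how many E. coli proteins have a significant homolog in each kingdom?" becomes a single grouped join. The sketch below uses SQLite with a hypothetical two-table layout. It is not the seqdb_demo/search_demo schema described in the unit, and the accessions, E-values and the 1e-3 threshold are illustrative only.

```python
import sqlite3

# Hypothetical two-table layout; it is not the seqdb_demo/search_demo schema.
SCHEMA = """
CREATE TABLE protein (acc TEXT PRIMARY KEY, organism TEXT, kingdom TEXT);
CREATE TABLE hit (query_acc TEXT, subject_acc TEXT, evalue REAL);
CREATE INDEX hit_query ON hit (query_acc);
"""


def make_db(proteins, hits):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO protein VALUES (?, ?, ?)", proteins)
    con.executemany("INSERT INTO hit VALUES (?, ?, ?)", hits)
    return con


def homologs_per_kingdom(con, organism, max_evalue=1e-3):
    """Per kingdom, count the organism's proteins with at least one hit at
    or below max_evalue, ignoring hits to the organism's own proteins."""
    sql = """
        SELECT s.kingdom, COUNT(DISTINCT h.query_acc)
        FROM hit AS h
        JOIN protein AS q ON q.acc = h.query_acc
        JOIN protein AS s ON s.acc = h.subject_acc
        WHERE q.organism = ? AND h.evalue <= ? AND s.organism <> q.organism
        GROUP BY s.kingdom
        ORDER BY s.kingdom
    """
    return con.execute(sql, (organism, max_evalue)).fetchall()


con = make_db(
    [("ec1", "Escherichia coli", "Bacteria"), ("ec2", "Escherichia coli", "Bacteria"),
     ("hs1", "Homo sapiens", "Eukaryota"), ("bs1", "Bacillus subtilis", "Bacteria"),
     ("mj1", "Methanocaldococcus jannaschii", "Archaea")],
    [("ec1", "hs1", 1e-20), ("ec1", "mj1", 1e-5), ("ec2", "hs1", 1e-8),
     ("ec2", "bs1", 1e-30), ("ec2", "mj1", 0.5)],
)
print(homologs_per_kingdom(con, "Escherichia coli"))
# [('Archaea', 1), ('Bacteria', 1), ('Eukaryota', 2)]
```

`COUNT(DISTINCT ...)` matters here: a protein with several significant hits in one kingdom should still be counted once for that kingdom.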
SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions

PubMed Central

Jefferson, Emily R.; Walsh, Thomas P.; Roberts, Timothy J.; Barton, Geoffrey J.

2007-01-01

SNAPPI-DB, a high-performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) are described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without complex SQL statements are provided via an object-oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features that are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: . PMID:17202171
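Classifying domain–domain interactions "by their Superfamily/Family pair" implies that the pair is unordered: a contact between superfamily X and Y belongs to the same class as one between Y and X. The sketch below shows that grouping step only. It does not model interface classification or the structure alignments, and the PDB IDs and superfamily labels are placeholders, not SNAPPI-DB data or its Java API.

```python
from collections import defaultdict


def classify_interactions(interactions):
    """Group domain-domain interactions by unordered superfamily pair.

    interactions: iterable of (pdb_id, superfamily_a, superfamily_b).
    An A-B contact and a B-A contact fall into the same class.
    """
    groups = defaultdict(list)
    for pdb_id, sf_a, sf_b in interactions:
        key = tuple(sorted((sf_a, sf_b)))  # canonical order for the pair
        groups[key].append(pdb_id)
    return dict(groups)


# Hypothetical PDB IDs and superfamily labels.
example = [("1abc", "SF_X", "SF_Y"), ("2def", "SF_Y", "SF_X"), ("3ghi", "SF_X", "SF_X")]
print(classify_interactions(example))
# {('SF_X', 'SF_Y'): ['1abc', '2def'], ('SF_X', 'SF_X'): ['3ghi']}
```

Sorting the pair into a canonical order is the simplest way to make the key symmetric. A `frozenset` would also work but would collapse homotypic pairs such as (X, X) into a one-element set.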
SALAD database: a motif-based database of protein annotations for plant comparative genomics

PubMed Central

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
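The abstract does not specify how motif patterns are scored, so the following Python sketch uses a stand-in: each protein is represented as an ordered list of motif IDs (hypothetical data), and pairs are scored with difflib's SequenceMatcher ratio, an order-aware similarity between 0 and 1. Scores of this kind could be converted to distances (1 minus the score) for clustering; this is not SALAD's actual method.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical motif patterns: each protein is an ordered list of motif IDs.
patterns = {
    "protA": ["m1", "m2", "m3", "m4"],
    "protB": ["m1", "m2", "m4"],
    "protC": ["m5", "m6"],
}

def motif_similarity(p, q):
    """Order-aware similarity of two motif patterns (difflib ratio, 0..1)."""
    return SequenceMatcher(None, p, q, autojunk=False).ratio()

# Pairwise scores for every protein pair in the group.
scores = {(a, b): motif_similarity(patterns[a], patterns[b])
          for a, b in combinations(sorted(patterns), 2)}
for pair, score in scores.items():
    print(pair, round(score, 3))
```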
MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

PubMed

Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

2018-05-08

Protein-protein interactions (PPIs) play various roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Rigid-body protein-protein docking calculations on pairs of protein structures are therefore expected to reveal PPIs whose 3D structures differ from those of known complexes, because such calculations do not explicitly require known PPI information. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. To fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database that gathers prediction results and their predicted 3D complex structures and makes them easily accessible. Although several databases already provide predicted PPIs, they do not contain enough entries for discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times as many PPI predictions as previous databases and enables users to find PPI predictions that are not available in conventional PPI prediction databases. MEGADOCK-Web contains 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer.
MEGADOCK-Web provides a large, comprehensive set of docking-based PPI predictions linked to biochemical pathways and, by archiving these predictions, enables users to assess PPI feasibility easily and quickly. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .
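As a quick arithmetic check (our own, not stated in the paper), the reported 28,331,628 predictions equal the number of unordered pairs of distinct chains among 7528 chains, n(n - 1)/2; also counting each chain paired with itself would give a slightly larger total.

```python
from math import comb

chains = 7528
distinct_pairs = comb(chains, 2)        # n * (n - 1) / 2, no self-pairs
with_self_pairs = comb(chains + 1, 2)   # n * (n + 1) / 2, includes A-A pairs
print(distinct_pairs)    # 28331628
print(with_self_pairs)   # 28339156
```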
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

PubMed

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-03-01

Many analyses in modern biological research are based on comparisons between biological sequences, from which functional, evolutionary and structural inferences are drawn. When large numbers of sequences are compared, heuristics are often used, at some cost in accuracy. To improve and validate the results of such comparisons, we performed radical all-against-all comparisons of 4 million protein sequences from the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely computationally intensive approach was made possible by World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains the coordinates of pairwise protein alignments and their respective scores, is now available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
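The Smith-Waterman recurrence behind this approach can be sketched in a few lines of Python. This is a minimal illustration, not the Genome Comparison Project implementation: it uses a simple match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary assumptions), whereas protein searches normally use a substitution matrix such as BLOSUM62 and affine gap penalties. The cost is proportional to the product of the two sequence lengths, and 4 million sequences yield roughly 8 × 10^12 unordered pairs, which is why grid computing was needed.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty, score only)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local rather than global.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman("XXACGTYY", "ACGT"))  # 8
```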
LocSigDB: a database of protein localization signals

PubMed Central

Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M.; Mohammed, Akram; Guda, Chittibabu

2015-01-01

LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations, primarily in eukaryotic cells, with brief coverage of bacterial proteins. Proteins must be localized to their appropriate subcellular compartment to perform their intended function. Mislocalization of proteins to unintended locations is a causative factor in many human diseases; therefore, a collection of known sorting signals will help support many important areas of biomedical research. Through an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor them. Each signal in LocSigDB is annotated with its localization, source and PubMed references, and is linked to the UniProt proteins, with their organism information, that contain the same amino acid pattern as the signal. From the LocSigDB web server, users can download the whole database or browse and search the data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/ PMID:25725059
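Linking a signal to every protein that "contains the same amino acid pattern" is essentially a pattern scan over sequences. The Python sketch below assumes a simple notation in which `x` means any residue, `[KR]` a residue set and `$` the C-terminus; LocSigDB's own notation may differ. The example signals are the well-known SV40 nuclear localization signal PKKKRKV and the C-terminal ER-retention signal KDEL; the sequences are invented.

```python
import re

def signal_regex(pattern: str) -> re.Pattern:
    """Compile a signal pattern; 'x' is treated as any residue (assumed notation)."""
    # Wrapping in a lookahead lets overlapping occurrences be reported.
    return re.compile("(?=" + pattern.replace("x", ".") + ")")

def find_signal(sequence: str, pattern: str) -> list[int]:
    """0-based start positions where the signal pattern occurs in the sequence."""
    return [m.start() for m in signal_regex(pattern).finditer(sequence)]

print(find_signal("MSPKKKRKVEDA", "PKKKRKV"))  # [2]
print(find_signal("MAGTKDEL", "KDEL$"))         # [4]
```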
Listeriomics: an Interactive Web Platform for Systems Biology of Listeria

PubMed Central

Koutero, Mikael; Tchitchek, Nicolas; Cerutti, Franck; Lechat, Pierre; Maillet, Nicolas; Hoede, Claire; Chiapello, Hélène; Gaspin, Christine

2017-01-01

As for many model organisms, the amount of Listeria omics data produced has recently increased exponentially. There are now >80 published complete Listeria genomes, around 350 different transcriptomic data sets, and 25 proteomic data sets available. Analyzing these data sets through a systems biology approach, and building tools that let biologists browse them, is a challenge for bioinformaticians. We have developed a web-based platform, named Listeriomics, that integrates different tools for omics data analyses, i.e., (i) an interactive genome viewer to display gene expression arrays, tiling arrays, and sequencing data sets along with proteomics and genomics data sets; (ii) an expression and protein atlas that connects every gene, small RNA, antisense RNA, or protein with the most relevant omics data; (iii) a specific tool for exploring protein conservation through the Listeria phylogenomic tree; and (iv) a coexpression network tool for the discovery of potential new regulatory relationships. Our platform integrates all the complete Listeria species genomes, transcriptomes, and proteomes published to date. This website allows navigation among all these data sets with enriched metadata in a user-friendly format and can be used as a central database for systems biology analysis. IMPORTANCE In recent decades, Listeria has become a key model organism for the study of host-pathogen interactions, noncoding RNA regulation, and bacterial adaptation to stress. To study these mechanisms, several genomics, transcriptomics, and proteomics data sets have been produced. We have developed Listeriomics, an interactive web platform to browse and correlate these heterogeneous sources of information. Our website will allow listeriologists and microbiologists to decipher key regulatory mechanisms by using a systems biology approach. PMID:28317029
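The abstract does not say how the Listeriomics coexpression network is built. A common approach, assumed here purely for illustration, is to correlate expression profiles across conditions and connect gene pairs whose absolute Pearson correlation passes a threshold. The gene names, values and the 0.9 cutoff below are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (constant profiles are not handled)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold=0.9):
    """Connect gene pairs whose |r| across conditions reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

# Hypothetical expression values for four genes across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
print(coexpression_edges(profiles))
```

geneD correlates at only about ±0.8 with the others, so it stays unconnected at this threshold; negative edges are kept because anti-correlation can also hint at regulation.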
Rice proteome analysis: a step toward functional analysis of the rice genome.

PubMed

Komatsu, Setsuko; Tanaka, Naoki

2005-03-01

The technique of proteome analysis using 2-DE has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this review, we describe construction of the rice proteome database, the cataloging of rice proteins, and the functional characterization of some of the proteins identified. Initially, proteins extracted from various tissues and organelles were separated by 2-DE and an image analyzer was used to construct a display or reference map of the proteins. The rice proteome database currently contains 23 reference maps based on 2-DE of proteins from different rice tissues and subcellular compartments. These reference maps comprise 13 129 rice proteins, and the amino acid sequences of 5092 of these proteins are entered in the database. Major proteins involved in growth or stress responses have been identified by using a proteomics approach and some of these proteins have unique functions. Furthermore, initial work has also begun on analyzing the phosphoproteome and protein-protein interactions in rice. The information obtained from the rice proteome database will aid in the molecular cloning of rice genes and in predicting the function of unknown proteins.
Databases and Associated Tools for Glycomics and Glycoproteomics.

PubMed

Lisacek, Frederique; Mariethoz, Julien; Alocci, Davide; Rudd, Pauline M; Abrahams, Jodie L; Campbell, Matthew P; Packer, Nicolle H; Ståhle, Jonas; Widmalm, Göran; Mullen, Elaine; Adamczyk, Barbara; Rojas-Macias, Miguel A; Jin, Chunsheng; Karlsson, Niclas G

2017-01-01

Access to biodatabases for glycomics and glycoproteomics has proven essential for current glycobiological research. This chapter presents available databases devoted to different aspects of glycobioinformatics. These include oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycan-related proteins), and resources associating glycans with tissues, diseases, and proteins. Specific search protocols are also provided, using tools associated with experimental databases, for converting primary glycoanalytical data into glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition, we include information on how to use glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with pathogen interactions (SugarBind).
The developmental transcriptome atlas of the spoon worm Urechis unicinctus (Echiurida: Annelida).

PubMed

Park, Chungoo; Han, Yong-Hee; Lee, Sung-Gwon; Ry, Kyoung-Bin; Oh, Jooseong; Kern, Elizabeth M A; Park, Joong-Ki; Cho, Sung-Jin

2018-03-01

Echiurida is one of the most intriguing major subgroups of annelida because, unlike most other annelids, echiurids lack metameric body segmentation as adults. For this reason, transcriptome analyses from various developmental stages of echiurid species can be of substantial value for understanding precise expression levels and the complex regulatory networks during early and larval development. A total of 914 million raw RNA-Seq reads were produced from 14 developmental stages of Urechis unicinctus and were de novo assembled into contigs spanning 63,928,225 bp with an N50 length of 2700 bp. The resulting comprehensive transcriptome database of the early developmental stages of U. unicinctus consists of 20,305 representative functional protein-coding transcripts. Approximately 66% of unigenes were assigned to superphylum-level taxa, including Lophotrochozoa (40%). The completeness of the transcriptome assembly was assessed using benchmarking universal single-copy orthologs; 75.7% of the single-copy orthologs were present in our transcriptome database. We observed 3 distinct patterns of global transcriptome profiles from 14 developmental stages and identified 12,705 genes that showed dynamic regulation patterns during the differentiation and maturation of U. unicinctus cells. We present the first large-scale developmental transcriptome dataset of U. unicinctus and provide a general overview of the dynamics of global gene expression changes during its early developmental stages. The analysis of time-course gene expression data is a first step toward understanding the complex developmental gene regulatory networks in U. unicinctus and will furnish a valuable resource for analyzing the functions of gene repertoires in various developmental phases.
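The N50 reported above follows the standard definition: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. For this assembly, that means contigs of at least 2700 bp hold at least half of the 63,928,225 bp. A minimal Python sketch with toy lengths:

```python
def n50(lengths):
    """Smallest contig length among the longest contigs that cover >= 50% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer form of running >= total / 2
            return length
    return 0  # empty assembly

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Here the total is 54 bp, and the three longest contigs (10 + 9 + 8 = 27 bp) reach half of it, so N50 is 8.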
A Sundial-Atlas Precursor to the TIMED Mission: A Quick-Response Global Investigation into Coupled Lower Thermospheric, Ionospheric, and Mesospheric Physics

NASA Technical Reports Server (NTRS)

Szuszczewicz, E. P.

1996-01-01

The SUNDIAL-ATLAS effort was a global-scale investigation which responded to the science priorities of the ITM Panel, the Integrated SPD Strategy Implementation Plan as a whole, and the need for potential cost-saving design criteria for the TIMED mission. The investigation focused on coupling processes in the ionospheric-thermospheric system, taking advantage of the timelines of the ATLAS-1 mission (March 1992), and the global-scale ground-based measurement and modeling activities of the SUNDIAL program. The collaborative SUNDIAL-ATLAS activity was the first opportunity for global measurements of the chemistry, kinetics, and electrodynamics which couple the E-, F1-, and F2-regions into a single interactive system. As such, the program represented an important first step in studying global issues and, accordingly, was an important proof-of-concept experiment relevant to the strategic mission plans for the ITM community and the upcoming intermediate class satellite program called TIMED. To meet its projected goals, TIMED must perform a number of critical measurements and execute a number of correlations that were to be tried and tested for the first time in the SUNDIAL-ATLAS investigation. This was designed to include global correlations of thermospheric and ionospheric composition during quiet and disturbed conditions and the co-registration of global-scale ground-based measurements with along-track satellite diagnostics. The SUNDIAL component of the current investigation addressed this need by acquiring, reducing, and analyzing a multi-sensor database that complemented and extended that which was generated in the ATLAS mission (Atmospheric Laboratory for Applications and Science). The SUNDIAL data defined the state and condition of the global-scale ionosphere in the altitude range from 100 km to the F2-peak.
These data specified the peak heights and densities of the E-, F1-, and F2-regions, along with the global distributions of intermediate, descending, and sequential layers which play a critical role in the dynamo region of the lower ionospheric-thermospheric domain. The data were collected by the SUNDIAL network of more than 50 ground-based stations utilizing ionosondes, radars, photometers, Fabry-Perot interferometers, and total electron content measurements. The data were acquired during a three-week period centered on the eight-day ATLAS-1 mission, which provided image and photometric sensing of the altitude distributions of the major and minor ions and neutrals in the ITM system. This report focuses on the scientific contributions of the SUNDIAL component of the overall investigation. Specific findings are described in seven papers (attached) published in the Journal of Geophysical Research.
GALT protein database: querying structural and functional features of GALT enzyme.

PubMed

d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

2014-09-01

Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic galactosemia, a rare genetic disease. This new version of the database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.
Proteome of Caulobacter crescentus cell cycle publicly accessible on SWICZ server.

PubMed

Vohradsky, Jiri; Janda, Ivan; Grünenfelder, Björn; Berndt, Peter; Röder, Daniel; Langen, Hanno; Weiser, Jaroslav; Jenal, Urs

2003-10-01

Here we present the Swiss-Czech Proteomics Server (SWICZ), which hosts a proteomic database summarizing information about the cell cycle of the aquatic bacterium Caulobacter crescentus. The database provides a searchable tool for easy access to global protein synthesis and protein stability data examined during the C. crescentus cell cycle. Protein synthesis at five different cell cycle stages was determined for each protein spot as a value relative to the total amount of [35S]methionine incorporation. Protein stability in pulse-labeled extracts was measured during a chase period equivalent to one cell cycle unit. Quantitative information for individual proteins, together with descriptive data such as protein identities, apparent molecular masses and isoelectric points, was combined with information on protein function, genomic context and cell cycle stage, and then assembled in a relational database with a World Wide Web interface (http://proteom.biomed.cas.cz), which allows the database records to be searched and displays the recovered information. A total of 1250 protein spots were reproducibly detected on two-dimensional gel electropherograms, 295 of which were identified by mass spectrometry. The database is accessible either through clickable two-dimensional gel electrophoretic maps or by means of a set of dedicated search engines. Basic characterization of the experimental procedures, data processing, and a comprehensive description of the web site are presented. In its current state, the SWICZ proteome database provides a platform for the incorporation of new data emerging from extended functional studies on the C. crescentus proteome.
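The per-spot quantities described above reduce to simple ratios. In this Python sketch, synthesis is each spot's share of the total label at one stage, and stability is taken (as an assumption, since the abstract does not give the formula) as the fraction of pulse label remaining after the chase. Spot names and counts are hypothetical.

```python
def relative_synthesis(spot_counts):
    """Each spot's share of total [35S]methionine incorporation at one stage."""
    total = sum(spot_counts.values())
    return {spot: count / total for spot, count in spot_counts.items()}

def fraction_remaining(pulse_counts, chase_counts):
    """Label left in each spot after the chase, relative to the pulse (1.0 = fully stable)."""
    return {spot: chase_counts[spot] / pulse_counts[spot] for spot in pulse_counts}

# Hypothetical radioactivity counts for three spots at one cell cycle stage.
stage = {"spot_001": 300.0, "spot_002": 100.0, "spot_003": 600.0}
print(relative_synthesis(stage))
print(fraction_remaining({"spot_001": 300.0}, {"spot_001": 240.0}))
```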
Custom ultrasonic instrumentation for flow measurement and real-time binary gas analysis in the CERN ATLAS experiment

NASA Astrophysics Data System (ADS)

Alhroob, M.; Battistin, M.; Berry, S.; Bitadze, A.; Bonneau, P.; Boyd, G.; Crespo-Lopez, O.; Degeorge, C.; Deterre, C.; Di Girolamo, B.; Doubek, M.; Favre, G.; Hallewell, G.; Katunin, S.; Lombard, D.; Madsen, A.; McMahon, S.; Nagai, K.; O'Rourke, A.; Pearson, B.; Robinson, D.; Rossi, C.; Rozanov, A.; Stanecka, E.; Strauss, M.; Vacek, V.; Vaglio, R.; Young, J.; Zwalinski, L.

2017-01-01

The development of custom ultrasonic instrumentation was motivated by the need for continuous real-time monitoring of possible leaks and mass flow measurement in the evaporative cooling systems of the ATLAS silicon trackers. The instruments use pairs of ultrasonic transducers that transmit sound bursts and measure transit times in opposite directions. The gas flow rate is calculated from the difference in transit times, while the sound velocity is deduced from their average. The gas composition is then evaluated by comparison with a database of molar composition versus sound velocity, based on the direct dependence of sound velocity on component molar concentration in a gas mixture at known temperature and pressure. The instrumentation has been developed in several geometries, with five instruments now integrated and in continuous operation within the ATLAS Detector Control System (DCS) and its finite state machine. One instrument monitors C3F8 coolant leaks into the N2 envelope of the Pixel detector with a molar resolution better than 2 × 10⁻⁵, and has indicated a level of 0.14% when all the cooling loops of the recently re-installed Pixel detector are operational. Another instrument monitors air ingress into the C3F8 condenser of the new C3F8 thermosiphon coolant recirculator, with sub-percent precision. This instrument clearly detected the recent introduction of a small quantity of N2 into the 9.5 m³ total volume of the thermosiphon system. Custom microcontroller-based readout has been developed for the instruments, allowing readout into the ATLAS DCS via Modbus TCP/IP on Ethernet. The instrumentation has many potential applications where continuous binary gas composition analysis is required, including hydrocarbon and anaesthetic gas mixtures.
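The transit-time principle can be sketched under a simplifying assumption: the acoustic path of length L runs parallel to the flow, so t_down = L/(c + v) and t_up = L/(c - v). Averaging the reciprocal transit times gives the sound speed c exactly, and their difference gives the gas velocity v. Dividing L by the plain mean transit time gives (c² - v²)/c, which is close to c when v is much smaller than c. The composition step is shown as linear interpolation in a calibration table. The table values are made up and are not real C3F8/N2 data, and the real instruments' geometries and calibration procedures are not described in the abstract.

```python
from bisect import bisect_left

def speed_and_flow(path_m, t_down_s, t_up_s):
    """Sound speed c and axial gas velocity v (m/s) from opposed transit times.

    Assumes the acoustic path is parallel to the flow:
    t_down = L / (c + v), t_up = L / (c - v).
    """
    inv_down, inv_up = 1.0 / t_down_s, 1.0 / t_up_s
    c = 0.5 * path_m * (inv_down + inv_up)
    v = 0.5 * path_m * (inv_down - inv_up)
    return c, v

def molar_fraction(c, table):
    """Interpolate a molar fraction from (sound speed, fraction) pairs sorted by speed."""
    speeds = [s for s, _ in table]
    if not speeds[0] <= c <= speeds[-1]:
        raise ValueError("sound speed outside calibration table")
    i = min(max(bisect_left(speeds, c), 1), len(speeds) - 1)
    (s0, f0), (s1, f1) = table[i - 1], table[i]
    return f0 + (f1 - f0) * (c - s0) / (s1 - s0)

# Made-up calibration: heavier component fraction falls as sound speed rises.
table = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.0)]
c, v = speed_and_flow(0.5, 0.5 / 102.0, 0.5 / 98.0)  # true c = 100, v = 2
print(round(c, 6), round(v, 6))
print(molar_fraction(250.0, table))  # 0.25
```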
KnotProt: a database of proteins with knots and slipknots

PubMed Central

Jamroz, Michal; Niemyska, Wanda; Rawdon, Eric J.; Stasiak, Andrzej; Millett, Kenneth C.; Sułkowski, Piotr; Sulkowska, Joanna I.

2015-01-01

The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints. PMID:25361973
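Once a knotting fingerprint is available, the knot core and depth can be read off it. In the Python sketch below, the fingerprint is a toy dictionary mapping (first residue, last residue) of each subchain to a knot type ("3_1" for the trefoil, "0_1" for the unknot). Computing knot types themselves, which requires closing the chain and evaluating a knot invariant, is not shown. Depth is simplified here to the number of residues that can be trimmed from each terminus before cutting into the core.

```python
def knot_core(fingerprint, knot_type):
    """Shortest subchain (start, end) that still has the given knot type, or None."""
    hits = [(end - start, start, end)
            for (start, end), k in fingerprint.items() if k == knot_type]
    if not hits:
        return None
    _, start, end = min(hits)
    return start, end

def knot_depth(chain, core):
    """Residues removable from the N- and C-terminal sides before reaching the core."""
    (c_start, c_end), (k_start, k_end) = chain, core
    return k_start - c_start, c_end - k_end

# Toy fingerprint for a 100-residue chain: (first, last residue) -> knot type.
fingerprint = {(1, 100): "3_1", (5, 100): "3_1", (1, 90): "3_1",
               (10, 80): "3_1", (20, 80): "0_1", (10, 60): "0_1"}
core = knot_core(fingerprint, "3_1")
print(core, knot_depth((1, 100), core))  # (10, 80) (9, 20)
```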
Integrated Controlling System and Unified Database for High Throughput Protein Crystallography Experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaponov, Yu.A.; Igarashi, N.; Hiraki, M.

2004-05-12

An integrated controlling system and a unified database for high-throughput protein crystallography experiments have been developed. The main steps of protein crystallography experiments (purification, crystallization, crystal harvesting, data collection, data processing) were integrated into the software under development. All information necessary to perform protein crystallography experiments, except raw X-ray data (which are stored on a central data server), is stored in a MySQL relational database. The database contains four mutually linked hierarchical trees describing protein crystals, data collection of protein crystal and experimental data processing. A database editor was designed and developed. The editor supports basic database functions to view, create, modify and delete user records in the database. Two search engines were implemented: direct search of necessary information in the database and object-oriented search. The system is based on TCP/IP secure UNIX sockets with four predefined sending and receiving behaviors, which support communications between all connected servers and clients with remote control functions (creating and modifying data for experimental conditions, data acquisition, viewing experimental data, and performing data processing). Two secure login schemes were designed and developed: a direct method (using the developed Linux clients with a secure connection) and an indirect method (an SSL-secured connection with secure X11 support, usable from any operating system with X-terminal and SSH support). Part of the system has been implemented on a new MAD beamline, NW12, at the Photon Factory Advanced Ring for general user experiments.
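Linked hierarchical records of this kind map naturally onto parent-child tables joined by foreign keys. The Python sketch below uses sqlite3 in place of MySQL, and its four-level protein, crystal, data collection, processing schema, names and values are hypothetical, not the system's actual design. It shows how one query walks from a protein down to its processed data sets.

```python
import sqlite3

# Hypothetical four-level hierarchy; each child row points to its parent.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE protein    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE crystal    (id INTEGER PRIMARY KEY, protein_id INTEGER REFERENCES protein(id));
CREATE TABLE collection (id INTEGER PRIMARY KEY, crystal_id INTEGER REFERENCES crystal(id));
CREATE TABLE processing (id INTEGER PRIMARY KEY,
                         collection_id INTEGER REFERENCES collection(id),
                         resolution_a REAL);
INSERT INTO protein VALUES (1, 'lysozyme');
INSERT INTO crystal VALUES (10, 1), (11, 1);
INSERT INTO collection VALUES (100, 10), (101, 11);
INSERT INTO processing VALUES (1000, 100, 1.8), (1001, 101, 2.4);
""")

# Walk the hierarchy from a protein down to its processed data sets.
rows = conn.execute("""
SELECT p.name, c.id, dc.id, pr.resolution_a
FROM protein p
JOIN crystal c     ON c.protein_id = p.id
JOIN collection dc ON dc.crystal_id = c.id
JOIN processing pr ON pr.collection_id = dc.id
ORDER BY pr.resolution_a
""").fetchall()
print(rows)
```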

Introducing the CPL/MUW proteome database: interpretation of human liver and liver cancer proteome profiles by referring to isolated primary cells.

PubMed

Wimmer, Helge; Gundacker, Nina C; Griss, Johannes; Haudek, Verena J; Stättner, Stefan; Mohr, Thomas; Zwickl, Hannes; Paulitschke, Verena; Baron, David M; Trittner, Wolfgang; Kubicek, Markus; Bayer, Editha; Slany, Astrid; Gerner, Christopher

2009-06-01

Interpretation of proteome data with a focus on biomarker discovery relies largely on comparative proteome analyses. Here, we introduce a database-assisted interpretation strategy based on proteome profiles of primary cells. Both 2-D-PAGE and shotgun proteomics are applied, and we obtain high data concordance between these two techniques. When applying mass analysis of tryptic spot digests from 2-D gels of cytoplasmic fractions, we typically identify several hundred proteins. Using the same protein fractions, we usually identify more than a thousand proteins by shotgun proteomics. When these independent data sets are compared, the data are consistent for more than 99% of the proteins identified in the 2-D gels. Many characteristic differences in protein expression between cell types can thus be independently confirmed. Our self-designed SQL database (CPL/MUW, the database of the Clinical Proteomics Laboratories at the Medical University of Vienna, accessible via www.meduniwien.ac.at/proteomics/database) facilitates (i) quality management of MS-based protein identification data, (ii) the detection of cell type-specific proteins and (iii) the detection of molecular signatures of specific functional cell states. Here, we demonstrate how the CPL/MUW database assists the interpretation of proteome profiles obtained from human liver tissue and hepatocellular carcinoma tissue. We therefore suggest that the use of reference experiments supported by a tailored database may substantially facilitate the interpretation of proteome profiling experiments.
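A consistency figure such as "more than 99% of the proteins identified in the 2-D gels" can be computed as the fraction of gel identifications that the shotgun data set also contains. This Python sketch assumes that definition and uses placeholder protein identifiers; the exact criterion used for CPL/MUW is not given in the abstract.

```python
def concordance(gel_ids, shotgun_ids):
    """Fraction of 2-D gel identifications also found by shotgun proteomics."""
    gel = set(gel_ids)
    if not gel:
        return 0.0
    return len(gel & set(shotgun_ids)) / len(gel)

# Placeholder identifiers: the shotgun set is larger, as described above.
gel = ["P1", "P2", "P3", "P4"]
shotgun = ["P1", "P2", "P3", "P5", "P6"]
print(concordance(gel, shotgun))  # 0.75
```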
Extraction, integration and analysis of alternative splicing and protein structure distributed information

PubMed Central

D'Antonio, Matteo; Masseroli, Marco

2009-01-01

Background Alternative splicing has been shown to affect most human genes; different isoforms from the same gene encode proteins that differ in a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. To support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application that automatically extracts, integrates and analyzes human alternative splicing and protein structure data scattered across the Alternative Splicing Database, the Ensembl databank and the Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results A database has been developed to store the considered primary data and the results of their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; and a database has been created to manage user access to the PASS Web application and to store users' data and searches. Conclusion PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available, well-known bioinformatics tools to generate structural information on isoform pairs. Further analysis of such information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075
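Locating the few residues where two isoforms differ is the first step before mapping those regions onto a structure. PASS uses BLAST and CLUSTALW for this. As a lightweight stand-in, the Python sketch below uses the standard library's difflib.SequenceMatcher to list non-matching regions between two toy isoform sequences; it is not a substitute for a scored biological alignment.

```python
from difflib import SequenceMatcher

def isoform_differences(iso_a, iso_b):
    """Regions where two isoforms differ, as (tag, segment_in_a, segment_in_b)."""
    sm = SequenceMatcher(None, iso_a, iso_b, autojunk=False)
    return [(tag, iso_a[i1:i2], iso_b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

# Toy isoforms: the second lacks two residues (e.g. a skipped exon fragment).
print(isoform_differences("MKTAYIAKQRQISFVK", "MKTAYIAKQISFVK"))
```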
Catalog of infrared observations. Part 1: Data

NASA Technical Reports Server (NTRS)

Gezari, Daniel Y.; Schmitz, Marion; Mead, Jaylee M.

1987-01-01

The Catalog of Infrared Observations (CIO) is a compilation of infrared astronomical observational data obtained from an extensive literature search of astronomical journals and major astronomical catalogs and surveys. The literature searches are complete for 1965 through 1986 in this Second Edition. The Catalog is published in two parts, with the observational data (roughly 200,000 observations of 20,000 individual sources) listed in Part I, and supporting appendices in Part II. The expanded Second Edition contains a new feature: complete IRAS 4-band data for all CIO sources detected, listed with the main Catalog observations, as well as in complete detail in the Appendix. The appendices include an atlas of infrared source positions, two bibliographies of infrared literature upon which the search was based, and, keyed to the main Catalog listings (organized alphabetically by author and then chronologically), an atlas of infrared spectral ranges, and IRAS data from the CIO sources. The complete CIO database is available to qualified users in printed microfiche and magnetic tape formats.
FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies.

PubMed

Abugessaisa, Imad; Noguchi, Shuhei; Hasegawa, Akira; Harshbarger, Jayson; Kondo, Atsushi; Lizio, Marina; Severin, Jessica; Carninci, Piero; Kawaji, Hideya; Kasukawa, Takeya

2017-08-29

The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single-molecule sequencing. In the original publications, the GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse, respectively; later, the Genome Reference Consortium released the newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas for future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription start sites (TSSs) based on realignment of the CAGE reads, and TSS peaks converted from those defined on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results make it possible to examine frequencies of transcription initiation on the recent genome assemblies and to refer to promoters consistently, with updated information, across genome assemblies.
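CAGE captures the 5' cap of transcripts, so the first aligned base of each read marks a candidate transcription start site. The minimal sketch below counts such 5' ends per genomic position, which is the kind of TSS frequency table described above. It assumes alignments are already parsed into simple records (the `Alignment` type is hypothetical, with 0-based, end-exclusive coordinates) and ignores details of the actual FANTOM5 processing, such as correcting the extra non-templated G that CAGE reads often carry at their 5' end, mapping-quality filters and peak calling.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (0-based, end-exclusive)."""
    chrom: str
    start: int   # leftmost aligned base
    end: int     # one past the rightmost aligned base
    strand: str  # "+" or "-"


def tss_counts(alignments: Iterable[Alignment]) -> Counter:
    """Count CAGE read 5' ends per (chrom, strand, position)."""
    counts: Counter = Counter()
    for aln in alignments:
        # On the minus strand the read's 5' end is its rightmost aligned base.
        five_prime = aln.start if aln.strand == "+" else aln.end - 1
        counts[(aln.chrom, aln.strand, five_prime)] += 1
    return counts


if __name__ == "__main__":
    reads = [
        Alignment("chr1", 100, 125, "+"),
        Alignment("chr1", 100, 127, "+"),
        Alignment("chr1", 200, 225, "-"),
    ]
    for key, n in sorted(tss_counts(reads).items()):
        print(key, n)
```

Reprocessing for a new assembly amounts to rebuilding such counts from reads realigned to GRCh38 or GRCm38 rather than lifting old coordinates over.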
A JEE RESTful service to access Conditions Data in ATLAS

NASA Astrophysics Data System (ADS)

Formica, Andrea; Gallas, E. J.

2015-12-01

Conditions data are used extensively in ATLAS for offline reconstruction and analysis (e.g. alignment, calibration, data quality). The system is based on the LCG Conditions Database infrastructure, with read and write access via an ad hoc C++ API (COOL), a system developed before Run 1 data taking began. The infrastructure requires the data to be organized into separate schemas (assigned to subsystems/groups storing distinct and independent sets of conditions), which makes it difficult to access information from several schemas at the same time. We have therefore created PL/SQL functions containing queries that extract content at the multi-schema level. This PL/SQL API is exposed to external clients through a Java application that provides database access via REST services and is deployed inside an application server (JBoss WildFly). The services allow navigation across multiple schemas via simple URLs, and the data can be retrieved in either XML or JSON format using simple clients (such as curl or Web browsers).
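The practical benefit of exposing the PL/SQL API over REST is that any HTTP client can read conditions data. The sketch below shows a generic client built only on the Python standard library. The base URL, resource path and query parameters are hypothetical placeholders, not the real ATLAS service endpoints, and it assumes (unverified) that the service honours an `Accept: application/json` header to choose JSON over XML.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def build_url(base: str, path: str, params: dict[str, str]) -> str:
    """Join a base URL, a resource path and URL-encoded query parameters."""
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    query = urllib.parse.urlencode(params)
    return f"{url}?{query}" if query else url


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET the URL, asking for JSON, and decode the response body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint and parameters, for illustration only.
    url = build_url("https://conditions.example.org/api", "/schemas", {"name": "PIXEL"})
    print(url)  # https://conditions.example.org/api/schemas?name=PIXEL
    # data = fetch_json(url)  # requires a reachable service
```

From a shell, the equivalent request would be a plain `curl -H "Accept: application/json"` against the same URL.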
High-resolution digital brain atlases: a Hubble telescope for the brain.

PubMed

Jones, Edward G; Stone, James M; Karten, Harvey J

2011-05-01

We describe the implementation of a method for digitizing brain tissue sections containing normal and experimental data at microscopic resolution and for making the content readily accessible online. Web-accessible brain atlases and virtual microscopes for online examination can be developed using existing computer and internet technologies. The resulting databases, made up of hierarchically organized, multiresolution images, enable rapid, seamless navigation through the vast image datasets generated by high-resolution scanning. Tools for visualization and annotation of virtual microscope slides enable remote and universal data sharing. Interactive visualization of a complete series of brain sections digitized at subneuronal levels of resolution offers fine-grained and large-scale localization and quantification of many aspects of neural organization and structure. The method is straightforward and replicable; it can increase accessibility and facilitate sharing of neuroanatomical data. It provides an opportunity for capturing and preserving irreplaceable, archival neurohistological collections and making them available to all scientists in perpetuity, if resources could be obtained from hitherto uninterested agencies of scientific support. © 2011 New York Academy of Sciences.
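"Hierarchically organized, multiresolution images" are commonly stored as a tile pyramid: the full-resolution scan is cut into fixed-size tiles, then repeatedly downsampled by a factor of two until the whole section fits in one tile, so a viewer fetches only the tiles for the current zoom level and field of view. The sketch below computes the levels of such a pyramid. The 256-pixel tile size and the halving scheme are common conventions assumed here, not details given in the abstract.

```python
import math


def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int]]:
    """Return (level_width, level_height, n_tiles) from full resolution down to one tile."""
    if width < 1 or height < 1 or tile < 1:
        raise ValueError("dimensions and tile size must be positive")
    levels = []
    w, h = width, height
    while True:
        # Partial tiles at the right/bottom edges still count as whole tiles.
        n_tiles = math.ceil(w / tile) * math.ceil(h / tile)
        levels.append((w, h, n_tiles))
        if w <= tile and h <= tile:
            break
        # Each coarser level halves both dimensions (rounding up).
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    return levels


if __name__ == "__main__":
    print(pyramid_levels(1024, 512))  # [(1024, 512, 8), (512, 256, 2), (256, 128, 1)]
```

Because each level holds about a quarter of the tiles of the level below, the whole pyramid adds only about a third to the storage of the full-resolution tiles.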
Monitoring of computing resource utilization of the ATLAS experiment

NASA Astrophysics Data System (ADS)

Rousseau, David; Dimitrov, Gancho; Vukotic, Ilija; Aidel, Osman; Schaffer, Rd; Albrand, Solveig

2012-12-01

Due to the good performance of the LHC accelerator, the ATLAS experiment has seen higher-than-anticipated levels for both the event rate and the average number of interactions per bunch crossing. To respond to these changing requirements, the current and future usage of CPU, memory and disk resources has to be monitored, understood and acted upon. This requires data collection at a fairly fine level of granularity: the performance of each object written and each algorithm run, as well as a dozen per-job variables, are gathered for the different processing steps of Monte Carlo generation and simulation and the reconstruction of both data and Monte Carlo. We present a system to collect and visualize the data from both the online Tier-0 system and distributed grid production jobs. Around 40 GB of performance data are expected from up to 200k jobs per day, making performance optimization of the underlying Oracle database of utmost importance.
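A quick back-of-the-envelope check of the quoted volume, assuming decimal units (1 GB = 10^9 bytes) and the upper figure of 200k jobs per day, gives the average size of the performance record per job and the sustained rate the database must absorb:

```latex
\frac{40\ \text{GB/day}}{2\times10^{5}\ \text{jobs/day}}
  = \frac{4\times10^{10}\ \text{B}}{2\times10^{5}}
  = 2\times10^{5}\ \text{B} \approx 200\ \text{kB per job},
\qquad
\frac{4\times10^{10}\ \text{B}}{86\,400\ \text{s}} \approx 4.6\times10^{5}\ \text{B/s} \approx 0.46\ \text{MB/s}.
```

Fewer jobs per day would mean a larger average record per job for the same daily volume.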
An automatic multi-atlas prostate segmentation in MRI using a multiscale representation and a label fusion strategy

NASA Astrophysics Data System (ADS)

Álvarez, Charlens; Martínez, Fabio; Romero, Eduardo

2015-01-01

Pelvic magnetic resonance images (MRI) are used in prostate cancer radiotherapy (RT) planning. Modern protocols require manual delineation, a tedious and variable task that may take about 20 minutes per patient, even for trained experts. This considerable time is an important workflow burden in most radiology services. Automatic or semi-automatic methods might improve efficiency by reducing delineation time while preserving the required accuracy. This work presents a fully automatic atlas-based segmentation strategy that uses a robust multi-scale SURF analysis to select the templates most similar to a new MRI. A new segmentation is then obtained as a linear combination of the selected templates, which are first non-rigidly registered to the new image. The proposed method yields reliable segmentations, with an average Dice coefficient of 79% relative to expert manual segmentation, under a leave-one-out scheme on the training database.
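The Dice coefficient used above to score agreement with the expert delineation is 2|A∩B| / (|A| + |B|) for two binary masks A and B. A minimal pure-Python sketch over flattened masks follows; returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies.

```python
from typing import Sequence


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice similarity 2|A∩B| / (|A| + |B|) for two flattened binary masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks are treated as perfect agreement (a convention).
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    automatic = [1, 1, 0, 0]
    manual = [1, 0, 1, 0]
    print(dice(automatic, manual))  # 0.5
```

A Dice value of 0.79 therefore means the overlap equals 79% of the average size of the automatic and manual segmentations.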
How to identify rectal sub-regions likely involved in rectal bleeding in prostate cancer radiotherapy

NASA Astrophysics Data System (ADS)

Dréan, G.; Acosta, O.; Ospina, J. D.; Voisin, C.; Rigaud, B.; Simon, A.; Haigron, P.; de Crevoisier, R.

2013-11-01

Currently, patient-specific constraints in prostate cancer radiotherapy planning are defined solely on the basis of dose-volume histogram (DVH) parameters. However, these DVH models lack spatial accuracy, since they do not use the complete 3D information of the dose distribution. The goal of this study was to propose an automatic workflow to define patient-specific rectal sub-regions (RSR) involved in rectal bleeding (RB) in prostate cancer radiotherapy. A multi-atlas database spanning the large variability in rectal shape was built from a population of 116 individuals. Non-rigid registration followed by voxel-wise statistical analysis on these templates made it possible to find RSR likely correlated with RB (from a learning cohort of 63 patients). To define patient-specific RSR, weighted atlas-based segmentation with voting was then applied to 30 test patients. The results show the potential of the method for patient-specific planning of intensity-modulated radiotherapy (IMRT).
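The abstract does not spell out how the weighted vote is computed. One common form, sketched below under that assumption, gives each propagated atlas a weight (for example, derived from its image similarity to the patient) and assigns each voxel the label with the largest summed weight. The weights and toy label maps are hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def weighted_vote(atlas_labels: Sequence[Sequence[int]], weights: Sequence[float]) -> list[int]:
    """Fuse per-atlas label maps (flattened, same length) by weighted voting."""
    if not atlas_labels or len(atlas_labels) != len(weights):
        raise ValueError("need at least one atlas and exactly one weight per atlas")
    n_voxels = len(atlas_labels[0])
    fused = []
    for v in range(n_voxels):
        scores: defaultdict[int, float] = defaultdict(float)
        for labels, weight in zip(atlas_labels, weights):
            scores[labels[v]] += weight
        # Highest summed weight wins; max() keeps the first maximum in sorted
        # order, so ties go to the smaller label.
        fused.append(max(sorted(scores), key=lambda label: scores[label]))
    return fused


if __name__ == "__main__":
    # Three toy atlases (rows) voting on three voxels; 1 = sub-region, 0 = background.
    maps = [[1, 0, 1], [1, 1, 0], [0, 1, 0]]
    print(weighted_vote(maps, [3.0, 1.0, 1.0]))  # [1, 0, 1]
    print(weighted_vote(maps, [1.0, 1.0, 1.0]))  # [1, 1, 0] (plain majority vote)
```

With equal weights this reduces to ordinary majority voting; unequal weights let the atlases most similar to the patient dominate.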
Proteomics data exchange and storage: the need for common standards and public repositories.

PubMed

Jiménez, Rafael C; Vizcaíno, Juan Antonio

2013-01-01

The existence of both data standards and public databases or repositories has been a key factor behind the development of the existing "omics" approaches. In this book chapter we first review the main existing mass spectrometry (MS)-based proteomics resources: PRIDE, PeptideAtlas, GPMDB, and Tranche. Second, we report on the current status of the proteomics data standards developed by the Proteomics Standards Initiative (PSI), reviewing the formats mzML, mzIdentML, mzQuantML, TraML, and PSI-MI XML. Finally, we present an easy way to query and access MS proteomics data in the PRIDE database, as a representative of the existing repositories, using the workflow management system (WMS) Taverna. Two publicly available workflows are described.
GRBase, a new gene regulation data base available by anonymous ftp.

PubMed Central

Collier, B; Danielsen, M

1994-01-01

The Gene Regulation Database (GRBase) is a compendium of information on the structure and function of proteins involved in the control of gene expression in eukaryotes. These proteins include transcription factors, proteins involved in signal transduction, and receptors. The database can be obtained by FTP in Filemaker Pro, text, and postscript formats. The database will be expanded in the coming year to include reviews on families of proteins involved in gene regulation and to allow online searching. PMID:7937071
Columba: an integrated database of proteins, structures, and annotations.

PubMed

Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

2005-03-31

Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. Query results are provided in both machine-readable Extensible Markup Language (XML) and human-readable formats. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows users to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.
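Conceptually, a COLUMBA query intersects constraints drawn from different source databases. The in-memory sketch below illustrates that idea with a hypothetical `StructureEntry` record and toy identifiers. It is not COLUMBA's schema or web interface, where such filters would be evaluated inside the database itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StructureEntry:
    """Hypothetical integrated record for one PDB entry."""
    pdb_id: str
    resolution: float                                 # in angstroms
    pathways: frozenset[str] = frozenset()            # e.g. from KEGG
    cath_architectures: frozenset[str] = frozenset()  # from CATH
    go_functions: frozenset[str] = frozenset()        # GO molecular function terms


def select(
    entries: Iterable[StructureEntry],
    pathway: Optional[str] = None,
    cath: Optional[str] = None,
    go_function: Optional[str] = None,
    max_resolution: Optional[float] = None,
) -> list[str]:
    """Return the PDB IDs that satisfy every constraint that is not None."""
    hits = []
    for e in entries:
        if pathway is not None and pathway not in e.pathways:
            continue
        if cath is not None and cath not in e.cath_architectures:
            continue
        if go_function is not None and go_function not in e.go_functions:
            continue
        # "Resolution under a defined threshold" is read as strictly below it.
        if max_resolution is not None and e.resolution >= max_resolution:
            continue
        hits.append(e.pdb_id)
    return hits


if __name__ == "__main__":
    entries = [
        StructureEntry("TOY1", 1.8, frozenset({"pathway_A"}), frozenset({"arch_X"}), frozenset({"function_Y"})),
        StructureEntry("TOY2", 2.9, frozenset({"pathway_A"})),
    ]
    print(select(entries, pathway="pathway_A", max_resolution=2.5))  # ['TOY1']
```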
Creating a histology-embryology free digital image database using high-end microscopy and computer techniques for on-line biomedical education.

PubMed

Silva-Lopes, Victor W; Monteiro-Leal, Luiz H

2003-07-01

The development of new technology and the possibility of fast information delivery over Internet or Intranet connections are changing education. Microanatomy education depends fundamentally on students' correct interpretation of microscopy images. Modern microscopes coupled to computers make it possible to present these images in digital form by creating image databases. However, access to this new technology is restricted entirely to those living in cities and towns with an Information Technology (IT) infrastructure. This study describes the creation of a free Internet histology database composed of high-quality images and also presents an inexpensive way to make it available to a larger number of students through Internet/Intranet connections. Using state-of-the-art scientific instruments, we developed a Web page (http://www2.uerj.br/~micron/atlas/atlasenglish/index.htm) that, in association with a multimedia microscopy laboratory, is intended to help reduce the IT educational gap between developed and underdeveloped regions. Copyright 2003 Wiley-Liss, Inc.
Compiling a national resistivity atlas of Denmark based on airborne and ground-based transient electromagnetic data

NASA Astrophysics Data System (ADS)

Barfod, Adrian A. S.; Møller, Ingelise; Christiansen, Anders V.

2016-11-01

We present a large-scale study of the petrophysical relationship of resistivities obtained from densely sampled ground-based and airborne transient electromagnetic surveys and lithological information from boreholes. The overriding aim of this study is to develop a framework for examining the resistivity-lithology relationship in a statistical manner and apply this framework to gain a better description of the large-scale resistivity structures of the subsurface. In Denmark very large and extensive datasets are available through the national geophysical and borehole databases, GERDA and JUPITER respectively. In a 10 by 10 km grid, these data are compiled into histograms of resistivity versus lithology. To do this, the geophysical data are interpolated to the position of the boreholes, which allows for a lithological categorization of the interpolated resistivity values, yielding different histograms for a set of desired lithological categories. By applying the proposed algorithm to all available boreholes and airborne and ground-based transient electromagnetic data we build nation-wide maps of the resistivity-lithology relationships in Denmark. The presented Resistivity Atlas reveals varying patterns in the large-scale resistivity-lithology relations, reflecting geological details such as available source material for tills. The resistivity maps also reveal a clear ambiguity in the resistivity values for different lithologies. The Resistivity Atlas is highly useful when geophysical data are to be used for geological or hydrological modeling.
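The core bookkeeping step, assigning each borehole sample to a 10 km grid cell and accumulating a resistivity histogram per lithological category, can be sketched as below. The sketch assumes resistivities have already been interpolated to the borehole positions (the interpolation method is not described in the abstract), bins on log10(resistivity) as a design choice because resistivity spans orders of magnitude, and uses hypothetical bin edges and input tuples.

```python
import math
from collections import defaultdict


def resistivity_histograms(
    samples: list[tuple[float, float, str, float]],
    cell_size_m: float = 10_000.0,
    log10_edges: tuple[float, ...] = (0.0, 1.0, 2.0, 3.0),
) -> dict[tuple[tuple[int, int], str], list[int]]:
    """Histogram interpolated resistivities per (grid cell, lithology).

    samples: (easting_m, northing_m, lithology, resistivity_ohm_m) tuples;
    resistivity must be positive. Bins are half-open intervals on
    log10(resistivity); values outside the edges are ignored.
    """
    n_bins = len(log10_edges) - 1
    hist: dict = defaultdict(lambda: [0] * n_bins)
    for x, y, lithology, rho in samples:
        # Integer grid-cell indices for a square grid of side cell_size_m.
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        value = math.log10(rho)
        for k in range(n_bins):
            if log10_edges[k] <= value < log10_edges[k + 1]:
                hist[(cell, lithology)][k] += 1
                break
    return dict(hist)


if __name__ == "__main__":
    toy = [
        (500.0, 500.0, "clay", 5.0),
        (9000.0, 100.0, "clay", 20.0),
        (15000.0, 0.0, "sand", 150.0),
    ]
    print(resistivity_histograms(toy))
    # {((0, 0), 'clay'): [1, 1, 0], ((1, 0), 'sand'): [0, 0, 1]}
```

Comparing the clay and sand histograms within one cell shows directly how well resistivity separates the two lithologies there, and where their ranges overlap.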
Stellar Atmospheric Modelling for the ACCESS Program

NASA Astrophysics Data System (ADS)

Morris, Matthew; Kaiser, Mary Elizabeth; Bohlin, Ralph; Kurucz, Robert; ACCESS Team

2018-01-01

A goal of the ACCESS program (Absolute Color Calibration Experiment for Standard Stars) is to enable greater discrimination between theoretical astrophysical models and observations, where the comparison is limited by systematic errors associated with the relative flux calibration of the targets. To achieve this goal, ACCESS has been designed as a sub-orbital rocket-borne payload and ground calibration program to establish absolute flux calibration of stellar targets at <1% precision, with a resolving power of 500 across the 0.35 to 1.7 micron bandpass. To obtain higher-resolution spectroscopy in the optical and near-infrared range than either the ACCESS payload or CALSPEC observations provide, the ACCESS team has conducted a multi-instrument observing program at Apache Point Observatory. Using these calibrated high-resolution spectra in addition to the HST/CALSPEC data, we have generated stellar atmosphere models for ACCESS flight candidates, as well as for a selection of A and G stars from the CALSPEC database. Stellar atmosphere models were generated using the Atlas 9 and Atlas 12 Kurucz stellar atmosphere software. The effective temperature, log(g), metallicity, and reddening were varied and the chi-squared statistic was minimized to obtain a best-fit model. A comparison of these models with the results of interpolation between grids of existing models will be presented. The impact of the flexibility of the Atlas 12 input parameters (e.g. solar metallicity fraction, abundances, microturbulent velocity) is being explored.
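The fitting step described above (varying effective temperature, log(g), metallicity and reddening, then keeping the model with the smallest chi-squared) reduces to a grid search. In the sketch below the grid is a hypothetical dictionary from parameter tuples to model fluxes already resampled onto the observed wavelengths; producing those fluxes (running Atlas 9/Atlas 12 and applying a reddening law) is outside its scope.

```python
from typing import Hashable, Mapping, Sequence


def chi_squared(observed: Sequence[float], errors: Sequence[float], model: Sequence[float]) -> float:
    """Sum of squared, error-normalized residuals between data and model."""
    if not (len(observed) == len(errors) == len(model)):
        raise ValueError("observed, errors and model must have the same length")
    return sum(((o - m) / e) ** 2 for o, e, m in zip(observed, errors, model))


def best_fit(
    observed: Sequence[float],
    errors: Sequence[float],
    grid: Mapping[Hashable, Sequence[float]],
) -> tuple[Hashable, float]:
    """Return (parameters, chi2) of the grid model with minimum chi-squared."""
    scored = {params: chi_squared(observed, errors, flux) for params, flux in grid.items()}
    params = min(scored, key=scored.get)
    return params, scored[params]


if __name__ == "__main__":
    # Toy grid keyed by (Teff, log g); real grids would also vary [M/H] and E(B-V).
    grid = {(5800, 4.4): [1.0, 2.0, 3.0], (6000, 4.0): [1.1, 2.1, 3.1]}
    print(best_fit([1.0, 2.0, 3.0], [0.1, 0.1, 0.1], grid))  # ((5800, 4.4), 0.0)
```

Interpolating between grid points, as the abstract mentions, would refine the minimum beyond the grid spacing that this brute-force search is limited to.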
Segmentation of MR images via discriminative dictionary learning and sparse coding: application to hippocampus labeling.

PubMed

Tong, Tong; Wolz, Robin; Coupé, Pierrick; Hajnal, Joseph V; Rueckert, Daniel

2013-08-01

We propose a novel method for the automatic segmentation of brain MRI images by using discriminative dictionary learning and sparse coding techniques. In the proposed method, dictionaries and classifiers are learned simultaneously from a set of brain atlases, which can then be used for the reconstruction and segmentation of an unseen target image. The proposed segmentation strategy is based on image reconstruction, which is in contrast to most existing atlas-based labeling approaches that rely on comparing image similarities between atlases and target images. In addition, we propose a Fixed Discriminative Dictionary Learning for Segmentation (F-DDLS) strategy, which can learn dictionaries offline and perform segmentations online, enabling a significant speed-up in the segmentation stage. The proposed method has been evaluated for the hippocampus segmentation of 80 healthy ICBM subjects and 202 ADNI images. The robustness of the proposed method, especially of our F-DDLS strategy, was validated by training and testing on different subject groups in the ADNI database. The influence of different parameters was studied and the performance of the proposed method was also compared with that of the nonlocal patch-based approach. The proposed method achieved a median Dice coefficient of 0.879 on 202 ADNI images and 0.890 on 80 ICBM subjects, which is competitive compared with state-of-the-art methods. Copyright © 2013 Elsevier Inc. All rights reserved.
Automatic segmentation of brain MRIs and mapping neuroanatomy across the human lifespan

NASA Astrophysics Data System (ADS)

Keihaninejad, Shiva; Heckemann, Rolf A.; Gousias, Ioannis S.; Rueckert, Daniel; Aljabar, Paul; Hajnal, Joseph V.; Hammers, Alexander

2009-02-01

A robust model for the automatic segmentation of human brain images into anatomically defined regions across the human lifespan would be highly desirable, but such structural segmentations of brain MRI are challenging due to age-related changes. We have developed a new method, based on established algorithms for automatic segmentation of young adults' brains. We used prior information from 30 anatomical atlases, which had been manually segmented into 83 anatomical structures. Target MRIs came from 80 subjects (~12 individuals/decade) aged 20 to 90 years, with equal numbers of men and women, and data from two different scanners (1.5T, 3T), drawn from the IXI database. Each of the adult atlases was registered to each target MR image. By using additional information from segmentation into tissue classes (GM, WM and CSF) to initialise the warping based on label consistency similarity, before feeding this into the previous normalised mutual information non-rigid registration, the registration became robust enough to accommodate atrophy and ventricular enlargement with age. The final segmentation was obtained by combining the 30 propagated atlases using decision fusion. Kernel smoothing was used to model structural volume changes with aging. Example linear correlation coefficients with age were, for lateral ventricular volume, r(male) = 0.76 and r(female) = 0.58 and, for hippocampal volume, r(male) = -0.6 and r(female) = -0.4 (all p < 0.01).
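The abstract does not specify the kernel. A standard choice is the Nadaraya-Watson estimator with a Gaussian kernel, which predicts the volume at a given age as a weighted average of all observed volumes, with weights that decay with the age difference. A minimal sketch under that assumption follows; the bandwidth (in years) is a free, hypothetical parameter.

```python
import math
from typing import Sequence


def kernel_smooth(
    ages: Sequence[float],
    volumes: Sequence[float],
    query_age: float,
    bandwidth: float,
) -> float:
    """Nadaraya-Watson estimate of structure volume at query_age (Gaussian kernel)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Gaussian weights; the normalizing constant cancels in the ratio below.
    weights = [math.exp(-0.5 * ((age - query_age) / bandwidth) ** 2) for age in ages]
    total = sum(weights)
    if total == 0:
        raise ValueError("query_age is too far from all observations for this bandwidth")
    return sum(w * v for w, v in zip(weights, volumes)) / total


if __name__ == "__main__":
    # Toy data: two subjects equidistant from the query age get equal weight.
    print(round(kernel_smooth([20, 40], [10.0, 20.0], 30, 10), 6))  # 15.0
```

Evaluating the estimator on a grid of ages traces a smooth volume-versus-age curve; a larger bandwidth gives a smoother but less local trend.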
sc-PDB: a 3D-database of ligandable binding sites—10 years on

PubMed Central

Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

2015-01-01

The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interactions and to the similarity between ligand/protein binding patterns. The new website puts emphasis on pictorial analysis of the data. PMID:25300483
MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore.

PubMed

Ren, Jian; Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

2010-01-01

During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super-complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and faithfully modulates the cell division process. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in the midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supporting evidence for subcellular localization under a fluorescence microscope. The current version, MiCroKit 3.0, provides detailed information for 1489 microkit proteins from seven model organisms: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, orthologous information is provided for these microkit proteins, which could be a useful resource for further experimental identification. The online service of the MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0).
MiCroKit 3.0: an integrated database of midbody, centrosome and kinetochore

PubMed Central

Liu, Zexian; Gao, Xinjiao; Jin, Changjiang; Ye, Mingliang; Zou, Hanfa; Wen, Longping; Zhang, Zhaolei; Xue, Yu; Yao, Xuebiao

2010-01-01

During cell division/mitosis, a specific subset of proteins is spatially and temporally assembled into protein super-complexes in three distinct regions, i.e. centrosome/spindle pole, kinetochore/centromere and midbody/cleavage furrow/phragmoplast/bud neck, and faithfully modulates the cell division process. Although many experimental efforts have been carried out to investigate the characteristics of these proteins, no integrated database was available. Here, we present the MiCroKit database (http://microkit.biocuckoo.org) of proteins that localize in the midbody, centrosome and/or kinetochore. We collected into the MiCroKit database experimentally verified microkit proteins from the scientific literature that have unambiguous supporting evidence for subcellular localization under a fluorescence microscope. The current version, MiCroKit 3.0, provides detailed information for 1489 microkit proteins from seven model organisms: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster, Xenopus laevis, Mus musculus and Homo sapiens. Moreover, orthologous information is provided for these microkit proteins, which could be a useful resource for further experimental identification. The online service of the MiCroKit database was implemented in PHP + MySQL + JavaScript, while the local packages were developed in JAVA 1.5 (J2SE 5.0). PMID:19783819

SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics

PubMed Central

2013-01-01

Background: Alternative splicing is an important and widespread mechanism for generating protein diversity and regulating protein expression. High-throughput identification and analysis of alternative splicing at the protein level has advantages over analysis at the mRNA level. Combining an alternative splicing database with tandem mass spectrometry provides a powerful technique for the identification, analysis and characterization of potential novel alternative splicing protein isoforms from proteomics data. Therefore, building on the peptidomic database of human protein isoforms for proteomics experiments, our objective was to design a new alternative splicing database that would 1) provide greater coverage of genes, transcripts and alternative splicing events, 2) focus exclusively on alternative splicing, and 3) support context-specific alternative splicing analysis. Results: We used a three-step pipeline to create a synthetic alternative splicing database (SASD) to identify novel alternative splicing isoforms and interpret them in the context of pathway, disease, drug and organ specificity, or of a custom gene set, with maximum coverage and exclusive focus on alternative splicing. First, we extracted the gene structures of all genes in the Ensembl Genes 71 database and incorporated the Integrated Pathway Analysis Database. Then, we compiled artificial splicing transcripts. Lastly, we translated the artificial transcripts into alternative splicing peptides. The SASD is a comprehensive database containing 56,630 genes (Ensembl gene IDs), 95,260 transcripts (Ensembl transcript IDs) and 11,919,779 alternative splicing peptides, and it covers about 1,956 pathways, 6,704 diseases, 5,615 drugs and 52 organs. Its web-based user interface allows users to search, display and download alternative splicing data related to a single gene/transcript/protein, a custom gene set, a pathway, a disease, a drug or an organ. Moreover, the quality of the database was validated by comparison with other known databases and by two case studies: 1) in liver cancer and 2) in breast cancer. Conclusions: The SASD provides the scientific community with an efficient means to identify, analyze and characterize novel exon-skipping and intron-retention protein isoforms from mass spectrometry data and to interpret them in the context of pathway, disease, drug and organ specificity, or of a custom gene set, with maximum coverage and exclusive focus on alternative splicing. PMID:24267658
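The "artificial splicing transcripts" step can be pictured as enumerating alternative exon combinations from a gene model. The sketch below generates the two event types named in the conclusions, exon skipping (dropping one internal exon) and intron retention (keeping one intron), from toy exon and intron sequences. It is a simplified illustration, not the SASD pipeline, and it omits the subsequent translation into peptides; the function names are hypothetical.

```python
def exon_skipping_transcripts(exons: list[str]) -> list[tuple[int, str]]:
    """For each internal exon i, return (i, transcript with exon i removed)."""
    # First and last exons are kept, so only internal exons can be skipped.
    return [(i, "".join(exons[:i] + exons[i + 1:])) for i in range(1, len(exons) - 1)]


def intron_retention_transcripts(exons: list[str], introns: list[str]) -> list[tuple[int, str]]:
    """introns[i] lies between exons[i] and exons[i + 1]; retain one intron at a time."""
    if len(introns) != len(exons) - 1:
        raise ValueError("expected exactly one intron between each pair of adjacent exons")
    return [
        (i, "".join(exons[: i + 1] + [introns[i]] + exons[i + 1:]))
        for i in range(len(introns))
    ]


if __name__ == "__main__":
    # Toy gene model: upper-case exons, lower-case introns.
    exons, introns = ["AAA", "CCC", "GGG"], ["tt", "aa"]
    print(exon_skipping_transcripts(exons))              # [(1, 'AAAGGG')]
    print(intron_retention_transcripts(exons, introns))  # [(0, 'AAAttCCCGGG'), (1, 'AAACCCaaGGG')]
```

In a peptide database built this way, the informative sequences are those spanning the new junctions (for example, exon 1 joined directly to exon 3), since only they distinguish the variant from the reference isoform.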
HPASubC: A suite of tools for user subclassification of human protein atlas tissue images.

PubMed

Cornish, Toby C; Chakravarti, Aravinda; Kapoor, Ashish; Halushka, Marc K

2015-01-01

The human protein atlas (HPA) is a powerful proteomic tool for visualizing the distribution of protein expression across most human tissues and many common malignancies. The HPA includes immunohistochemically stained images from tissue microarrays (TMAs) that cover 48 tissue types and 20 common malignancies. The TMA data are used to provide expression information at the tissue, cellular, and occasionally subcellular level. The HPA also provides subcellular data from confocal immunofluorescence on three cell lines. Despite the availability of localization data, many unique patterns of cellular and subcellular expression are not documented. To get at these more granular data, we have developed HPASubC, a suite of Python scripts, to aid in subcellular and cell-type-specific classification of HPA images. This method allows the user to download and optimize specific HPA TMA images for review. Then, using a PlayStation-style video game controller, a trained observer can rapidly step through tens of thousands of images to identify patterns of interest. We have successfully used this method to identify 703 endothelial cell (EC)- and/or smooth muscle cell (SMC)-specific proteins within 49,200 heart TMA images. This list will assist us in subdividing cardiac gene or protein array data into expression by one of the predominant cell types of the myocardium: myocytes, SMCs or ECs. The opportunity to further characterize unique staining patterns across a range of human tissues and malignancies will accelerate our understanding of disease processes and point to novel markers for tissue evaluation in surgical pathology.
HPASubC: A suite of tools for user subclassification of human protein atlas tissue images

PubMed Central

Cornish, Toby C.; Chakravarti, Aravinda; Kapoor, Ashish; Halushka, Marc K.

2015-01-01

Background: The human protein atlas (HPA) is a powerful proteomic tool for visualizing the distribution of protein expression across most human tissues and many common malignancies. The HPA includes immunohistochemically stained images from tissue microarrays (TMAs) that cover 48 tissue types and 20 common malignancies. The TMA data are used to provide expression information at the tissue, cellular, and occasionally subcellular level. The HPA also provides subcellular data from confocal immunofluorescence on three cell lines. Despite the availability of localization data, many unique patterns of cellular and subcellular expression are not documented. Materials and Methods: To get at these more granular data, we have developed HPASubC, a suite of Python scripts, to aid in subcellular and cell-type-specific classification of HPA images. This method allows the user to download and optimize specific HPA TMA images for review. Then, using a PlayStation-style video game controller, a trained observer can rapidly step through tens of thousands of images to identify patterns of interest. Results: We have successfully used this method to identify 703 endothelial cell (EC)- and/or smooth muscle cell (SMC)-specific proteins within 49,200 heart TMA images. This list will assist us in subdividing cardiac gene or protein array data into expression by one of the predominant cell types of the myocardium: myocytes, SMCs or ECs. Conclusions: The opportunity to further characterize unique staining patterns across a range of human tissues and malignancies will accelerate our understanding of disease processes and point to novel markers for tissue evaluation in surgical pathology. PMID:26167380
Why are they missing? : Bioinformatics characterization of missing human proteins.

PubMed

Elguoshy, Amr; Magdeldin, Sameh; Xu, Bo; Hirao, Yoshitoshi; Zhang, Ying; Kinoshita, Naohiko; Takisawa, Yusuke; Nameta, Masaaki; Yamamoto, Keiko; El-Refy, Ali; El-Fiky, Fawzy; Yamamoto, Tadashi

2016-10-21

NeXtProt is a web-based protein knowledge platform that supports research on human proteins. NeXtProt (release 2015-04-28) lists 20,060 proteins; among them, 3373 canonical proteins (16.8%) lack credible experimental evidence at the protein level (PE2:PE5) and are therefore considered "missing proteins". A comprehensive bioinformatic workflow has been proposed to analyze these "missing" proteins. The aims of the current study were to analyze their physicochemical properties and the existence and distribution of tryptic cleavage sites, and to pinpoint the signature peptides of the missing proteins. Our findings showed that 23.7% of missing proteins were hydrophobic proteins possessing transmembrane domains (TMD). In addition, forty missing entries generated tryptic peptides that were either outside the mass detection range (>30 aa) or mapped to different proteins (<9 aa). Furthermore, 21% of missing entries did not generate any unique tryptic peptides. An in silico endopeptidase combination strategy increased the possibility of identifying missing proteins. Consistently, using both a mature protein database and a signal peptidome database could be a promising option for identifying some missing proteins by targeting their unique N-terminal tryptic peptide from the mature protein database and/or their C-terminal tryptic peptide from the signal peptidome database. In conclusion, identifying missing proteins requires additional consideration during sample preparation, extraction, digestion and data analysis to increase their chance of identification. Copyright © 2016. Published by Elsevier B.V.
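The tryptic-peptide analysis above can be sketched with the textbook trypsin rule (cleave after K or R unless the next residue is P) plus the 9 to 30 residue window implied by the abstract's length limits. Uniqueness here is checked only within the supplied toy proteins, whereas a real analysis would search the whole proteome (and typically treat isobaric I/L as equivalent); the function names are hypothetical.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence: str) -> list[str]:
    """Fully cleaved tryptic peptides: cut after K or R unless followed by P."""
    # Zero-width split (supported by re.split since Python 3.7); drop empty pieces.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def unique_peptides(proteins: dict[str, str], min_len: int = 9, max_len: int = 30) -> dict[str, set[str]]:
    """Map each protein to its in-window tryptic peptides found in no other input protein."""
    owners: defaultdict[str, set[str]] = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            if min_len <= len(peptide) <= max_len:
                owners[peptide].add(name)
    result: dict[str, set[str]] = {name: set() for name in proteins}
    for peptide, names in owners.items():
        if len(names) == 1:
            result[next(iter(names))].add(peptide)
    return result


if __name__ == "__main__":
    print(tryptic_digest("AKPGRCD"))  # ['AKPGR', 'CD']
    toy = {"A": "A" * 9 + "K" + "G" * 10 + "R", "B": "G" * 10 + "R" + "C" * 9 + "KP"}
    # The shared peptide GGGGGGGGGGR is excluded from both proteins.
    print(unique_peptides(toy))
```

A protein whose entry in the result is an empty set has no unique in-window tryptic peptide, which is the situation reported above for 21% of the missing entries; trying other enzymes means swapping in a different cleavage pattern.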
CPTAC Releases Largest-Ever Breast Cancer Proteome Dataset from Previously Genome Characterized Tumors | Office of Cancer Clinical Proteomics Research

Cancer.gov

National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) scientists have released a dataset of proteins and phosphopeptides identified through deep proteomic and phosphoproteomic analysis of breast tumor samples, previously genomically analyzed by The Cancer Genome Atlas (TCGA).
STS-66 Space Shuttle mission report

NASA Technical Reports Server (NTRS)

Fricke, Robert W., Jr.

1995-01-01

The primary objective of this flight was to accomplish complementary science objectives by operating the Atmospheric Laboratory for Applications and Science-3 (ATLAS-3) and the Cryogenic Infrared Spectrometers and Telescopes for the Atmosphere-Shuttle Pallet Satellite (CRISTA-SPAS). The secondary objectives of this flight were to perform the operations of the Shuttle Solar Backscatter Ultraviolet/A (SSBUV/A) payload, the Experiment of the Sun Complementing the Atlas Payload and Education-II (ESCAPE-II) payload, the Physiological and Anatomical Rodent Experiment/National Institutes of Health Rodents (PARE/NIH-R) payload, the Protein Crystal Growth-Thermal Enclosure System (PCG-TES) payload, the Protein Crystal Growth-Single Locker Thermal Enclosure System (PCG-STES), the Space Tissue/National Institutes of Health Cells STL/N -A payload, the Space Acceleration Measurement Systems (SAMS) Experiment, and Heat Pipe Performance Experiment (HPPE) payload. The 11-day plus 2 contingency day STS-66 mission was flown as planned, with no contingency days used for weather avoidance or Orbiter contingency operations. Appendix A lists the sources of data from which this report was prepared, and Appendix B defines all acronyms and abbreviations used in the report.
STS-66 Space Shuttle mission report

NASA Astrophysics Data System (ADS)

Fricke, Robert W., Jr.

1995-02-01

The primary objective of this flight was to accomplish complementary science objectives by operating the Atmospheric Laboratory for Applications and Science-3 (ATLAS-3) and the Cryogenic Infrared Spectrometers and Telescopes for the Atmosphere-Shuttle Pallet Satellite (CRISTA-SPAS). The secondary objectives of this flight were to perform the operations of the Shuttle Solar Backscatter Ultraviolet/A (SSBUV/A) payload, the Experiment of the Sun Complementing the Atlas Payload and Education-II (ESCAPE-II) payload, the Physiological and Anatomical Rodent Experiment/National Institutes of Health Rodents (PARE/NIH-R) payload, the Protein Crystal Growth-Thermal Enclosure System (PCG-TES) payload, the Protein Crystal Growth-Single Locker Thermal Enclosure System (PCG-STES), the Space Tissue/National Institutes of Health Cells STL/N -A payload, the Space Acceleration Measurement Systems (SAMS) Experiment, and Heat Pipe Performance Experiment (HPPE) payload. The 11-day plus 2 contingency day STS-66 mission was flown as planned, with no contingency days used for weather avoidance or Orbiter contingency operations. Appendix A lists the sources of data from which this report was prepared, and Appendix B defines all acronyms and abbreviations used in the report.
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

PubMed Central

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-01-01

Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
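To show what the rigorous dynamic programming behind these all-against-all comparisons computes, here is a minimal Smith–Waterman score sketch. It assumes a toy match/mismatch scheme and a linear gap penalty rather than the substitution matrix and affine gaps a production run would likely use; the parameter values are illustrative only.

```python
def smith_waterman_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Best local alignment score with a linear gap penalty, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GGMKTAYGG", "MKTAY"))
```

Unlike heuristic tools, this fills the full score matrix, which is why all-against-all runs over millions of sequences needed grid-scale computing.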
An atlas-based organ dose estimator for tomosynthesis and radiography

NASA Astrophysics Data System (ADS)

Hoye, Jocelyn; Zhang, Yakun; Agasthya, Greeshma; Sturgeon, Greg; Kapadia, Anuj; Segars, W. Paul; Samei, Ehsan

2017-03-01

The purpose of this study was to provide patient-specific organ dose estimation based on an atlas of human models for twenty tomosynthesis and radiography protocols. The study used a library of 54 adult computational phantoms (age: 18-78 years, weight 52-117 kg) and a validated Monte Carlo simulation (PENELOPE) of a tomosynthesis and radiography system to estimate organ dose. Positioning of patient anatomy was based on radiographic positioning handbooks. The field of view for each exam was calculated to include the relevant organs for each protocol. In the simulations, the energy deposited in each organ was binned to estimate normalized organ doses, which were compiled into a reference database. The database can be used as the basis of a dose calculator that predicts patient-specific organ dose values from kVp, mAs, exposure in air and patient habitus for a given protocol. As an example of the utility of this tool, dose to an organ was studied as a function of average patient thickness in the field of view for a given exam and as a function of Body Mass Index (BMI). For tomosynthesis, organ doses can also be studied as a function of x-ray tube position. This work developed comprehensive information on organ dose dependencies across tomosynthesis and radiography. Organ dose generally decreased exponentially with increasing patient size, in a highly protocol-dependent manner. Organ dose varied widely across the patient population, and this variability needs to be incorporated into the metrology of organ dose.
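The exponential dependence of organ dose on patient size can be quantified by fitting a model of the form D(t) = a·exp(−b·t). The sketch below assumes that model and fits it by ordinary least squares on ln D; the variable names and the synthetic numbers are hypothetical and do not come from the study.

```python
import math


def fit_exponential(thickness_cm: list[float], dose: list[float]) -> tuple[float, float]:
    """Least-squares fit of ln(dose) = ln(a) - b * thickness; returns (a, b)."""
    n = len(thickness_cm)
    ys = [math.log(d) for d in dose]  # linearize the exponential model
    mean_x = sum(thickness_cm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in thickness_cm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(thickness_cm, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free normalized doses generated with a = 10, b = 0.1 per cm.
xs = [15.0, 20.0, 25.0, 30.0]
a, b = fit_exponential(xs, [10 * math.exp(-0.1 * x) for x in xs])
```

A per-protocol coefficient b would express how strongly each protocol's dose falls with patient thickness.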
Algorithms for database-dependent search of MS/MS data.

PubMed

Matthiesen, Rune

2013-01-01

The frequently used bottom-up strategy for identifying proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra, which are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategies behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring and, finally, protein inference. Most efforts in the literature have been put into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be confounded by suboptimal implementations, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow a fair comparison. In other words, an algorithmic idea can still be worth considering even if its software implementation has been shown to be suboptimal. The aim of this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three steps, whereas the final step, protein inference, is much less developed in most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses SIR, a stand-alone program for protein inference that can import a Mascot search result.
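To make the protein inference step concrete, the sketch below uses greedy parsimony (a set-cover heuristic) to pick a small set of proteins that explains all identified peptides. This is a common baseline swapped in for illustration; it is not the algorithm used by VEMS or SIR, and the input mapping is hypothetical.

```python
def parsimonious_proteins(peptide_to_proteins: dict[str, set[str]]) -> list[str]:
    """Greedy set cover: repeatedly pick the protein explaining most unexplained peptides."""
    protein_to_peptides: dict[str, set[str]] = {}
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            protein_to_peptides.setdefault(prot, set()).add(pep)
    # Ignore peptides that map to no protein so the loop always makes progress.
    unexplained = {pep for pep, prots in peptide_to_proteins.items() if prots}
    chosen: list[str] = []
    while unexplained:
        # Iterating in sorted order makes ties resolve to the alphabetically first protein.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & unexplained),
        )
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


hits = {"pep1": {"A", "B"}, "pep2": {"A"}, "pep3": {"C"}}
print(parsimonious_proteins(hits))
```

Protein B is dropped because its only peptide is already explained by A, which is the kind of shared-peptide ambiguity that makes protein inference harder than peptide scoring.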
DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions

PubMed Central

Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

2016-01-01

Macromolecular interactions are formed between proteins, DNA and RNA molecules. As principal building blocks of macromolecular assemblies and pathways, these interactions underlie most cellular functions. Malfunctioning macromolecular interactions are also linked to a number of diseases. Structural knowledge of a macromolecular interaction allows one to understand the interaction's mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, have been collected in separate and unrelated structural databases. This presents a significant obstacle to the analysis of macromolecular interactions. For instance, databases restricted to a single interaction type prevent scientists from studying structural interactions of different types that occur in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Whereas DOMMINO 1.0 was a comprehensive database of protein–protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N-termini, and peptides are identified.
Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237
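As a rough illustration of how a binary interaction interface can be extracted from atomic coordinates, the sketch below flags residue pairs from two chains that have any atom pair closer than a distance cutoff. The 5 Å cutoff, the data layout and the function name are assumptions for illustration; the abstract does not state DOMMINO's actual interaction criteria.

```python
import math

Atom = tuple[float, float, float]  # x, y, z in angstroms


def interface_residues(
    chain_a: dict[int, list[Atom]],
    chain_b: dict[int, list[Atom]],
    cutoff: float = 5.0,  # assumed contact cutoff, not DOMMINO's documented value
) -> set[tuple[int, int]]:
    """Residue pairs (a_id, b_id) with any atom pair closer than `cutoff`."""
    pairs = set()
    for ra, atoms_a in chain_a.items():
        for rb, atoms_b in chain_b.items():
            if any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b):
                pairs.add((ra, rb))
    return pairs


a = {1: [(0.0, 0.0, 0.0)], 2: [(20.0, 0.0, 0.0)]}
b = {10: [(3.0, 0.0, 0.0)], 11: [(40.0, 0.0, 0.0)]}
print(interface_residues(a, b))
```

The brute-force double loop is fine for a sketch; a real pipeline over the whole PDB would typically use a spatial index instead.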
The structure and dipole moment of globular proteins in solution and crystalline states: use of NMR and X-ray databases for the numerical calculation of dipole moment.

PubMed

Takashima, S

2001-04-05

The large dipole moment of globular proteins is well known from detailed studies using dielectric relaxation and electro-optical methods. The search for the origin of these dipole moments, however, must be based on detailed knowledge of protein structure at atomic resolution. At present, we have two sources of information on the structure of protein molecules: (1) X-ray databases obtained in the crystalline state; (2) NMR databases obtained in the solution state. While X-ray structures consist of only one model, NMR structures, because of the fluctuation of protein folding in solution, consist of a number of models, which enables the dipole moment computation to be repeated for all of these models. The aim of this work is to use these databases for a detailed investigation of the interdependence between the structure and dipole moment of protein molecules. The dipole moment of protein molecules has roughly two components: one is due to surface charges and the other, the core dipole moment, is due to polar groups such as N–H and C=O bonds. The computation of the surface-charge dipole moment consists of two steps: (A) calculation of the pK shifts of charged groups due to electrostatic interactions and (B) calculation of the dipole moment using the pK values corrected for these electrostatic shifts. The dipole moments of several proteins were computed using both NMR and X-ray databases. The dipole moments from these two sets of calculations are, with a few exceptions, in good agreement with one another and also with measured dipole moments.
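Step (B) of the surface-charge calculation can be sketched as follows: each titratable group receives a Henderson–Hasselbalch average charge from its (already shifted) pK, and the dipole moment is the magnitude of Σ q_i (r_i − r_c). This sketch assumes the pK shifts of step (A) are supplied as inputs, places charges at single points, uses a caller-chosen reference point (the result depends on it for a net-charged protein) and ignores the core dipole; the tuple layout is hypothetical.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.803  # 1 e*angstrom is about 4.803 debye


def fractional_charge(pka: float, ph: float, acidic: bool) -> float:
    """Henderson-Hasselbalch average charge of one titratable group."""
    if acidic:  # e.g. Asp, Glu, C-terminus: 0 when protonated, -1 when not
        return -1.0 / (1.0 + 10 ** (pka - ph))
    return 1.0 / (1.0 + 10 ** (ph - pka))  # e.g. Lys, Arg, N-terminus


def dipole_moment_debye(groups, ph: float, center) -> float:
    """|sum q_i (r_i - center)| for (xyz, pKa, acidic) tuples; xyz in angstroms."""
    mu = [0.0, 0.0, 0.0]
    for xyz, pka, acidic in groups:
        q = fractional_charge(pka, ph, acidic)
        for k in range(3):
            mu[k] += q * (xyz[k] - center[k])
    return math.hypot(*mu) * E_ANGSTROM_TO_DEBYE


# Toy example: a fully charged base and acid 10 angstroms apart (about 48 D).
groups = [((5.0, 0.0, 0.0), 14.0, False), ((-5.0, 0.0, 0.0), 0.0, True)]
print(dipole_moment_debye(groups, 7.0, (0.0, 0.0, 0.0)))
```

For NMR entries, the same function would simply be applied to each model in the ensemble.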
Inconsistencies in the red blood cell membrane proteome analysis: generation of a database for research and diagnostic applications

PubMed Central

Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs

2015-01-01

Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies: the results show large scatter and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now also refers to proteomic, genetic and medical databases, and contains an unexpectedly large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478
Database resources of the National Center for Biotechnology Information

PubMed Central

2015-01-01

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:25398906
Database resources of the National Center for Biotechnology Information

PubMed Central

2016-01-01

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:26615191
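For programmatic access to the Entrez search operations mentioned above, NCBI offers the E-utilities web interface, which these abstracts do not describe. The sketch below only builds an ESearch request URL with the `db`, `term` and `retmax` parameters; the helper name is hypothetical, and sending the request (for example with `urllib.request.urlopen`) and NCBI's usage policies are left to the reader.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an Entrez ESearch request URL (no network request is made here)."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return EUTILS_BASE + "esearch.fcgi?" + query


print(esearch_url("pubmed", "protein atlas"))
```

Keeping URL construction separate from the network call makes the query easy to inspect and test offline.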
Role for protein–protein interaction databases in human genetics

PubMed Central

Pattin, Kristine A; Moore, Jason H

2010-01-01

Proteomics and the study of protein–protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein–protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein–protein interactions in human genetics and genetic epidemiology. Since protein–protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies. PMID:19929610
Stress field modeling of the Carpathian Basin based on compiled tectonic maps

NASA Astrophysics Data System (ADS)

Albert, Gáspár; Ungvári, Zsuzsanna; Szentpéteri, Krisztián

2014-05-01

The estimation of the stress field in the Carpathian Basin has been tackled by several authors. Their modeling methods are usually based on measurements (borehole, focal mechanism and geodetic data), and the result is a possible structural pattern of the region. Our method works indirectly: the analysis aims to project a possible 2D stress field onto the already mapped/known/compiled lineament pattern. This includes a component-wise interpolation of the tensor field based on an irregular point cloud generated in the buffer zone of the mapped lineaments. The interpolated values appear on contour and tensor maps and show the relative stress field of the area. In 2006, Horváth et al. compiled the 'Atlas of the present-day geodynamics of the Pannonian basin'. To test our method, we processed the lineaments of the 1:1 500 000 scale 'Map of neotectonic (active) structures' published in this atlas. The geodynamic parameters (i.e. normal, reverse, right- and left-lateral strike-slip faults, etc.) of the lines on this map were mostly explained in the legend. We classified the linear elements according to these parameters and created a georeferenced mapping database. This database contains the polyline sections of the map lineaments as vectors (i.e. line sections), and the directions of the stress field as attributes of these vectors. The directions of the dip-parallel, strike-parallel and vertical stress vectors are calculated from the geodynamic parameters of each line section. Since we derived relative stress field properties, the eigenvalues of the vectors were scaled to a maximum of one. Each point in the point cloud inherits the stress properties of the line section from which it was derived. During the modeling we tried several point-cloud generation and interpolation methods.
The analysis of the interpolated tensor fields revealed that the model was able to reproduce a geodynamic synthesis of the Carpathian Basin that can be correlated with the synthesis of the Atlas published in 2006. The method was primarily aimed at reconstructing paleo-stress fields. References Horváth, F., Bada, G., Windhoffer, G., Csontos, L., Dombrádi, E., Dövényi, P., Fodor, L., Grenerczy, G., Síkhegyi, F., Szafián, P., Székely, B., Timár, G., Tóth, L., Tóth, T. 2006: Atlas of the present-day geodynamics of the Pannonian basin: Euroconform maps with explanatory text. Magyar Geofizika 47, 133-137.
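A component-wise interpolation of a 2D tensor field can be sketched with inverse distance weighting (IDW) applied separately to σxx, σyy and σxy, followed by recovering the principal direction from the interpolated tensor. IDW is an assumption chosen for illustration (the authors report trying several interpolation methods), and the sample layout and function names are hypothetical.

```python
import math


def idw_tensor(points, query, power: float = 2.0):
    """IDW estimate of (sxx, syy, sxy) at `query` from (x, y, (sxx, syy, sxy)) samples."""
    weights, comps = [], []
    for x, y, tensor in points:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return tensor  # query coincides with a sample point
        weights.append(1.0 / d ** power)
        comps.append(tensor)
    total = sum(weights)
    # Each tensor component is interpolated independently (component-wise).
    return tuple(sum(w * t[k] for w, t in zip(weights, comps)) / total for k in range(3))


def principal_direction_deg(tensor) -> float:
    """Angle (degrees, from the x-axis) of the major principal axis of a 2D symmetric tensor."""
    sxx, syy, sxy = tensor
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


samples = [(0.0, 0.0, (1.0, 0.0, 0.0)), (2.0, 0.0, (0.0, 1.0, 0.0))]
print(idw_tensor(samples, (1.0, 0.0)))
```

Interpolating components rather than directions avoids the ambiguity of averaging angles, which wrap around at 180°.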
TryTransDB: A web-based resource for transport proteins in Trypanosomatidae.

PubMed

Sonar, Krushna; Kabra, Ritika; Singh, Shailza

2018-03-12

TryTransDB is a web-based resource that stores transport protein data, which can be searched using a standalone BLAST tool. We have attempted to create an integrated database that can be a one-stop shop for researchers working with transport proteins of the Trypanosomatidae family. TryTransDB (Trypanosomatidae Transport Protein Database) is a comprehensive web-based resource that can run a BLAST search against most of the transport protein sequences (protein and nucleotide) from organisms of the Trypanosomatidae family. This web resource further allows users to compute a phylogenetic tree by performing a multiple sequence alignment (MSA) with the embedded CLUSTALW suite. Also, cross-links to other databases help in gathering more information about a given transport protein on a single website.
AIM: A comprehensive Arabidopsis Interactome Module database and related interologs in plants

USDA-ARS's Scientific Manuscript database

Systems biology analysis of protein modules is important for understanding the functional relationships between proteins in the interactome. Here, we present a comprehensive database named AIM for Arabidopsis (Arabidopsis thaliana) interactome modules. The database contains almost 250,000 modules th...
VaProS: a database-integration approach for protein/genome information retrieval.

PubMed

Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

2016-12-01

Life science research now relies heavily on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combined search across these databases can yield new ideas and hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve experts' knowledge stored in the databases and build new hypotheses about their research targets. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of its search, exemplified by a quest for the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .

mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

PubMed

Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

2018-05-21

With the rapid increase in the number of protein structures in the Protein Data Bank, it has become urgent to develop algorithms for efficient protein structure comparison. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is accelerated by a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform competing methods for both modules, in terms of both speed and accuracy. One of the unique features of the server is the interplay between database search and multiple structure alignment: it provides not only fast database search but also accurate multiple structure alignment of the structures found by the search. The database search takes about 2-5 min for a medium-sized structure (∼300 residues). The multiple structure alignment takes a few seconds for ∼10 medium-sized structures. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.
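Structure comparisons in the TM-align family are scored with the TM-score, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, where L is the target length and d_i are distances between aligned residue pairs after superposition. The sketch below computes only this score from given distances; the superposition and alignment search that mTM-align performs are not shown, and the 0.5 Å floor for d0 on short chains is an assumed convention.

```python
def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score from distances (angstroms) of aligned, already superposed residue pairs."""
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5  # assumed floor: the formula is undefined for very short chains
    # Unaligned residues contribute nothing but still count in the normalization.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


print(tm_score([0.0] * 100, 100))  # a perfect superposition scores 1.0
```

Because d0 grows with L, the score is roughly length-independent, which is what makes it usable for ranking database hits of different sizes.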
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

PubMed Central

Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

2005-01-01

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
Creation of a federated database of blood proteins: a powerful new tool for finding and characterizing biomarkers in serum

PubMed Central

2014-01-01

Protein biomarkers offer major benefits for diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood, with the goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of an SQL database for all of the peptide data; the power of the SQL database combined with standard informatics tools such as BLAST and the Statistical Analysis System (SAS) allowed rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review show that, in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins, with good agreement on transcription factors and DNA remodelling factors as well as cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, and of these, 3858 have 3 or more peptide correlations. The FDBP with annotations should facilitate testing blood for specific disease biomarkers. PMID:24476026
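Counts such as "proteins with 3 or more peptides" fall out of a single aggregate SQL query. The sketch below uses Python's built-in `sqlite3` with a hypothetical `peptide_hits(protein_acc, peptide_seq)` table; the FDBP's real schema is not described in the abstract.

```python
import sqlite3


def proteins_with_min_peptides(conn: sqlite3.Connection, min_peptides: int) -> list[str]:
    """Accessions supported by at least `min_peptides` distinct peptide sequences."""
    rows = conn.execute(
        """
        SELECT protein_acc
        FROM peptide_hits            -- hypothetical table name
        GROUP BY protein_acc
        HAVING COUNT(DISTINCT peptide_seq) >= ?
        ORDER BY protein_acc
        """,
        (min_peptides,),
    )
    return [acc for (acc,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE peptide_hits (protein_acc TEXT, peptide_seq TEXT)")
    conn.executemany(
        "INSERT INTO peptide_hits VALUES (?, ?)",
        [("P1", "A"), ("P1", "B"), ("P1", "C"), ("P2", "A"), ("P2", "A")],
    )
    print(proteins_with_min_peptides(conn, 3))
```

Counting `DISTINCT` peptides keeps repeated reports of the same peptide from inflating a protein's support.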
CyanoClust: comparative genome resources of cyanobacteria and plastids.

PubMed

Sasaki, Naobumi V; Sato, Naoki

2010-01-01

Cyanobacteria, which perform oxygen-evolving photosynthesis as do the chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla, and many representative cyanobacterial genomes have been sequenced. The lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Groups (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.
Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout

PubMed Central

Salem, Mohamed; Paneru, Bam; Al-Tobasei, Rafet; Abdouni, Fatima; Thorgaard, Gary H.; Rexroad, Caird E.; Yao, Jianbo

2015-01-01

Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler, yielding 474,524 contigs > 500 base pairs. Of these, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome. PMID:25793877
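The reference-selection step (keep the longest contig of each cluster, after the > 500 bp filter) is simple to express in code. In this sketch the contig-to-cluster mapping is an assumed input, for example Trinity gene-level groupings; the study's exact clustering and tie-breaking are not specified, and the identifiers are hypothetical.

```python
def longest_per_cluster(
    contigs: dict[str, str], cluster_of: dict[str, str], min_length: int = 500
) -> dict[str, str]:
    """Map cluster ID -> ID of its longest contig longer than `min_length` bp."""
    best: dict[str, str] = {}
    # Sorted iteration plus a strict '>' keeps the smallest contig ID on length ties.
    for contig_id in sorted(contigs):
        seq = contigs[contig_id]
        if len(seq) <= min_length:
            continue  # mirrors the "> 500 base pairs" filter
        cluster = cluster_of[contig_id]
        current = best.get(cluster)
        if current is None or len(seq) > len(contigs[current]):
            best[cluster] = contig_id
    return best


contigs = {"c1": "A" * 600, "c2": "A" * 800, "c3": "A" * 400, "c4": "A" * 700, "c5": "A" * 700}
clusters = {"c1": "g1", "c2": "g1", "c3": "g2", "c4": "g3", "c5": "g3"}
print(longest_per_cluster(contigs, clusters))
```

Cluster g2 disappears entirely because its only contig is below the length cutoff.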
Analysis of single nucleotide variants of HFE gene and association to survival in The Cancer Genome Atlas GBM data

PubMed Central

Zhang, Bo; Liu, Dajiang J.; Muscat, Joshua E.; Langan, Sara T.; Connor, James R.

2017-01-01

Human hemochromatosis protein (HFE) is involved in iron metabolism. Two major HFE polymorphisms, H63D and C282Y, have been associated with an increased risk of cancers. Previously, we reported decreased gender effects in overall survival in patients with glioblastoma multiforme (GBM) carrying H63D or C282Y HFE polymorphisms. However, the effect of other single nucleotide variants (SNVs) in the HFE gene on cancer development and progression has not been systematically studied. To extend our findings to a larger sample and to identify other HFE SNVs, we analyzed the frequency of somatic SNVs in the HFE gene and their relationship to survival in GBM patients using The Cancer Genome Atlas (TCGA) GBM (Caucasian only) database. We found 9 SNVs with increased frequency in the blood-normal samples of TCGA GBM patients compared with the 1000 Genomes data. Of these 9 SNVs, 7 were located in introns and 2 (i.e., H63D, C282Y) in exons of the HFE gene. The statistical analysis demonstrated that blood-normal samples of TCGA GBM patients carry more H63D (p = 0.0002, 95% confidence interval (CI): 0.2119–0.3223) or C282Y (p = 0.0129, 95% CI: 0.0474–0.1159) HFE polymorphisms than the 1000 Genomes data. The Kaplan-Meier survival curves for the 264 GBM samples revealed no difference between wild-type (WT) HFE and H63D, or between WT HFE and C282Y GBM patients. In addition, there was no difference in the survival of male/female GBM patients based on HFE genotype. There was no correlation between HFE expression and survival. In conclusion, the current results suggest that somatic HFE polymorphisms do not impact GBM patients' survival in the TCGA GBM data set. PMID:28358914
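The survival comparison above rests on the Kaplan–Meier estimator, S(t) = Π over event times t_i ≤ t of (1 − d_i/n_i), where d_i is the number of deaths at t_i and n_i the number still at risk. A minimal sketch is shown below; the input format is an assumption, and the log-rank test usually used to compare genotype groups is not included.

```python
def kaplan_meier(times: list[float], events: list[bool]) -> list[tuple[float, float]]:
    """(time, survival) steps of the Kaplan-Meier estimator; events[i] is False if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= removed  # censored subjects leave the risk set without a step
    return steps


print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```

Running the estimator separately for WT and variant carriers gives the curves that a log-rank test would then compare.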
Active Site Detection by Spatial Conformity and Electrostatic Analysis—Unravelling a Proteolytic Function in Shrimp Alkaline Phosphatase

PubMed Central

Chakraborty, Sandeep; Minda, Renu; Salaye, Lipika; Bhattacharjee, Swapan K.; Rao, Basuthkar J.

2011-01-01

Computational methods are increasingly important aids for identifying active sites. Most of these methods use structural information to supplement analyses based on sequence conservation. The development of tools that compute electrostatic potentials has further improved our ability to characterize active-site residues in proteins. We describe a computational methodology for detecting active sites based on structural and electrostatic conformity: CataLytic Active Site Prediction (CLASP). In our pipelined model, the physical 3D signature of a particular enzymatic function, as defined by its active site, is used to obtain spatially congruent matches. While previous work has revealed that catalytic residues have large pKa deviations from standard values, we show that, for a given enzymatic activity, the electrostatic potential difference (PD) between analogous residue pairs in active sites taken from different proteins of the same family is similar. False positives among the spatially congruent matches are further pruned by PD analysis, in which cognate pairs with large deviations are rejected. We first present active-site predictions by CLASP for two enzymatic activities, β-lactamases and serine proteases, two of the most extensively investigated enzymes. We also present the results of CLASP analysis on motifs extracted from the Catalytic Site Atlas (CSA) to demonstrate its ability to accurately classify any protein with known structure, putative or otherwise. The source code and database are available at www.sanchak.com/clasp/. Subsequently, we probed alkaline phosphatases (AP), well-known promiscuous enzymes, for additional activities. This search led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), in which the protein acts as a protease. Finally, we present experimental evidence for the CLASP prediction by showing that SAP indeed has protease activity in vitro.
PMID:22174814
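The PD-based pruning step can be illustrated with a minimal filter. A spatially congruent match is kept only if each cognate residue pair's potential difference stays close to the reference value for that enzyme family. The residue-pair labels, units and `max_deviation` threshold are assumptions for illustration. The abstract does not specify how CLASP computes potentials or which cutoff it uses.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # hypothetical labels for two motif residues, e.g. ("SER", "HIS")

def passes_pd_filter(reference_pd: Dict[Pair, float],
                     candidate_pd: Dict[Pair, float],
                     max_deviation: float) -> bool:
    """Keep a spatial match only if every cognate pair's potential difference (PD)
    lies within max_deviation of the reference PD for that pair."""
    for pair, ref_value in reference_pd.items():
        if pair not in candidate_pd:
            return False  # the candidate lacks a residue pair required by the motif
        if abs(candidate_pd[pair] - ref_value) > max_deviation:
            return False  # large PD deviation: treat as a false positive
    return True
```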
MultitaskProtDB: a database of multitasking proteins.

PubMed

Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, Josep Antoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

2014-01-01

We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository for the many multitasking proteins reported in the literature. Multitasking, or moonlighting, is the capability of some proteins to execute two or more biological functions. Multitasking proteins are usually discovered experimentally by serendipity. This ability of proteins to perform multiple functions helps us understand one of the ways cells carry out many complex functions with a limited number of genes. Even so, the study of this phenomenon is difficult because, among other things, there has been no database of moonlighting proteins. Such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available, and bibliographic references. The database also provides insight into characteristics of multitasking proteins, such as the frequencies of different pairs of functions and their phylogenetic conservation.
KnotProt: a database of proteins with knots and slipknots.

PubMed

Jamroz, Michal; Niemyska, Wanda; Rawdon, Eric J; Stasiak, Andrzej; Millett, Kenneth C; Sułkowski, Piotr; Sulkowska, Joanna I

2015-01-01

The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gene-expression signature regulated by the KEAP1-NRF2-CUL3 axis is associated with a poor prognosis in head and neck squamous cell cancer.

PubMed

Namani, Akhileshwar; Matiur Rahaman, Md; Chen, Ming; Tang, Xiuwen

2018-01-06

NRF2 is the key regulator of oxidative stress in normal cells. Aberrant expression of the NRF2 pathway due to genetic alterations in the KEAP1 (Kelch-like ECH-associated protein 1)-NRF2 (nuclear factor erythroid 2 like 2)-CUL3 (cullin 3) axis leads to tumorigenesis and drug resistance in many cancers, including head and neck squamous cell cancer (HNSCC). The main goals of this study were to identify specific genes regulated by the KEAP1-NRF2-CUL3 axis in HNSCC patients, to assess the prognostic value of this gene signature in different cohorts, and to reveal potential biomarkers. RNA-Seq V2 level 3 data from 279 tumor samples and 37 adjacent normal samples from patients enrolled in The Cancer Genome Atlas (TCGA)-HNSCC study were used to identify upregulated genes by two comparisons (altered KEAP1-NRF2-CUL3 versus normal, and altered KEAP1-NRF2-CUL3 versus wild-type). We then used a new approach to identify a combined gene signature by integrating both datasets, and subsequently tested this signature in 4 independent HNSCC datasets to assess its prognostic value. In addition, functional annotation using the DAVID v6.8 database and protein-protein interaction (PPI) analysis using the STRING v10 database were performed on the signature. A signature composed of 17 genes regulated by the KEAP1-NRF2-CUL3 axis was identified by overlapping the upregulated genes from the altered-versus-normal (251 genes) and altered-versus-wild-type (25 genes) comparisons. We showed that increased expression of the signature genes was significantly associated with poor survival in 4 independent HNSCC datasets, including the TCGA-HNSCC dataset. Furthermore, Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and PPI analyses revealed that most of the genes in this signature are associated with drug metabolism and glutathione metabolic pathways.
Altogether, our study emphasizes the discovery of a gene signature regulated by the KEAP1-NRF2-CUL3 axis which is strongly associated with tumorigenesis and drug resistance in HNSCC. This 17-gene signature provides potential biomarkers and therapeutic targets for HNSCC cases in which the NRF2 pathway is activated.
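The signature construction described above reduces to intersecting two lists of upregulated genes. The sketch below assumes each comparison yields hypothetical (log2 fold change, adjusted p-value) pairs per gene. The default cutoffs are illustrative only, because the abstract does not state the thresholds used to call a gene upregulated.

```python
from typing import Dict, List, Set, Tuple

Stats = Dict[str, Tuple[float, float]]  # gene -> (log2 fold change, adjusted p-value)

def upregulated(stats: Stats, min_log2fc: float = 1.0, max_padj: float = 0.05) -> Set[str]:
    """Genes passing both the fold-change and significance cutoffs (illustrative defaults)."""
    return {g for g, (lfc, padj) in stats.items() if lfc >= min_log2fc and padj <= max_padj}

def combined_signature(vs_normal: Stats, vs_wildtype: Stats, **cutoffs: float) -> List[str]:
    """Signature = genes upregulated in BOTH comparisons, returned in sorted order."""
    return sorted(upregulated(vs_normal, **cutoffs) & upregulated(vs_wildtype, **cutoffs))


vs_normal = {"GENE_A": (3.0, 0.001), "GENE_B": (2.0, 0.01), "GENE_C": (0.1, 0.9)}
vs_wildtype = {"GENE_A": (1.5, 0.02), "GENE_B": (0.5, 0.01), "GENE_C": (2.0, 0.01)}
print(combined_signature(vs_normal, vs_wildtype))  # ['GENE_A']
```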
BLAST and FASTA similarity searching for multiple sequence alignment.

PubMed

Pearson, William R

2014-01-01

BLAST, FASTA, and other similarity searching programs seek to identify homologous protein and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). For sequences that encode proteins, the most effective similarity searches compare protein sequences rather than DNA sequences, and use expectation values rather than percent identity to infer homology. The BLAST and FASTA packages provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships. For comparisons involving short domains or queries, or for searches that seek relatively close homologs (e.g., mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
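BLAST computes the expectation value from an alignment score with the Karlin-Altschul relation, E = K·m·n·exp(-λS). An equivalent form is E = m·n·2^(-S'), where the bit score is S' = (λS - ln K)/ln 2. This minimal sketch ignores edge-effect corrections. The λ and K values are placeholders: in practice they depend on the scoring matrix and gap penalties, and BLAST and FASTA estimate them in different ways.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def evalue_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S'): expected number of chance alignments scoring at least S'."""
    return query_len * db_len * 2.0 ** (-bits)

def evalue_from_raw(raw_score: float, lam: float, k: float,
                    query_len: int, db_len: int) -> float:
    """Equivalent form E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)
```

E grows with the database length n, so the same score yields a larger, less significant E-value against a bigger database. This is consistent with the abstract's point that searching a smaller comprehensive database can improve sensitivity.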
Reference System of DNA and Protein Sequences on CD-ROM

NASA Astrophysics Data System (ADS)

Nasu, Hisanori; Ito, Toshiaki

DNASIS-DBREF31 is a database of DNA and protein sequences on optical compact disc read-only memory (CD-ROM), developed and commercialized by Hitachi Software Engineering Co., Ltd. Both nucleic acid base sequences and protein amino acid sequences can be retrieved from a single CD-ROM. Existing databases are offered as online services, floppy disks, or magnetic tape, each of which has problems such as poor usability or limited storage capacity. DNASIS-DBREF31 instead adopts a CD-ROM as the database medium to provide mass storage and allow personal use of the database.
A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics

PubMed Central

Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

2011-01-01

Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database, as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false-positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of the colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting a potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines: Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
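One common way to control false positives among variant peptides is to estimate target-decoy FDR separately for each peptide class. The small set of variant peptides is then not judged by a cutoff dominated by ordinary peptides. The abstract does not describe the authors' modification, so the sketch below shows only this generic, assumed strategy. Scores are treated as higher-is-better, and the class names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

PSM = Tuple[float, bool]  # (search-engine score, is_decoy); higher scores are better

def score_threshold_at_fdr(psms: List[PSM], max_fdr: float) -> float:
    """Lowest score cutoff whose estimated FDR (decoys / targets at or above it) <= max_fdr.

    Returns math.inf when no cutoff qualifies, i.e. nothing would be accepted.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = math.inf
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every PSM tied at this score so the cutoff is evaluated consistently.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best

def class_specific_thresholds(psms_by_class: Dict[str, List[PSM]],
                              max_fdr: float) -> Dict[str, float]:
    """Estimate a separate cutoff for each peptide class (e.g. 'variant', 'reference')."""
    return {name: score_threshold_at_fdr(p, max_fdr) for name, p in psms_by_class.items()}
```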
Welcome - TampaBay.WaterAtlas.org

Science.gov Websites

An edition of WaterAtlas.org, presented by the USF Water Institute. Other Water Atlases: Charlotte Harbor NEP Water Atlas, Hillsborough County Water Atlas, Lake County Water Atlas, Manatee County Water Atlas, Orange County Water Atlas, Pinellas County Water Atlas, Polk County Water Atlas, Sarasota County Water Atlas.
Projections for fast protein structure retrieval

PubMed Central

Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R

2006-01-01

Background: In recent times, there has been an exponential rise in the number of protein structures in databases such as the PDB. Consequently, the design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of its residues. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results: Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali. PMID:17254310
PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification.

PubMed

Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier

2003-01-01

The PANTHER database was designed for high-throughput analysis of protein sequences. One of its key features is a simplified ontology of protein function, which allows browsing of the database by biological function. Biologist curators have associated the ontology terms with groups of protein sequences rather than with individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
Integrating In Silico Resources to Map a Signaling Network

PubMed Central

Liu, Hanqing; Beck, Tim N.; Golemis, Erica A.; Serebriiskii, Ilya G.

2013-01-01

The abundance of publicly available life science databases offers a wealth of information that can support interpretation of experimentally derived data and greatly enhance hypothesis generation. Protein interaction and functional networks are not simply new renditions of existing data: they provide the opportunity to gain insights into the specific physical and functional role a protein plays as part of the biological system. In this chapter, we describe different in silico tools that can quickly and conveniently retrieve data from existing data repositories, and discuss how the available tools are best used for different purposes. While emphasizing protein-protein interaction databases (e.g., BioGrid and IntAct), we also introduce metasearch platforms such as STRING and GeneMANIA, pathway databases (e.g., BioCarta and Pathway Commons), text mining approaches (e.g., PubMed and Chilibot), and resources for drug-protein interactions, genetic information for model organisms, and gene expression information based on microarray data mining. Furthermore, we provide a simple step-by-step protocol for building customized protein-protein interaction networks in Cytoscape, a powerful network assembly and visualization program, integrating data retrieved from these various databases. As we illustrate, generating composite interaction networks enables investigators to extract significantly more information about a given biological system than using a single database or relying solely on the primary literature. PMID:24233784
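Combining interaction lists from several databases into one composite network, as in the Cytoscape protocol described above, amounts to taking the union of undirected edges while recording which sources support each edge. The sketch assumes interactions have already been retrieved as hypothetical (protein_a, protein_b) pairs with harmonized identifiers. It writes Cytoscape's simple interaction format (SIF), using `pp` as the interaction type. `min_sources` is an illustrative filter, not a Cytoscape option.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]

def merge_interactions(sources: Dict[str, Iterable[Edge]]) -> Dict[Edge, Set[str]]:
    """Union undirected edges from several databases, remembering which report each."""
    merged: Dict[Edge, Set[str]] = defaultdict(set)
    for db_name, pairs in sources.items():
        for a, b in pairs:
            key = (a, b) if a <= b else (b, a)  # order-independent key for A-B == B-A
            merged[key].add(db_name)
    return dict(merged)

def to_sif(merged: Dict[Edge, Set[str]], min_sources: int = 1) -> str:
    """One 'nodeA<TAB>pp<TAB>nodeB' line per edge supported by >= min_sources databases."""
    lines = [f"{a}\tpp\t{b}" for (a, b), dbs in sorted(merged.items())
             if len(dbs) >= min_sources]
    return "\n".join(lines)
```

Keeping the set of supporting databases on each edge lets a user filter for interactions reported by more than one source before loading the network.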
sc-PDB: a 3D-database of ligandable binding sites--10 years on.

PubMed

Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

2015-01-01

The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites from the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive contains 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements to the database, specifically pertaining to the representation of molecular interactions and to the similarity between ligand/protein binding patterns. The new website emphasizes pictorial analysis of the data. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
NSDNA: a manually curated database of experimentally supported ncRNAs associated with nervous system diseases

PubMed Central

Wang, Jianjian; Cao, Yuze; Zhang, Huixue; Wang, Tianfeng; Tian, Qinghua; Lu, Xiaoyu; Lu, Xiaoyan; Kong, Xiaotong; Liu, Zhaojun; Wang, Ning; Zhang, Shuai; Ma, Heping; Ning, Shangwei; Wang, Lihua

2017-01-01

The Nervous System Disease NcRNAome Atlas (NSDNA) (http://www.bio-bigdata.net/nsdna/) is a manually curated database that provides comprehensive, experimentally supported associations between nervous system diseases (NSDs) and noncoding RNAs (ncRNAs). NSDs represent a common group of disorders, some of which are characterized by high morbidity and disabilities. The pathogenesis of NSDs at the molecular level remains poorly understood. ncRNAs are a large family of functionally important RNA molecules. Increasing evidence shows that diverse ncRNAs play a critical role in various NSDs. Mining and summarizing NSD–ncRNA association data can help researchers discover useful information. Hence, we developed the NSDNA database, which documents 24 713 associations between 142 NSDs and 8593 ncRNAs in 11 species, curated from more than 1300 articles. The database provides a user-friendly interface for browsing and searching and allows flexible data download. In addition, NSDNA offers a submission page for researchers to submit novel NSD–ncRNA associations. It represents an extremely useful and valuable resource for researchers who seek to understand the functions and molecular mechanisms of ncRNAs involved in NSDs. PMID:27899613
SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

PubMed Central

Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

2014-01-01

The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
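SIMAP's alignment step relies on the Smith-Waterman local alignment algorithm. The minimal sketch below computes only the optimal local alignment score. It uses a toy scoring scheme (match +2, mismatch -1, linear gap -2) chosen for illustration. Production tools use substitution matrices and affine gap penalties, and they also recover the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty, O(len(b)) memory)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The floor at zero is what distinguishes Smith-Waterman (local) from Needleman-Wunsch (global) alignment.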

Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants.

PubMed

Gromiha, M Michael; Anoosha, P; Huang, Liang-Tsung

2016-01-01

Protein stability is the free energy difference between the unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured by circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy, using thermal and denaturant denaturation methods. These experimental data have been accumulated in a database, ProTherm, the thermodynamic database for proteins and mutants. It also contains sequence and structure information for each protein, experimental methods and conditions, and literature information. Features such as search, display, and sorting options, as well as visualization tools, have been incorporated into the database. ProTherm is a valuable resource for understanding and predicting the stability of proteins, and it can be accessed at http://www.abren.net/protherm/ . ProTherm has been used effectively to examine the relationships among the thermodynamics, structure, and function of proteins. We describe recent progress in developing methods for understanding and predicting protein stability, such as (1) general trends in the effects of mutations on stability, (2) the relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for double mutants. A list of online resources for predicting protein stability is also provided.
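The quantities ProTherm collects can be summarized with the standard thermodynamic definitions below. The sign convention is an assumption, because conventions differ between studies and the abstract does not state one. Here a stable protein has positive ΔG_u, so a negative ΔΔG means a destabilizing mutation. The linear extrapolation model for denaturant data ([D] is denaturant concentration, m its slope) and the double-mutant coupling term are textbook relations included for context. They are not methods attributed to this chapter.

$$
\Delta G_u = G_{\mathrm{unfolded}} - G_{\mathrm{folded}}, \qquad
\Delta\Delta G = \Delta G_u^{\mathrm{mut}} - \Delta G_u^{\mathrm{wt}}
$$

$$
\Delta G_u([D]) = \Delta G_u^{\mathrm{H_2O}} - m\,[D]
$$

$$
\Delta\Delta G_{\mathrm{int}} = \Delta\Delta G_{AB} - \Delta\Delta G_{A} - \Delta\Delta G_{B}
$$

A coupling term of zero means the two single mutations act additively in the double mutant AB.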
The brain MRI classification problem from wavelets perspective

NASA Astrophysics Data System (ADS)

Bendib, Mohamed M.; Merouani, Hayet F.; Diaba, Fatma

2015-02-01

Haar and Daubechies 4 (DB4) are the most commonly used wavelets for brain MRI (magnetic resonance imaging) classification. The former is simple and fast to compute, while the latter is more complex and offers better resolution. This paper explores the potential of both for Normal versus Pathological discrimination on the one hand, and for multiclass classification on the other. The Whole Brain Atlas is used as a validation database, and the Random Forest (RF) algorithm is employed as the learning approach. The results are discussed and statistically compared.
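As background for the comparison above, one level of the Haar transform replaces each pair of neighboring values with a scaled sum (approximation) and a scaled difference (detail). Applying it to the rows and then the columns of an image yields four subbands. This is a minimal pure-Python sketch for even-sized inputs. In practice a library such as PyWavelets (`pywt.dwt2`) would be used. DB4 is not shown, and subband naming and sign conventions vary between implementations.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def haar_1d(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: pairwise scaled sums and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def _transform_columns(mat: Matrix) -> Tuple[Matrix, Matrix]:
    cols = [list(c) for c in zip(*mat)]          # transpose: one list per column
    lo, hi = zip(*(haar_1d(c) for c in cols))     # transform each column
    return [list(r) for r in zip(*lo)], [list(r) for r in zip(*hi)]  # transpose back

def haar_2d(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One 2D level: rows first, then columns; returns (LL, LH, HL, HH) subbands."""
    row_lo, row_hi = zip(*(haar_1d(row) for row in image))
    ll, lh = _transform_columns(list(row_lo))
    hl, hh = _transform_columns(list(row_hi))
    return ll, lh, hl, hh
```

Because the transform is orthonormal, it preserves signal energy. Statistics of the subbands (for example, their energies) are the kind of features that could feed a classifier such as a Random Forest.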
Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

PubMed

Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A

2007-06-01

Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatically generating a comprehensive dictionary: combining information from existing gene and protein databases, and rule-based generation of spelling variations. Both methods have been reported in the literature before, but have not previously been combined and evaluated systematically. We combined gene and protein names from several existing databases for four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets compared with any single database. Applying 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect, and some appeared to have a detrimental effect on precision.
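Rule-based spelling variation can be sketched as a set of string rewrite functions applied repeatedly until no new variants appear. The four rules below are hypothetical examples: separator removal, hyphen-to-space, Greek-letter abbreviation and Arabic-to-Roman numerals. They are not the 23 rules evaluated in the paper. As the abstract notes for some rules, the Greek-letter rule can hurt precision by producing short, ambiguous names.

```python
import re
from typing import Callable, Iterable, Set

# Hypothetical lookup tables for illustration only.
GREEK = {"alpha": "a", "beta": "b", "gamma": "g"}
ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}

def drop_separators(name: str) -> str:
    """'TNF-alpha' -> 'TNFalpha', 'caspase 3' -> 'caspase3'."""
    return re.sub(r"[-\s]", "", name)

def hyphen_to_space(name: str) -> str:
    return name.replace("-", " ")

def greek_to_letter(name: str) -> str:
    """Replace whole-word Greek letter names with single letters (case-insensitive)."""
    pattern = r"\b(" + "|".join(GREEK) + r")\b"
    return re.sub(pattern, lambda m: GREEK[m.group(1).lower()], name, flags=re.IGNORECASE)

def arabic_to_roman(name: str) -> str:
    """Replace standalone digits 1-5 with Roman numerals: 'caspase 3' -> 'caspase III'."""
    return re.sub(r"\b([1-5])\b", lambda m: ROMAN[m.group(1)], name)

RULES = [drop_separators, hyphen_to_space, greek_to_letter, arabic_to_roman]

def expand(name: str, rules: Iterable[Callable[[str], str]] = RULES) -> Set[str]:
    """Apply every rule to every known variant until no new spelling appears."""
    rules = list(rules)
    variants = {name}
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for rule in rules:
            variant = rule(current)
            if variant not in variants:
                variants.add(variant)
                frontier.append(variant)
    return variants
```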
The COG database: a tool for genome-scale analysis of protein functions and evolution

PubMed Central

Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.

2000-01-01

Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG ). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175
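The core idea behind COG construction, consistent genome-specific best hits, can be sketched in three steps. First, find protein pairs that are reciprocal best hits. Second, keep triangles of such pairs that span three different genomes. Third, merge groups that share at least two proteins. This is a simplified illustration under assumed inputs: a precomputed best-hit table and a protein-to-genome map. The actual COG procedure includes additional steps not shown here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BestHits = Dict[Tuple[str, str], str]  # (protein, target_genome) -> best hit in that genome

def rbh_edges(best_hit: BestHits, genome_of: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pairs of proteins that are each other's best hit in the other's genome."""
    edges = set()
    for (p, _target_genome), q in best_hit.items():
        if best_hit.get((q, genome_of[p])) == p:
            edges.add((p, q) if p < q else (q, p))
    return edges

def best_hit_triangles(edges: Set[Tuple[str, str]],
                       genome_of: Dict[str, str]) -> Set[Tuple[str, ...]]:
    """Triangles of reciprocal best hits whose three proteins come from three genomes."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    found = set()
    for a, b in edges:
        for c in adj[a] & adj[b]:
            if len({genome_of[a], genome_of[b], genome_of[c]}) == 3:
                found.add(tuple(sorted((a, b, c))))
    return found

def merge_triangles(triangles: Set[Tuple[str, ...]]) -> List[Set[str]]:
    """Repeatedly merge groups sharing at least two proteins (i.e. a common side)."""
    clusters = [set(t) for t in triangles]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if len(clusters[i] & clusters[j]) >= 2:
                    clusters[i] |= clusters.pop(j)  # j > i, so index i stays valid
                    merged = True
                    break
            if merged:
                break
    return clusters
```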
Mining databases for protein aggregation: a review.

PubMed

Tsiolaki, Paraskevi L; Nastou, Katerina C; Hamodrakas, Stavros J; Iconomidou, Vassiliki A

2017-09-01

Protein aggregation has been an active area of research in recent decades, since it is the most common and troubling indication of protein instability. Understanding the mechanisms governing protein aggregation and amyloidogenesis is key to understanding the aetiology and pathogenesis of many devastating disorders, including Alzheimer's disease and type 2 diabetes. Protein aggregation data are currently "scattered" across an increasing number of repositories, since advances in computational biology greatly influence this field of research. This review explores the various resources of aggregation data and attempts to distinguish and analyze the biological knowledge they contain, introducing protein-based, fragment-based and disease-based repositories related to aggregation. To give a broad overview of the available repositories, a novel comprehensive network maps and visualizes the current associations between aggregation databases and other important databases and/or tools, and the beneficial role of community annotation is discussed. The need to unify aggregation databases in a common platform is also addressed.
ATtRACT: a database of RNA-binding proteins and associated motifs.

PubMed

Giudice, Girolamo; Sánchez-Cabo, Fátima; Torroja, Carlos; Lara-Pezzi, Enrique

2016-01-01

RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs and RBPs, together with integrated tools to exploit this information, is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from protein-RNA complexes in the Protein Data Bank through computational analyses. ATtRACT also provides efficient algorithms to search for a specific motif and to scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es. © The Author(s) 2016. Published by Oxford University Press.
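Scanning RNA sequences for a consensus motif, one of the ATtRACT functions described above, can be sketched with IUPAC nucleotide codes, where each motif letter stands for a set of allowed bases. This exact-match scanner is an illustrative assumption, not ATtRACT's algorithm. It ignores position weight matrices and scoring, and it reads T as U so DNA-style input also works.

```python
from typing import Dict, List

# IUPAC nucleotide codes, written with U for RNA.
IUPAC_RNA: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

def scan_motif(motif: str, sequence: str) -> List[int]:
    """Return 0-based start positions where the sequence matches the IUPAC motif exactly."""
    motif = motif.upper().replace("T", "U")
    seq = sequence.upper().replace("T", "U")
    width = len(motif)
    hits = []
    for start in range(len(seq) - width + 1):
        # Every motif position must allow the base observed at the aligned position.
        if all(seq[start + k] in IUPAC_RNA[code] for k, code in enumerate(motif)):
            hits.append(start)
    return hits


print(scan_motif("UGCAUG", "AAUGCAUGCC"))  # [2]
```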
SInCRe—structural interactome computational resource for Mycobacterium tuberculosis

PubMed Central

Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy

2015-01-01

We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information, along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) was developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for developing this database is to provide an integrated platform that allows easy access to, and interpretation of, the data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets, which are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on the identification of functional domains, genome-scale modelling of the structures of Mtb proteins, and the characterization of small-molecule binding sites within Mtb. The resource also provides structure-based function annotation and information on small-molecule binders, including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs), and natural compounds that potentially bind to pathogen proteins and weaken or eliminate host–pathogen protein–protein interactions. Together these provide prerequisites for identifying off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660
iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence

PubMed Central

Turner, Brian; Razick, Sabry; Turinsky, Andrei L.; Vlasblom, James; Crowdy, Edgard K.; Cho, Emerson; Morrison, Kyle; Wodak, Shoshana J.

2010-01-01

We present iRefWeb, a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb enables users to examine aggregated interactions for a protein of interest, and presents various statistical summaries of the data across databases, such as the number of organism-specific interactions, proteins and cited publications. Through links to source databases and supporting evidence, researchers may gauge the reliability of an interaction using simple criteria, such as the detection methods, the scale of the study (high- or low-throughput) or the number of cited publications. Furthermore, iRefWeb compares the information extracted from the same publication by different databases, and offers a means to follow up possible inconsistencies. We provide an overview of the consolidated protein–protein interaction landscape and show how it can be automatically cropped to aid the generation of meaningful organism-specific interactomes. iRefWeb can be accessed at: http://wodaklab.org/iRefWeb. Database URL: http://wodaklab.org/iRefWeb/ PMID:20940177
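The simple reliability criteria described above (number of cited publications, study scale, and cross-database comparison of records curated from the same publication) can be illustrated with a small sketch. It assumes a hypothetical flat record type, `Evidence`, with one row per source-database record; the field names, the two-publication threshold and the "keep if any low-throughput study" rule are illustrative choices, not iRefWeb's actual data model or filtering logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One interaction record as curated by one source database (hypothetical fields)."""
    protein_a: str
    protein_b: str
    source_db: str         # e.g. "BioGRID", "IntAct"
    pmid: str              # publication the record was curated from
    method: str            # reported detection method
    high_throughput: bool  # scale of the study


def pair_key(e):
    """Unordered protein pair, so A-B and B-A records are consolidated together."""
    return tuple(sorted((e.protein_a, e.protein_b)))


def reliable_pairs(records, min_publications=2):
    """Keep pairs backed by several distinct publications or any low-throughput study."""
    by_pair = defaultdict(list)
    for e in records:
        by_pair[pair_key(e)].append(e)
    keep = set()
    for pair, evs in by_pair.items():
        n_pubs = len({e.pmid for e in evs})
        low_throughput = any(not e.high_throughput for e in evs)
        if n_pubs >= min_publications or low_throughput:
            keep.add(pair)
    return keep


def inconsistent_curations(records):
    """(pair, pmid) keys where databases curating the same paper disagree on the method."""
    methods = defaultdict(set)
    for e in records:
        methods[(pair_key(e), e.pmid)].add(e.method)
    return {key for key, ms in methods.items() if len(ms) > 1}
```

Grouping on an unordered pair is the key design choice: without it, the same interaction curated as A–B in one database and B–A in another would be counted as two separate interactions.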
HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species.

PubMed

López, Yosvany; Nakai, Kenta; Patil, Ashwini

2015-01-01

HitPredict is a consolidated resource of experimentally identified, physical protein-protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein-protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein-protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp. © The Author(s) 2015. Published by Oxford University Press.
Exploiting genomic data to identify proteins involved in abalone reproduction.

PubMed

Mendoza-Porras, Omar; Botwright, Natasha A; McWilliam, Sean M; Cook, Mathew T; Harris, James O; Wijffels, Gene; Colgrave, Michelle L

2014-08-28

Aside from their critical role in reproduction, abalone gonads serve as an indicator of sexual maturity and energy balance, two key considerations for effective abalone culture. Farmers of temperate abalone face difficulties restocking tanks with highly marketable abalone owing to inefficient spawning induction methods. The identification of key proteins in sexually mature abalone will serve as the foundation for a greater understanding of reproductive biology. Addressing this knowledge gap is the first step towards improving abalone aquaculture methods. Proteomic profiling of female and male gonads of greenlip abalone, Haliotis laevigata, was undertaken using liquid chromatography-mass spectrometry. Owing to the incomplete nature of abalone protein databases, a custom database comprising genomic data was searched in addition to two publicly available databases. Overall, 162 and 110 proteins were identified in females and males, respectively, with 40 proteins common to both sexes. Of the proteins involved in sexual maturation, sperm and egg structure, motility, acrosomal reaction and fertilization, 23 were identified only in females, 18 only in males and 6 in both. Gene ontology analysis revealed clear differences between the female and male protein profiles, reflecting a higher rate of protein synthesis in the ovary and higher metabolic activity in the testis. A comprehensive mass spectrometry-based analysis was performed to profile the abalone gonad proteome, providing the foundation for future studies of reproduction in abalone. Key proteins involved in both reproduction and energy balance were identified. Genomic resources were utilised to build a database of molluscan proteins, yielding >60% more protein identifications than a standard workflow employing public protein databases. Copyright © 2014 Elsevier B.V. All rights reserved.
Meta sequence analysis of human blood peptides and their parent proteins.

PubMed

Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

2010-04-18

Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched by SQL to proteins from the human ENSEMBL, SwissProt and RefSeq databases. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence, along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information from current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi-square analysis of peptide-to-protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.
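The per-protein summary described above (up to five unique peptides in order of prevalence, plus a peptide count) can be sketched with Python's built-in `sqlite3` module. The single-table schema `obs(protein, peptide)`, the in-memory database, the alphabetical tie-breaking, and the reading of "peptide count" as the total number of reported peptide observations are all assumptions for illustration; this is not the authors' schema or query.

```python
import sqlite3


def summarize_peptides(rows, max_peptides=5):
    """Summarize peptide reports per protein.

    rows: iterable of (protein_accession, peptide_sequence) tuples, one per
    reported peptide identification (hypothetical input format).
    Returns {protein: (peptide_count, [most prevalent unique peptides])}.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE obs (protein TEXT, peptide TEXT)")
        con.executemany("INSERT INTO obs VALUES (?, ?)", rows)
        # Count how often each unique peptide was reported for each protein,
        # most prevalent first (ties broken alphabetically for determinism).
        cur = con.execute(
            """
            SELECT protein, peptide, COUNT(*) AS n
            FROM obs
            GROUP BY protein, peptide
            ORDER BY protein, n DESC, peptide
            """
        )
        summary = {}
        for protein, peptide, n in cur:
            count, peptides = summary.get(protein, (0, []))
            # Keep only the first max_peptides unique peptides, but count all reports.
            if len(peptides) < max_peptides:
                peptides.append(peptide)
            summary[protein] = (count + n, peptides)
        return summary
    finally:
        con.close()
```

For example, two reports of `AAK` and one of `CCR` for protein `P1` would yield `(3, ["AAK", "CCR"])` for that protein.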
PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics.

PubMed

Jemimah, Sherlyn; Yugandhar, K; Michael Gromiha, M

2017-09-01

We have developed PROXiMATE, a database of thermodynamic data for more than 6000 missense mutations in 174 heterodimeric protein-protein complexes, supplemented with interaction network data from STRING database, solvent accessibility, sequence, structural and functional information, experimental conditions and literature information. Additional features include complex structure visualization, search and display options, download options and a provision for users to upload their data. The database is freely available at http://www.iitm.ac.in/bioinfo/PROXiMATE/ . The website is implemented in Python, and supports recent versions of major browsers such as IE10, Firefox, Chrome and Opera. Contact: gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
FARE-CAFE: a database of functional and regulatory elements of cancer-associated fusion events.

PubMed

Korla, Praveen Kumar; Cheng, Jack; Huang, Chien-Hung; Tsai, Jeffrey J P; Liu, Yu-Hsuan; Kurubanjerdjit, Nilubon; Hsieh, Wen-Tsong; Chen, Huey-Yi; Ng, Ka-Lok

2015-01-01

Chromosomal translocation (CT) is of enormous clinical interest because it is associated with various major solid tumors and leukemias. A tumor-specific fusion gene event may occur when a translocation joins two separate genes. Currently, various CT databases provide information about fusion genes and their genomic elements. However, no database of the roles of fusion genes, in terms of essential functional and regulatory elements in oncogenesis, is available. FARE-CAFE is a unique combination of CTs, fusion proteins, protein domains, domain-domain interactions, protein-protein interactions, transcription factors and microRNAs, with associated experimental information, which cannot be found in any other CT database. Genomic DNA information is also included, for example manually collected exact locations of the first and second breakpoints, and the sequences and karyotypes of fusion genes. FARE-CAFE will substantially facilitate the cancer biologist's mission of elucidating the pathogenesis of various types of cancer. This database will ultimately help to develop 'novel' therapeutic approaches. Database URL: http://ppi.bioinfo.asia.edu.tw/FARE-CAFE. © The Author(s) 2015. Published by Oxford University Press.
Follicle Online: an integrated database of follicle assembly, development and ovulation.

PubMed

Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Cooke, Howard J; Zhang, Yuanwei; Shi, Qinghua

2015-01-01

Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and for curing the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we report a database, 'Follicle Online', that provides the experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43,000 published articles (up to 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript, and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php © The Author(s) 2015. Published by Oxford University Press.
Follicle Online: an integrated database of follicle assembly, development and ovulation

PubMed Central

Hua, Juan; Xu, Bo; Yang, Yifan; Ban, Rongjun; Iqbal, Furhan; Zhang, Yuanwei; Shi, Qinghua

2015-01-01

Folliculogenesis is an important part of ovarian function as it provides the oocytes for female reproductive life. Characterizing genes/proteins involved in folliculogenesis is fundamental for understanding the mechanisms associated with this biological function and for curing the diseases associated with folliculogenesis. A large number of genes/proteins associated with folliculogenesis have been identified from different species. However, no dedicated public resource is currently available for folliculogenesis-related genes/proteins that are validated by experiments. Here, we report a database, ‘Follicle Online’, that provides the experimentally validated gene/protein map of folliculogenesis in a number of species. Follicle Online is a web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome). The literature was manually curated from more than 43 000 published articles (up to 1 March 2014). The Follicle Online database is implemented in PHP + MySQL + JavaScript, and this user-friendly web application provides access to the stored data. In summary, we have developed a centralized database that provides users with comprehensive information about genes/proteins involved in folliculogenesis. This database can be accessed freely and all the stored data can be viewed without any registration. Database URL: http://mcg.ustc.edu.cn/sdap1/follicle/index.php PMID:25931457
DBSecSys 2.0: a database of Burkholderia mallei and Burkholderia pseudomallei secretion systems.

PubMed

Memišević, Vesna; Kumar, Kamal; Zavaljevski, Nela; DeShazer, David; Wallqvist, Anders; Reifman, Jaques

2016-09-20

Burkholderia mallei and B. pseudomallei are the causative agents of glanders and melioidosis, respectively, diseases with high morbidity and mortality rates. B. mallei and B. pseudomallei are closely related genetically; B. mallei evolved from an ancestral strain of B. pseudomallei by genome reduction and adaptation to an obligate intracellular lifestyle. Although these two bacteria cause different diseases, they share multiple virulence factors, including bacterial secretion systems, which represent key components of bacterial pathogenicity. Despite recent progress, the secretion system proteins for B. mallei and B. pseudomallei, their pathogenic mechanisms of action, and host factors are not well characterized. We previously developed a manually curated database, DBSecSys, of bacterial secretion system proteins for B. mallei. Here, we report an expansion of the database with corresponding information about B. pseudomallei. DBSecSys 2.0 contains comprehensive literature-based and computationally derived information about B. mallei ATCC 23344, as well as literature-based and computationally derived information about B. pseudomallei K96243. The database contains updated information for 163 B. mallei proteins from the previous database and 61 additional B. mallei proteins, and new information for 281 B. pseudomallei proteins associated with 5 secretion systems, their 1,633 human- and murine-interacting targets, and 2,400 host-B. mallei interactions and 2,286 host-B. pseudomallei interactions. The database also includes information about 13 pathogenic mechanisms of action for B. mallei and B. pseudomallei secretion system proteins inferred from the available literature or computationally. Additionally, DBSecSys 2.0 provides details about 82 virulence attenuation experiments for 52 B. mallei secretion system proteins and 98 virulence attenuation experiments for 61 B. pseudomallei secretion system proteins.
We updated the Web interface and data access layer to speed up users' search of detailed information for orthologous proteins related to secretion systems of the two pathogens. The updates of DBSecSys 2.0 provide unique capabilities to access comprehensive information about secretion systems of B. mallei and B. pseudomallei. They enable studies and comparisons of corresponding proteins of these two closely related pathogens and their host-interacting partners. The database is available at http://dbsecsys.bhsai.org .
Gene and protein nomenclature in public databases

PubMed Central

Fundel, Katrin; Zimmer, Ralf

2006-01-01

Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, as well as relative to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of the size of the extracted dictionaries and the overlap of synonyms between them. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources.
Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. PMID:16899134
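A minimal sketch of the kind of dictionary analysis described above: merging several name dictionaries, finding names shared by more than one gene, finding names that collide with a word list, and measuring synonym overlap between two sources. It assumes hypothetical dictionaries of the form `{gene_id: set_of_names}` that already share gene identifiers, case-insensitive matching, and Jaccard similarity of (gene, name) pairs as the overlap measure; the study's own metrics and identifier mapping may differ.

```python
from collections import defaultdict


def merge_dictionaries(*dicts):
    """Union of several {gene_id: set(names)} dictionaries (names lower-cased)."""
    merged = defaultdict(set)
    for d in dicts:
        for gene, names in d.items():
            merged[gene].update(n.lower() for n in names)
    return dict(merged)


def ambiguous_names(dictionary):
    """Names that refer to more than one gene within a single dictionary."""
    owners = defaultdict(set)
    for gene, names in dictionary.items():
        for n in names:
            owners[n.lower()].add(gene)
    return {n for n, genes in owners.items() if len(genes) > 1}


def clashes_with_words(dictionary, words):
    """Gene/protein names that are also entries of a word list (e.g. common English)."""
    names = {n.lower() for ns in dictionary.values() for n in ns}
    return names & {w.lower() for w in words}


def synonym_overlap(a, b):
    """Jaccard overlap of (gene, name) pairs between two dictionaries."""
    pairs_a = {(g, n.lower()) for g, ns in a.items() for n in ns}
    pairs_b = {(g, n.lower()) for g, ns in b.items() for n in ns}
    if not pairs_a and not pairs_b:
        return 1.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)
```

For instance, merging `{"G1": {"p53", "TP53"}, "G2": {"CAT"}}` with `{"G1": {"TP53", "TRP53"}, "G3": {"cat"}}` gives a larger dictionary for `G1`, but also exposes `cat` as ambiguous between `G2` and `G3`, which is the size-versus-ambiguity trade-off that motivates curating the combined dictionary.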
MultitaskProtDB: a database of multitasking proteins

PubMed Central

Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

2014-01-01

We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins, such as frequencies of the different pairs of functions, phylogenetic conservation and so forth. PMID:24253302
SCOPe: Manual Curation and Artifact Removal in the Structural Classification of Proteins - extended Database.

PubMed

Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E

2017-02-03

SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
NIS expression in thyroid tumors, relation with prognosis clinicopathological and molecular features

PubMed Central

Tavares, Catarina; Coelho, Maria João; Eloy, Catarina; Melo, Miguel; da Rocha, Adriana Gaspar; Pestana, Ana; Batista, Rui; Ferreira, Luciana Bueno; Rios, Elisabete; Selmi-Ruby, Samia; Cavadas, Bruno; Pereira, Luísa; Sobrinho Simões, Manuel

2018-01-01

Thyroid cancer therapy is based on surgery followed by radioiodine treatment. The incorporation of radioiodine by cancer cells is mediated by the sodium iodide symporter (NIS) (encoded by the SLC5A5 gene), which is functional only when targeted to the cell membrane. We aimed to evaluate whether NIS expression in primary thyroid tumors would be helpful in predicting tumor behavior, response to therapy and prognosis. NIS expression was assessed by qPCR and immunohistochemistry. To validate our data, we also studied SLC5A5 expression in 378 primary papillary thyroid carcinomas from The Cancer Genome Atlas (TCGA) database. In our series, SLC5A5 expression was lower in carcinomas with vascular invasion and with extrathyroidal extension and in those harboring the BRAFV600E mutation. Analysis of SLC5A5 expression from the TCGA database confirmed our results. Furthermore, it showed that larger tumors, tumors with locoregional recurrences and/or distant metastases, and tumors harboring RAS, BRAF and/or TERT promoter (TERTp) mutations presented significantly lower SLC5A5 expression. Regarding immunohistochemistry, 12/211 of the cases demonstrated NIS in the membrane of tumor cells; these cases showed variable outcomes concerning therapy success and prognosis, and all but one were wild type for BRAF, NRAS and TERTp mutations. Lower SLC5A5 mRNA expression is associated with features of aggressiveness and with key genetic alterations involving BRAF, RAS and TERTp. Mutations in these genes seem to decrease protein expression and its targeting to the cell membrane. SLC5A5 mRNA expression is more informative than NIS immunohistochemical expression regarding tumor aggressiveness and prognostic features. PMID:29298843
