Sample records for uniprot reference clusters

  1. The Universal Protein Resource (UniProt): an expanding universe of protein information.

    PubMed

    Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

    2006-01-01

    The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
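    To make UniRef's "sequence space compression" concrete, here is a minimal Python sketch of greedy, representative-based clustering at an identity threshold of 1.0, 0.9 or 0.5, mirroring UniRef100, UniRef90 and UniRef50. It is an illustration under stated assumptions, not UniProt's pipeline. Identity is a naive ungapped comparison of equal-length toy sequences, whereas production clustering aligns sequences (with CD-HIT, for example) and builds each lower-identity level from the representatives of the level above. The function names and sequences are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions in an ungapped, equal-length comparison.

    Simplification: real pipelines align sequences before counting identities.
    """
    if len(a) != len(b):
        raise ValueError("this toy identity() expects equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Assign each sequence to the first representative it matches at >= threshold.

    Sequences are visited longest first (ties broken by ID); a sequence that
    matches no existing representative starts a new cluster.
    """
    order = sorted(seqs, key=lambda sid: (-len(seqs[sid]), sid))
    clusters: Dict[str, List[str]] = {}
    for sid in order:
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]
    return clusters


# Hypothetical equal-length toy sequences.
SEQS = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "MKTAYIAKQG",  # 90% identical to P1
    "P4": "MSTAHLVKQG",  # 50% identical to P1
}

for level in (1.0, 0.9, 0.5):
    print(level, greedy_cluster(SEQS, level))
```

    At 1.0 only P1 and P2 merge. At 0.9, P3 joins them. At 0.5, all four sequences collapse into a single cluster. Lowering the threshold therefore leaves fewer representatives to search.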

  2. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

    PubMed

    Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael; Schneider, Michel; Bansal, Parit; Bridge, Alan J; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis

    2016-01-01

    The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available protein sequences that have been manually annotated by experts, obtained from a broad spectrum of organisms. Plant protein entries are produced within the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict the annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.

  3. BAGEL4: a user-friendly web server to thoroughly mine RiPPs and bacteriocins.

    PubMed

    van Heel, Auke J; de Jong, Anne; Song, Chunxu; Viel, Jakob H; Kok, Jan; Kuipers, Oscar P

    2018-05-21

    Interest in secondary metabolites such as RiPPs (ribosomally synthesized and posttranslationally modified peptides) is increasing worldwide. To facilitate research in this field we have updated our mining web server. BAGEL4 is faster than its predecessor and is now fully independent of ORF calling. Gene clusters of interest are discovered using the core-peptide database and/or through HMM motifs that are present in associated context genes. The databases used for mining have been updated and extended with literature references and links to UniProt and NCBI. Additionally, we have included automated promoter and terminator prediction and the option to upload RNA expression data, which can be displayed along with the identified clusters. Further improvements include the annotation of the context genes, which is now based on a fast BLAST search against the prokaryote part of the UniRef90 database, and the improved web-BLAST feature that dynamically loads structural data such as internal cross-linking from UniProt. Overall, BAGEL4 provides the user with more information through a user-friendly web interface that simplifies data evaluation. BAGEL4 is freely accessible at http://bagel4.molgenrug.nl.

  4. The BioExtract Server: a web-based bioinformatic workflow platform

    PubMed Central

    Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

    2011-01-01

    The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552
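    The record-then-replay idea described above can be sketched in a few lines of Python. This is a hypothetical illustration, not BioExtract Server code. Each task is an ordinary function applied to the output of the previous one, the `RecordedWorkflow` class and the two toy tasks are invented names, and replay can optionally override recorded parameters step by step.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class RecordedStep:
    name: str
    func: Callable[..., Any]
    params: Dict[str, Any]


@dataclass
class RecordedWorkflow:
    steps: List[RecordedStep] = field(default_factory=list)

    def do(self, name: str, func: Callable[..., Any], data: Any, **params: Any) -> Any:
        """Run a task interactively and record it for later replay."""
        result = func(data, **params)
        self.steps.append(RecordedStep(name, func, dict(params)))
        return result

    def replay(self, data: Any, overrides: Optional[Dict[str, Dict[str, Any]]] = None) -> Any:
        """Re-run every recorded step on new input, optionally changing parameters."""
        overrides = overrides or {}
        for step in self.steps:
            params = {**step.params, **overrides.get(step.name, {})}
            data = step.func(data, **params)
        return data


# Hypothetical toy tasks operating on a list of sequences.
def filter_min_length(seqs: List[str], min_len: int) -> List[str]:
    return [s for s in seqs if len(s) >= min_len]


def to_upper(seqs: List[str]) -> List[str]:
    return [s.upper() for s in seqs]


wf = RecordedWorkflow()
first = wf.do("filter", filter_min_length, ["mkt", "mktayiak", "mk"], min_len=3)
first = wf.do("upper", to_upper, first)
rerun = wf.replay(["aaaa", "bb", "ccc"], overrides={"filter": {"min_len": 4}})
print(first, rerun)
```

    Storing the function together with its parameters is what makes the saved workflow reproducible with the original settings and re-runnable with modified ones.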

  5. E-MSD: an integrated data resource for bioinformatics.

    PubMed

    Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up-to-date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and InterPro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, InterPro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.

  6. E-MSD: an integrated data resource for bioinformatics

    PubMed Central

    Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.

    2005-01-01

    The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up-to-date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and InterPro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, InterPro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192

  7. Infrastructure for the life sciences: design and implementation of the UniProt website.

    PubMed

    Jain, Eric; Bairoch, Amos; Duvaud, Severine; Phan, Isabelle; Redaschi, Nicole; Suzek, Baris E; Martin, Maria J; McGarvey, Peter; Gasteiger, Elisabeth

    2009-05-08

    The UniProt consortium was formed in 2002 by groups from the Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) at Georgetown University, and soon afterwards the website http://www.uniprot.org was set up as a central entry point to UniProt resources. Requests to this address were redirected to one of the three organisations' websites. While these sites shared a set of static pages with general information about UniProt, their pages for searching and viewing data were different. To provide users with a consistent view and to cut the cost of maintaining three separate sites, the consortium decided to develop a common website for UniProt. Following several years of intense development and a year of public beta testing, the http://www.uniprot.org domain was switched to the newly developed site described in this paper in July 2008. The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The http://www.uniprot.org website is the primary access point to this data and to documentation and basic tools for the data. These tools include full text and field-based text search, similarity search, multiple sequence alignment, batch retrieval and database identifier mapping. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as for machines through programmatic access. http://www.uniprot.org/ is open for both academic and commercial use. The site was built with open source tools and libraries. Feedback is very welcome and should be sent to help@uniprot.org. The new UniProt website makes accessing and understanding UniProt easier than ever. The two main lessons learned are that getting the basics right for such a data provider website has huge benefits, but is not trivial and is easy to underestimate, and that there is no substitute for using empirical data throughout the development process to decide what is and what is not working for your users.

  8. LipidHome: a database of theoretical lipids optimized for high throughput mass spectrometry lipidomics.

    PubMed

    Foster, Joseph M; Moreno, Pablo; Fabregat, Antonio; Hermjakob, Henning; Steinbeck, Christoph; Apweiler, Rolf; Wakelam, Michael J O; Vizcaíno, Juan Antonio

    2013-01-01

    Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Such resources are largely taken for granted, yet mature resources similar to UniProt are not yet available in some other "omics" fields, lipidomics being one of them. Although it has a seasoned community of wet lab scientists, lipidomics lags significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing a resource equivalent to UniProt called 'LipidHome', providing theoretically generated lipid molecules and useful metadata. Using the 'FASTLipid' Java library, a database was populated with theoretical lipids, generated from a set of community-agreed chemical bounds. In parallel, a web application was developed to present the information and provide computational access via a web service. Designed specifically to accommodate high throughput mass spectrometry based approaches, lipids are organised into a hierarchy that reflects the variety in the structural resolution of lipid identifications. Additionally, cross-references to other lipid related resources and papers that cite specific lipids were used to annotate lipid records. The web application encompasses a browser for viewing lipid records and a 'tools' section where an MS1 search engine is currently implemented. LipidHome can be accessed at http://www.ebi.ac.uk/apweiler-srv/lipidhome.
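    At its core, an MS1 search matches an observed m/z value against theoretical values within a mass tolerance. The record does not describe how LipidHome's search engine works, so the Python sketch below rests on two assumptions: the tolerance is expressed in parts per million (ppm), and library entries are already theoretical m/z values for a single adduct and charge state. The candidate names and masses are placeholders, not LipidHome records.

```python
from typing import Dict, List, Tuple


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ms1_search(observed_mz: float, library: Dict[str, float],
               tol_ppm: float = 5.0) -> List[Tuple[str, float]]:
    """Return (name, ppm error) for every library entry within tolerance, best first."""
    hits = []
    for name, mz in library.items():
        err = ppm_error(observed_mz, mz)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Placeholder theoretical m/z values (hypothetical, for illustration only).
LIBRARY = {"candidate_A": 760.5851, "candidate_B": 760.5890, "candidate_C": 782.5670}

print(ms1_search(760.5860, LIBRARY, tol_ppm=5.0))
```

    With a 5 ppm tolerance both `candidate_A` (about 1.2 ppm away) and `candidate_B` (about 3.9 ppm away) are reported. Tightening `tol_ppm` to 2.0 leaves only `candidate_A`. This is why identifications at MS1 level alone often remain ambiguous.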

  9. Improvements in the Protein Identifier Cross-Reference service.

    PubMed

    Wein, Samuel P; Côté, Richard G; Dumousseau, Marine; Reisinger, Florian; Hermjakob, Henning; Vizcaíno, Juan A

    2012-07-01

    The Protein Identifier Cross-Reference (PICR) service is a tool that allows users to map protein identifiers, protein sequences and gene identifiers across over 100 different source databases. PICR takes input through an interactive website as well as through Representational State Transfer (REST) and Simple Object Access Protocol (SOAP) services. It returns the results as HTML pages, XLS and CSV files. It has been in production since 2007 and has recently been enhanced to add new functionality and increase the number of databases it covers. Protein subsequences can be searched with the Basic Local Alignment Search Tool (BLAST) against the UniProt Knowledgebase (UniProtKB) to provide an entry point to the standard PICR mapping algorithm. In addition, gene identifiers from UniProtKB and Ensembl can now be submitted as input or mapped to as output from PICR. We have also implemented a 'best-guess' mapping algorithm for UniProt. In this article, we describe the usefulness of PICR, how these changes have been implemented, and the corresponding additions to the web services. Finally, we report that the number of source databases covered by PICR has increased from the initial 73 to the current 102. New resources include several new species-specific Ensembl databases as well as the Ensembl Genome ones. PICR can be accessed at http://www.ebi.ac.uk/Tools/picr/.
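    One simple way to map identifiers between databases is to index every record by a digest of its sequence, so that identifiers pointing to the same sequence become cross-references. The Python sketch below illustrates this under explicit assumptions: it handles exact matches only, uses SHA-256 as an arbitrarily chosen digest, and relies on invented accessions. It is not a description of PICR's algorithm, and handling subsequences is exactly what the BLAST entry point mentioned in the record is for.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (database, accession, sequence)


def seq_key(sequence: str) -> str:
    """Digest of the upper-cased sequence, used as a sequence-identity key."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()


def build_index(records: Iterable[Record]):
    """Index identifiers by sequence digest, in both directions."""
    id_to_key: Dict[Tuple[str, str], str] = {}
    key_to_ids: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for db, acc, seq in records:
        key = seq_key(seq)
        id_to_key[(db, acc)] = key
        key_to_ids[key].add((db, acc))
    return id_to_key, key_to_ids


def map_identifier(id_to_key, key_to_ids, db: str, acc: str, target_db: str) -> List[str]:
    """Accessions in target_db whose sequence is identical to that of (db, acc)."""
    key = id_to_key.get((db, acc))
    if key is None:
        return []
    return sorted(a for d, a in key_to_ids[key] if d == target_db)


# Hypothetical records with invented accessions.
RECORDS = [
    ("UniProtKB", "HYPO_U1", "MKTAYIAKQR"),
    ("Ensembl", "HYPO_E1", "mktayiakqr"),  # same sequence, different case
    ("Ensembl", "HYPO_E2", "MKTAYIAKQG"),  # one substitution: no exact match
]
ID_TO_KEY, KEY_TO_IDS = build_index(RECORDS)
print(map_identifier(ID_TO_KEY, KEY_TO_IDS, "UniProtKB", "HYPO_U1", "Ensembl"))
```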

  10. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study

    PubMed Central

    Arighi, Cecilia N; Magrane, Michele; Bateman, Alex; Wei, Chih-Hsuan; Lu, Zhiyong; Boutet, Emmanuel; Bye-A-Jee, Hema; Famiglietti, Maria Livia; Roechert, Bernd; UniProt Consortium, The

    2017-01-01

    Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. Results: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture, as demonstrated by the fact that 8000–10 000 papers are curated in UniProt each year while curators evaluate 50 000–70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2–3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. Availability and implementation: UniProt is freely available at http://www.uniprot.org/. Contact: sylvain.poux@sib.swiss Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29036270
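    The gap between curated and evaluated papers can be bounded from the two yearly ranges quoted above. Assuming both figures lie within their quoted ranges (the record does not say how they co-vary), the curated fraction of evaluated papers satisfies:

```latex
\frac{8\,000}{70\,000} \approx 0.114
\;\le\; \frac{\text{papers curated per year}}{\text{papers evaluated per year}} \;\le\;
\frac{10\,000}{50\,000} = 0.20
```

    In other words, only about 11–20% of the papers that curators read end up as curated publications. This is why counting curated papers alone understates curation effort.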

  11. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine.

    PubMed

    Singhal, Ayush; Simmons, Michael; Lu, Zhiyong

    2016-11-01

    The practice of precision medicine will ultimately require databases of genes and mutations that healthcare providers can reference in order to understand the clinical implications of each patient's genetic makeup. Although the highest-quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting disease-gene-variant triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of such triplets. Our approach is unique because we identify the genes and protein products associated with each mutation not just from the local text content, but also from a global context (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer's disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.
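    The F1-measure quoted above is the harmonic mean of precision and recall. The short Python sketch below computes it, along with the relative improvement between two scores. The helper names and the 0.8/0.6 precision-recall pair are hypothetical. From the rounded scores 0.62 and 0.79, the relative improvement comes out at about 27.4%, so the reported 28% presumably reflects unrounded values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(old: float, new: float) -> float:
    """Improvement of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


print(f"{relative_improvement(0.62, 0.79):.1%}")  # 27.4%
print(round(f1_score(0.8, 0.6), 4))  # hypothetical precision/recall pair
```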

  12. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases.

    PubMed

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-03-08

    High-throughput molecular biology provides new data at an incredible rate, so that biological databanks are growing enormously and very rapidly. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that browsing and "understanding" them becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines, such as Vivisimo, is being developed. These tools organize the results of a user query on the fly into a hierarchy of labeled folders that eases browsing and knowledge extraction. We investigate this approach on biological data and propose the so-called BioPrompt-box software system, which deploys ontology-driven clustering strategies to make the searching process of biologists more efficient and effective. The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated metadata taken from the underlying databank, such as references to ontologies or to external databanks, and plain texts such as researchers' comments and (the titles, abstracts or even bodies of) papers. Bpb offers several tools to customize the search and the clustering process over its indexed documents. The user can search for a set of keywords within a specific field of the document schema, or can execute BLAST to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) that constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpb computes these views by exploiting the metadata present within the retrieved documents, such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. The approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (possibly numerous) query results and to show possible hidden correlations among them, thus improving their browsing and understanding. Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UniProt is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, such as GenBank or EMBL.
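    As a rough illustration of a "hierarchy of labeled clusters", the Python sketch below groups hits by one kind of metadata the record mentions, the taxonomy lineage, and reports how many hits fall under each label. The hit identifiers and lineages are invented. The record does not specify Bpb's actual ontology-based strategies, so this is only a simplified analogue.

```python
from typing import Dict, List


def build_lineage_tree(hits: Dict[str, List[str]]) -> dict:
    """Nest hits under each taxon of their lineage; every node lists all hits below it."""
    tree: dict = {}
    for hit_id, lineage in hits.items():
        level = tree
        for taxon in lineage:
            node = level.setdefault(taxon, {"hits": [], "children": {}})
            node["hits"].append(hit_id)
            level = node["children"]
    return tree


def render(tree: dict, depth: int = 0) -> List[str]:
    """Indented cluster labels with hit counts, sorted alphabetically per level."""
    lines: List[str] = []
    for label, node in sorted(tree.items()):
        lines.append("  " * depth + f"{label} ({len(node['hits'])})")
        lines.extend(render(node["children"], depth + 1))
    return lines


# Hypothetical hits with invented identifiers.
HITS = {
    "hit1": ["Eukaryota", "Metazoa", "Chordata"],
    "hit2": ["Eukaryota", "Metazoa", "Arthropoda"],
    "hit3": ["Bacteria", "Proteobacteria"],
}
print("\n".join(render(build_lineage_tree(HITS))))
```

    Swapping the lineage lists for another metadata path, such as a chain of Gene Ontology terms, would give a different "view" of the same hits.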

  13. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases

    PubMed Central

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-01-01

    Background High-throughput molecular biology provides new data at an incredible rate, so that biological databanks are growing enormously and very rapidly. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that browsing and "understanding" them becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines, such as Vivisimo, is being developed. These tools organize the results of a user query on the fly into a hierarchy of labeled folders that eases browsing and knowledge extraction. We investigate this approach on biological data and propose the so-called BioPrompt-box software system, which deploys ontology-driven clustering strategies to make the searching process of biologists more efficient and effective. Results The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated metadata taken from the underlying databank, such as references to ontologies or to external databanks, and plain texts such as researchers' comments and (the titles, abstracts or even bodies of) papers. Bpb offers several tools to customize the search and the clustering process over its indexed documents. The user can search for a set of keywords within a specific field of the document schema, or can execute BLAST to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) that constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits. Bpb computes these views by exploiting the metadata present within the retrieved documents, such as the references to Gene Ontology, the taxonomy lineage, the organism and the keywords. The approach is flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (possibly numerous) query results and to show possible hidden correlations among them, thus improving their browsing and understanding. Conclusion Bpb is a powerful search engine that makes it very easy to perform complex queries over the indexed databanks (currently only UniProt is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, such as GenBank or EMBL. PMID:17430575

  14. PICKLE 2.0: A human protein-protein interaction meta-database employing data integration via genetic information ontology

    PubMed Central

    Gioutlakis, Aris; Klapa, Maria I.

    2017-01-01

    It has been acknowledged that source databases recording experimentally supported human protein-protein interactions (PPIs) exhibit limited overlap. Thus, the reconstruction of a comprehensive PPI network requires appropriate integration of multiple heterogeneous primary datasets, presenting the PPIs at various genetic reference levels. Existing PPI meta-databases perform integration via normalization; namely, PPIs are merged after being converted to a certain target level. Hence, the node set of the integrated network depends on the number and type of the datasets combined each time. Moreover, the irreversible a priori normalization process hinders the identification of normalization artifacts in the integrated network, which originate from the nonlinearity characterizing the genetic information flow. PICKLE (Protein InteraCtion KnowLedgebasE) 2.0 implements a new architecture for this recently introduced human PPI meta-database. Its main novel feature over the existing meta-databases is its approach to primary PPI dataset integration via genetic information ontology. Building upon the PICKLE principles of using the reviewed human complete proteome (RHCP) of UniProtKB/Swiss-Prot as the reference protein interactor set, and of filtering out protein interactions with a low probability of being direct based on the available evidence, PICKLE 2.0 first assembles the RHCP genetic information ontology network by connecting the corresponding genes, nucleotide sequences (mRNAs) and proteins (UniProt entries), and then integrates PPI datasets by superimposing them on the ontology network without any a priori transformations. Importantly, this process allows the resulting heterogeneous integrated network to be reversibly normalized to any level of genetic reference without loss of the original information, the latter being used for identification of normalization biases, and enables the appraisal of potential false positive interactions through PPI source database cross-checking. The PICKLE web-based interface (www.pickle.gr) allows for the simultaneous query of multiple entities and provides integrated human PPI networks at either the protein (UniProt) or the gene level, at three PPI filtering modes. PMID:29023571
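    The reversible normalization idea can be sketched in Python as follows. Protein-level interactions are projected onto genes, but every gene-level edge keeps the protein-level records it came from, so the original data can always be recovered. The number of distinct source databases supporting each edge can also be counted for cross-checking. This is a simplification, not PICKLE's implementation: it assumes each protein maps to exactly one gene, and all identifiers are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Interaction = Tuple[str, str, str]  # (protein_a, protein_b, source_database)


def project_to_genes(ppis: List[Interaction],
                     protein_to_gene: Dict[str, str]) -> Dict[Tuple[str, ...], Set[Interaction]]:
    """Group protein-level interactions by the (unordered) gene pair they imply."""
    gene_edges: Dict[Tuple[str, ...], Set[Interaction]] = defaultdict(set)
    for a, b, source in ppis:
        gene_pair = tuple(sorted((protein_to_gene[a], protein_to_gene[b])))
        gene_edges[gene_pair].add((a, b, source))  # original record is kept
    return dict(gene_edges)


def source_support(gene_edges: Dict[Tuple[str, ...], Set[Interaction]]) -> Dict[Tuple[str, ...], int]:
    """Number of distinct source databases behind each gene-level edge."""
    return {pair: len({src for _, _, src in recs}) for pair, recs in gene_edges.items()}


# Hypothetical mapping: two protein entries (P_A1, P_A2) derive from GENE_A.
PROTEIN_TO_GENE = {"P_A1": "GENE_A", "P_A2": "GENE_A", "P_B": "GENE_B", "P_C": "GENE_C"}
PPIS = [("P_A1", "P_B", "db1"), ("P_A2", "P_B", "db2"), ("P_B", "P_C", "db1")]

EDGES = project_to_genes(PPIS, PROTEIN_TO_GENE)
print(source_support(EDGES))
```

    Because the protein-level records are never discarded, switching back to the protein view is a lookup rather than a lossy inverse transformation.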

  15. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space.

    PubMed

    Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

    2008-07-01

    UPGMA (average linkage) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolution-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities makes it possible to significantly improve on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
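    For reference, the Python sketch below is the textbook in-memory UPGMA that the record contrasts with MC-UPGMA. It keeps the full dissimilarity table in a dictionary, repeatedly merges the closest pair of clusters, and updates distances as size-weighted averages. It is deliberately naive (cubic time, quadratic memory) and is not the authors' memory-constrained algorithm. The labels and distances are invented.

```python
import itertools
from typing import Dict, FrozenSet, List, Tuple


def upgma(labels: List[str], dist: Dict[FrozenSet[str], float]) -> List[Tuple[object, object, float]]:
    """Naive UPGMA returning the merge history as (left, right, height) tuples."""
    trees: Dict[int, object] = dict(enumerate(labels))
    sizes: Dict[int, int] = {i: 1 for i in trees}
    # Full pairwise table over cluster IDs: the O(n^2) memory MC-UPGMA avoids.
    d: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): dist[frozenset((labels[i], labels[j]))]
        for i, j in itertools.combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    merges = []
    while len(trees) > 1:
        # Closest pair; ties broken by the smaller cluster IDs for determinism.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        i, j = sorted(pair)
        height = d.pop(pair) / 2  # ultrametric height of the new node
        ni, nj = sizes.pop(i), sizes.pop(j)
        left, right = trees.pop(i), trees.pop(j)
        merges.append((left, right, height))
        for k in trees:
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            # Average linkage: size-weighted mean of the two old distances.
            d[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        trees[next_id] = (left, right)
        sizes[next_id] = ni + nj
        next_id += 1
    return merges


# Hypothetical dissimilarities between four items.
PAIRS = {("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
         ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0}
DIST = {frozenset(k): v for k, v in PAIRS.items()}
MERGES = upgma(["A", "B", "C", "D"], DIST)
for left, right, height in MERGES:
    print(left, right, height)
```

    Holding `d` for every pair of clusters is exactly the quadratic memory cost that becomes prohibitive when the items are all known protein sequences.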

  16. PANDA: Protein function prediction using domain architecture and affinity propagation.

    PubMed

    Wang, Zheng; Zhao, Chenguang; Wang, Yiheng; Sun, Zheng; Wang, Nan

    2018-02-22

    We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions in the form of Gene Ontology (GO) terms. PANDA first executes a profile-profile alignment algorithm to search against the PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt for homologue search. PANDA integrates a domain architecture inference algorithm based on Bayesian statistics that calculates the probability of having a GO term. All the candidate GO terms are pooled and filtered based on Z-score. After that, the remaining GO terms are clustered using an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked the performance of all the baseline predictors that PANDA integrates, as well as of every pooling and filtering step of PANDA. PANDA achieves better performance than the baseline predictors in terms of area under the curve for precision and recall. PANDA can be accessed from http://dna.cs.miami.edu/PANDA/.
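    The record does not define how PANDA computes its Z-scores, so the Python sketch below only illustrates the generic idea, under stated assumptions. Each pooled GO term has a single score, and scores are standardized against the mean and population standard deviation of the pool. Terms below a hypothetical cutoff `z_min` are then dropped. The term names and scores are placeholders.

```python
import statistics
from typing import Dict, List


def zscore_filter(scores: Dict[str, float], z_min: float = 1.0) -> List[str]:
    """Keep terms whose standardized score is at least z_min."""
    values = list(scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all scores equal: no term stands out from the pool
    return sorted(term for term, s in scores.items() if (s - mean) / sd >= z_min)


# Placeholder pooled scores (hypothetical).
POOLED = {"term_a": 0.9, "term_b": 0.2, "term_c": 0.1, "term_d": 0.2}
print(zscore_filter(POOLED, z_min=1.0))
```

    Because the cutoff is relative to the pool, the same raw score can survive in a weak pool and be dropped in a strong one.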

  17. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

    PubMed Central

    Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

    2008-01-01

    Motivation: UPGMA (average linkage) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolution-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities makes it possible to significantly improve on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742

  18. METSP: a maximum-entropy classifier based text mining tool for transporter-substrate identification with semistructured text.

    PubMed

    Zhao, Min; Chen, Yanming; Qu, Dacheng; Qu, Hong

    2015-01-01

    The substrates of a transporter are not only useful for inferring the function of the transporter, but also important for discovering compound-compound interactions and reconstructing metabolic pathways. Although plenty of data has accumulated with the development of new technologies such as in vitro transporter assays, the search for substrates of transporters is far from complete. In this article, we introduce METSP, a maximum-entropy classifier devoted to retrieving transporter-substrate pairs (TSPs) from semistructured text. Based on the high-quality annotation from UniProt, METSP achieves high precision and recall in cross-validation experiments. When METSP is applied to 182,829 human transporter annotation sentences in UniProt, it identifies 3942 sentences with transporter and compound information. Finally, 1547 high-confidence human TSPs are identified for further manual curation, 58.37% of which involve novel substrates not annotated in public transporter databases. METSP is the first efficient tool to extract TSPs from semistructured annotation text in UniProt. This tool can help to determine the precise substrates and drugs of transporters, thus facilitating drug-target prediction, metabolic network reconstruction, and literature classification.

  19. The UniProtKB guide to the human proteome

    PubMed Central

    Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole

    2016-01-01

    Advances in high-throughput and other advanced technologies allow researchers to routinely perform whole-genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org PMID:26896845

  20. Towards Spectral Library-free MALDI-TOF MS Bacterial Identification.

    PubMed

    Cheng, Ding; Qiao, Liang; Horvatovich, Péter

    2018-05-11

    Bacterial identification is of great importance in clinical diagnosis, environmental monitoring and food safety control. Among various strategies, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has drawn significant interest and has been used clinically. Nevertheless, current bioinformatics solutions use spectral libraries for the identification of bacterial strains. Spectral library generation requires acquisition of MALDI-TOF spectra from monoculture bacterial colonies, which is time-consuming and not possible for many species and strains. We propose a strategy for bacterial typing by MALDI-TOF using protein sequences from a public database, i.e. UniProt. Ten genes were identified as encoding the proteins most often observed by MALDI-TOF in bacteria, through a 10-fold double cross-validation procedure repeated 500 times, using 403 MALDI-TOF spectra corresponding to 14 genera, 81 species and 403 strains, and the protein sequences of 1276 species in UniProt. The 10 genes were then used to annotate peaks on MALDI-TOF spectra of bacteria for bacterial identification. With this approach, bacteria can be identified at the genus level by searching against a database containing the protein sequences of 42 genera of bacteria from UniProt. Our approach identified 84.1% of the 403 spectra correctly at the genus level. Source code of the algorithm is available at https://github.com/dipcarbon/BacteriaMSLF.
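    The abstract does not spell out how peaks are matched against the UniProt-derived protein sequences. The following is a minimal sketch of one common approach, assuming singly protonated ions ([M+H]+), a fixed ppm tolerance and a naive "most annotated peaks wins" score. The function names, the 500 ppm tolerance, the genus labels and the protein masses are all hypothetical and are not taken from BacteriaMSLF.

```python
PROTON_MASS = 1.007276  # Da; assumption: all peaks are singly protonated [M+H]+ ions


def annotate_peaks(peaks, database, tol_ppm=500.0):
    """Match observed m/z peaks to theoretical [M+H]+ values.

    peaks: list of observed m/z values.
    database: dict genus -> list of (protein_name, neutral_mass_in_Da).
    Returns dict genus -> list of (peak, protein_name) matches.
    """
    hits = {}
    for genus, proteins in database.items():
        matches = []
        for peak in peaks:
            for name, mass in proteins:
                expected = mass + PROTON_MASS
                # Relative error in parts per million.
                if abs(peak - expected) / expected * 1e6 <= tol_ppm:
                    matches.append((peak, name))
        hits[genus] = matches
    return hits


def best_genus(peaks, database, tol_ppm=500.0):
    hits = annotate_peaks(peaks, database, tol_ppm)
    # Naive scoring (hypothetical): the genus with the most annotated peaks wins.
    return max(hits, key=lambda g: len(hits[g]))


# Hypothetical masses for illustration only.
db = {
    "GenusA": [("protein_1", 12000.0), ("protein_2", 9500.0)],
    "GenusB": [("protein_1", 12200.0)],
}
print(best_genus([12001.0, 9501.2], db))
```

    A real pipeline would also need to handle peak picking, calibration and ties between genera; this sketch only shows the tolerance-based annotation step.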

  1. ProtVista: visualization of protein sequence annotations.

    PubMed

    Watkins, Xavier; Garcia, Leyla J; Pundir, Sangya; Martin, Maria J

    2017-07-01

    ProtVista is a comprehensive visualization tool for the graphical representation of protein sequence features in the UniProt Knowledgebase, experimental proteomics and variation public datasets. The complexity and relationships in this wealth of data pose a challenge in interpretation. Integrative visualization approaches such as provided by ProtVista are thus essential for researchers to understand the data and, for instance, discover patterns affecting function and disease associations. ProtVista is a JavaScript component released as an open source project under the Apache 2 License. Documentation and source code are available at http://ebi-uniprot.github.io/ProtVista/. Contact: martin@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  2. The Pfam protein families database: towards a more sustainable future.

    PubMed

    Finn, Robert D; Coggill, Penelope; Eberhardt, Ruth Y; Eddy, Sean R; Mistry, Jaina; Mitchell, Alex L; Potter, Simon C; Punta, Marco; Qureshi, Matloob; Sangrador-Vegas, Amaia; Salazar, Gustavo A; Tate, John; Bateman, Alex

    2016-01-04

    In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) that have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Linking microarray reporters with protein functions.

    PubMed

    Gaj, Stan; van Erk, Arie; van Haaften, Rachel I M; Evelo, Chris T A

    2007-09-26

    The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from, and cross-linked back to, the highly curated UniProt database. The resulting alignments were filtered using high-quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high-quality hits. Combining the results of both methods led to successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the proportion of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked to GO nodes and 7.1% to local pathways. Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.

  4. Maize databases

    USDA-ARS's Scientific Manuscript database

    This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...

  5. bioDBnet - Biological Database Network

    Cancer.gov

    bioDBnet is a comprehensive resource of most of the biological databases available from different sites like NCBI, Uniprot, EMBL, Ensembl, Affymetrix. It provides a queryable interface to all the databases available, converts identifiers from one database into another and generates comprehensive reports.

  6. Complex Event Extraction using DRUM

    DTIC Science & Technology

    2015-10-01

    towards tackling these challenges. Figure 9: Evaluation results for eleven teams. The diamond (◆) represents the results of our system. The two topmost... Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000). The UniProt...

  7. Linking microarray reporters with protein functions

    PubMed Central

    Gaj, Stan; van Erk, Arie; van Haaften, Rachel IM; Evelo, Chris TA

    2007-01-01

    Background: The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. Results: This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from, and cross-linked back to, the highly curated UniProt database. The resulting alignments were filtered using high-quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high-quality hits. Combining the results of both methods led to successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the proportion of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked to GO nodes and 7.1% to local pathways. Conclusion: Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/. PMID:17897448

  8. Statistical distribution of amino acid sequences: a proof of Darwinian evolution.

    PubMed

    Eitner, Krystian; Koch, Uwe; Gaweda, Tomasz; Marciniak, Jedrzej

    2010-12-01

    The article presents counts of amino acids, dipeptides and tripeptides for all proteins available in the UNIPROT-TREMBL database, as well as counts for selected species and enzymes. UNIPROT-TREMBL contains protein sequences associated with computationally generated annotations and large-scale functional characterization. Because of the distinct metabolic pathways of amino acid synthesis and the physicochemical properties of amino acids, the quantities of subpeptides in proteins vary. We have proved that the distribution of amino acids, dipeptides and tripeptides is statistical, which confirms that the evolutionary biodiversity development model is subject to the theory of independent events. Interestingly, certain short peptide combinations occur relatively rarely or not at all. First, this confirms the Darwinian theory of evolution and, second, it opens up opportunities for designing pharmaceuticals among rarely represented short peptide combinations. Furthermore, an innovative approach to the mass analysis of bioinformatic data is presented. Contact: eitner@amu.edu.pl. Supplementary data are available at Bioinformatics online.
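    Counting amino acids, dipeptides and tripeptides amounts to counting overlapping k-mers with k = 1, 2 and 3. A minimal sketch follows, assuming sequences are plain strings over the 20 standard residues. The function names are illustrative and this is not the authors' pipeline. The second function lists the k-mers that never occur, which corresponds to the "rarely or not at all" observation in the abstract.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def count_kmers(sequences, k):
    """Count overlapping k-mers (k=1: residues, k=2: dipeptides, k=3: tripeptides)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts


def absent_kmers(counts, k):
    """Return standard-residue k-mers that never occur in the counted data.

    Non-standard letters (e.g. X, U, B in TrEMBL) may appear in counts
    but are never part of the candidate set built here.
    """
    all_kmers = {"".join(p) for p in product(AMINO_ACIDS, repeat=k)}
    return all_kmers - set(counts)


proteins = ["MAMA", "MKV"]  # toy sequences, not real UniProt entries
dipeptides = count_kmers(proteins, 2)
print(dipeptides.most_common(3))
print(len(absent_kmers(count_kmers(proteins, 3), 3)), "of", 20 ** 3, "tripeptides never observed")
```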

  9. Singlet-paired coupled cluster theory for open shells

    NASA Astrophysics Data System (ADS)

    Gomez, John A.; Henderson, Thomas M.; Scuseria, Gustavo E.

    2016-06-01

    Restricted single-reference coupled cluster theory truncated to single and double excitations accurately describes weakly correlated systems, but often breaks down in the presence of static or strong correlation. Good coupled cluster energies in the presence of degeneracies can be obtained by using a symmetry-broken reference, such as unrestricted Hartree-Fock, but at the cost of good quantum numbers. A large body of work has shown that modifying the coupled cluster ansatz allows for the treatment of strong correlation within a single-reference, symmetry-adapted framework. The recently introduced singlet-paired coupled cluster doubles (CCD0) method is one such model, which recovers correct behavior for strong correlation without requiring symmetry breaking in the reference. Here, we extend singlet-paired coupled cluster for application to open shells via restricted open-shell singlet-paired coupled cluster singles and doubles (ROCCSD0). The ROCCSD0 approach retains the benefits of standard coupled cluster theory and recovers correct behavior for strongly correlated, open-shell systems using a spin-preserving ROHF reference.

  10. MyDas, an Extensible Java DAS Server

    PubMed Central

    Jimenez, Rafael C.; Quinn, Antony F.; Jenkinson, Andrew M.; Mulder, Nicola; Martin, Maria; Hunter, Sarah; Hermjakob, Henning

    2012-01-01

    A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information mean that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective, however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users. We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details. PMID:23028496

  11. MyDas, an extensible Java DAS server.

    PubMed

    Salazar, Gustavo A; García, Leyla J; Jones, Philip; Jimenez, Rafael C; Quinn, Antony F; Jenkinson, Andrew M; Mulder, Nicola; Martin, Maria; Hunter, Sarah; Hermjakob, Henning

    2012-01-01

    A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information mean that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective, however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users. We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details.

  12. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors

    PubMed Central

    Jin, Jinpu; Zhang, He; Kong, Lei; Gao, Ge; Luo, Jingchu

    2014-01-01

    With the aim of providing a resource for the functional and evolutionary study of plant transcription factors (TFs), we updated the plant TF database PlantTFDB to version 3.0 (http://planttfdb.cbi.pku.edu.cn). After refining the TF classification pipeline, we systematically identified 129 288 TFs from 83 species, of which 67 species have genome sequences, covering the main lineages of green plants. Besides the abundant annotation provided in the previous version, we generated more annotations for identified TFs, including expression, regulation, interaction, conserved elements, phenotype information, expert-curated descriptions derived from UniProt, TAIR and NCBI GeneRIF, as well as references to provide clues for functional studies of TFs. To help identify evolutionary relationships among identified TFs, we assigned 69 450 TFs into 3924 orthologous groups, and constructed 9217 phylogenetic trees for TFs within the same families or same orthologous groups, respectively. In addition, we set up a TF prediction server in this version for users to identify TFs from their own sequences. PMID:24174544

  13. Singlet-paired coupled cluster theory for open shells

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gomez, John A.; Henderson, Thomas M.; Scuseria, Gustavo E.

    Restricted single-reference coupled cluster theory truncated to single and double excitations accurately describes weakly correlated systems, but often breaks down in the presence of static or strong correlation. Good coupled cluster energies in the presence of degeneracies can be obtained by using a symmetry-broken reference, such as unrestricted Hartree-Fock, but at the cost of good quantum numbers. A large body of work has shown that modifying the coupled cluster ansatz allows for the treatment of strong correlation within a single-reference, symmetry-adapted framework. The recently introduced singlet-paired coupled cluster doubles (CCD0) method is one such model, which recovers correct behavior for strong correlation without requiring symmetry breaking in the reference. Here, we extend singlet-paired coupled cluster for application to open shells via restricted open-shell singlet-paired coupled cluster singles and doubles (ROCCSD0). The ROCCSD0 approach retains the benefits of standard coupled cluster theory and recovers correct behavior for strongly correlated, open-shell systems using a spin-preserving ROHF reference.

  14. SNPdbe: constructing an nsSNP functional impacts database.

    PubMed

    Schaefer, Christian; Meier, Alice; Rost, Burkhard; Bromberg, Yana

    2012-02-15

    Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, with human being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is regularly updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. http://www.rostlab.org/services/snpdbe.

  15. SLiMSearch 2.0: biological context for short linear motifs in proteins

    PubMed Central

    Davey, Norman E.; Haslam, Niall J.; Shields, Denis C.

    2011-01-01

    Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch 2.0 (Short, Linear Motif Search) web server allows researchers to identify occurrences of a user-defined SLiM in a proteome, using conservation and protein disorder context statistics to rank occurrences. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. For each motif occurrence, overlapping UniProt features and annotated SLiMs are displayed. Visualization also includes annotated multiple sequence alignments surrounding each occurrence, showing conservation and protein disorder statistics in addition to known and predicted SLiMs, protein domains and known post-translational modifications. In addition, Gene Ontology term enrichment and protein interaction partners are provided as indicators of possible motif function. All web server results are available for download. Users can search motifs against the human proteome or a subset thereof defined by UniProt accession numbers or GO term. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch2.html. PMID:21622654
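    SLiM definitions are commonly written as regular expressions. Below is a minimal sketch of the occurrence-finding step only, assuming the motif is supplied as a Python regular expression and the proteome is a dict mapping accession to sequence. The accessions and the P..P motif are made up, and the conservation and disorder scoring that SLiMSearch uses to rank occurrences is not reproduced.

```python
import re


def find_motif(proteome, motif):
    """Return (accession, 1-based start, matched peptide) for every occurrence.

    The motif is wrapped in a lookahead so that overlapping occurrences
    (e.g. two P..P matches sharing residues) are all reported.
    """
    pattern = re.compile(f"(?=({motif}))")
    hits = []
    for accession, seq in proteome.items():
        for m in pattern.finditer(seq):
            hits.append((accession, m.start() + 1, m.group(1)))
    return hits


# Hypothetical accessions and sequences.
proteome = {"P1": "MAPPAPPK", "P2": "GGGG"}
for hit in find_motif(proteome, "P..P"):
    print(hit)
```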

  16. Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags

    PubMed Central

    Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jakob; Gilchrist, Michael J; Panitz, Frank; Jørgensen, Claus; Scheibye-Knudsen, Karsten; Arvin, Troels; Lumholdt, Steen; Sawera, Milena; Green, Trine; Nielsen, Bente J; Havgaard, Jakob H; Rosenkilde, Carina; Wang, Jun; Li, Heng; Li, Ruiqiang; Liu, Bin; Hu, Songnian; Dong, Wei; Li, Wei; Yu, Jun; Wang, Jian; Stærfeldt, Hans-Henrik; Wernersson, Rasmus; Madsen, Lone B; Thomsen, Bo; Hornshøj, Henrik; Bujie, Zhan; Wang, Xuegang; Wang, Xuefei; Bolund, Lars; Brunak, Søren; Yang, Huanming; Bendixen, Christian; Fredholm, Merete

    2007-01-01

    Background: Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. Results: Using the Distiller package, the ESTs were assembled into roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high-confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scale-invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high-confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. Conclusion: This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies. PMID:17407547

  17. Center for Regenerative Biology and Medicine at Mount Desert Island Biological Laboratory

    DTIC Science & Technology

    2012-06-01

    ...controlled in both Polypterus and axolotl samples. These comparisons revealed a total of 2779 shared genes that are significantly upregulated during... Figure 1: Venn diagram of UniProt protein sequence IDs among axolotl and Polypterus contigs that were up-regulated

  18. Databases, Repositories, and Other Data Resources in Structural Biology.

    PubMed

    Zheng, Heping; Porebski, Przemyslaw J; Grabowski, Marek; Cooper, David R; Minor, Wladek

    2017-01-01

    Structural biology, like many other areas of modern science, produces an enormous amount of primary, derived, and "meta" data with a high demand on data storage and manipulation. Primary data come from various steps of sample preparation, diffraction experiments, and functional studies. These data are not only used to obtain tangible results, like macromolecular structural models, but also to enrich and guide our analysis and interpretation of various biomedical problems. Herein we define several categories of data resources, (a) Archives, (b) Repositories, (c) Databases, and (d) Advanced Information Systems, that can accommodate primary, derived, or reference data. Data resources may be used either as web portals or internally by structural biology software. To be useful, each resource must be maintained, curated, and integrated with other resources. Ideally, the system of interconnected resources should evolve toward comprehensive "hubs", or Advanced Information Systems. Such systems, encompassing the PDB and UniProt, are indispensable not only for structural biology, but for many related fields of science. The categories of data resources described herein are applicable well beyond our usual scientific endeavors.

  19. Databases, Repositories and Other Data Resources in Structural Biology

    PubMed Central

    Zheng, Heping; Porebski, Przemyslaw J.; Grabowski, Marek; Cooper, David R.; Minor, Wladek

    2017-01-01

    Structural biology, like many other areas of modern science, produces an enormous amount of primary, derived, and “meta” data with a high demand on data storage and manipulation. Primary data come from various steps of sample preparation, diffraction experiments, and functional studies. These data are not only used to obtain tangible results, like macromolecular structural models, but also to enrich and guide our analysis and interpretation of existing biomedical studies. Herein we define several categories of data resources, (a) Archives, (b) Repositories, (c) “Databases” and (d) Advanced Information Systems, that can accommodate primary, derived, or reference data. Data resources may be used either as web portals or internally by structural biology software. To be useful, each resource must be maintained, curated, and integrated with other resources. Ideally, the system of interconnected resources should evolve toward comprehensive “hubs” or Advanced Information Systems. Such systems, encompassing the PDB and UniProt, are indispensable not only for structural biology, but for many related fields of science. The categories of data resources described herein are applicable well beyond our usual scientific endeavors. PMID:28573593

  20. MultitaskProtDB: a database of multitasking proteins.

    PubMed

    Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

    2014-01-01

    We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are revealed experimentally by serendipity. This ability of proteins to perform multiple functions helps us to understand one of the ways cells perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins, such as the frequencies of the different pairs of functions, phylogenetic conservation and so forth.

  1. SUPERFAMILY 1.75 including a domain-centric gene ontology method.

    PubMed

    de Lima Morais, David A; Fang, Hai; Rackham, Owen J L; Wilson, Derek; Pethica, Ralph; Chothia, Cyrus; Gough, Julian

    2011-01-01

    The SUPERFAMILY resource provides protein domain assignments at the Structural Classification of Proteins (SCOP) superfamily level for over 1400 completely sequenced genomes, over 120 metagenomes and other gene collections such as UniProt. All models and assignments are available to browse and download at http://supfam.org. A new hidden Markov model library based on SCOP 1.75 has been created and a previously ignored class of SCOP, coiled coils, is now included. Our scoring component now uses HMMER3, which is orders of magnitude faster and produces superior results. A cloud-based pipeline was implemented and is publicly available on the Amazon Web Services Elastic Compute Cloud. The SUPERFAMILY reference tree of life has been improved, allowing the user to highlight a chosen superfamily, family or domain architecture on the tree of life. The most significant advance in SUPERFAMILY is that it now contains a domain-based gene ontology (GO) at the superfamily and family levels. A new methodology was developed to ensure high-quality GO annotation. The new methodology is general purpose and has been used to produce domain-based phenotypic ontologies in addition to GO.

  2. The unique peptidome: Taxon-specific tryptic peptides as biomarkers for targeted metaproteomics.

    PubMed

    Mesuere, Bart; Van der Jeugt, Felix; Devreese, Bart; Vandamme, Peter; Dawyndt, Peter

    2016-09-01

    The Unique Peptide Finder (http://unipept.ugent.be/peptidefinder) is an interactive web application to quickly hunt for tryptic peptides that are unique to a particular species, genus, or any other taxon. Biodiversity within the target taxon is represented by a set of proteomes selected from a monthly updated list of complete and nonredundant UniProt proteomes, supplemented with proprietary proteomes loaded into persistent local browser storage. The software computes and visualizes pan and core peptidomes as unions and intersections of tryptic peptides occurring in the selected proteomes. In addition, it also computes and displays unique peptidomes as the set of all tryptic peptides that occur in all selected proteomes but not in any UniProt record not assigned to the target taxon. As a result, the unique peptides can serve as robust biomarkers for the target taxon, for example, in targeted metaproteomics studies. Computations are extremely fast since they are underpinned by the Unipept database, the lowest common ancestor algorithm implemented in Unipept and modern web technologies that facilitate in-browser data storage and parallel processing. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
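    The set operations behind the pan, core and unique peptidomes can be sketched in a few lines. This sketch assumes the simplified trypsin rule (cleave after K or R unless the next residue is P) and ignores missed cleavages and peptide-length limits; Unipept's actual digestion settings may differ. The function names and toy sequences are hypothetical.

```python
import re

# Zero-width split point: after K or R, unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(seq):
    return [p for p in TRYPSIN_SITE.split(seq) if p]


def peptidome(proteome):
    """Set of all tryptic peptides of a proteome (a list of protein sequences)."""
    return {pep for protein in proteome for pep in tryptic_digest(protein)}


def pan_core_unique(target_proteomes, other_proteins):
    sets = [peptidome(p) for p in target_proteomes]
    pan = set().union(*sets)          # peptides in at least one selected proteome
    core = set.intersection(*sets)    # peptides in every selected proteome
    background = peptidome(other_proteins)  # peptides from records outside the taxon
    unique = core - background
    return pan, core, unique


targets = [["MKAAR", "GGKPLR"], ["MKAAR", "TTR"]]
others = ["MKCC"]
pan, core, unique = pan_core_unique(targets, others)
print(sorted(pan), sorted(core), sorted(unique))
```

    In the toy data, "MK" is shared by every target proteome but also occurs outside the taxon, so only "AAR" survives as a unique peptide.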

  3. New Insight Into the Diversity of SemiSWEET Sugar Transporters and the Homologs in Prokaryotes

    PubMed Central

    Jia, Baolei; Hao, Lujiang; Xuan, Yuan Hu; Jeon, Che Ok

    2018-01-01

    Sugars Will Eventually be Exported Transporters (SWEETs) and SemiSWEETs represent a family of sugar transporters in eukaryotes and prokaryotes, respectively. SWEETs contain seven transmembrane helices (TMHs), while SemiSWEETs contain three. The functions of SemiSWEETs have been less studied. In this perspective article, we analyzed the diversity and conservation of SemiSWEETs and propose their possible functions. A total of 1,922 SemiSWEET homologs were retrieved from the UniProt database, a number that is not proportional to the number of sequenced prokaryotic genomes. These proteins are, however, highly diverse in sequence and can be classified into 19 clusters when >50% sequence identity is required. Moreover, a gene context analysis indicated that several SemiSWEETs are located in operons related to diverse carbohydrate metabolism. Several proteins with seven TMHs can be found in bacteria, and sequence alignment suggested that these bacterial proteins may have arisen by duplication and fusion. Multiple sequence alignments showed that the amino acids involved in sugar translocation remain conserved and have coevolved, despite the diversity of the sequences. The functions of a few of these amino acids are still unclear. These findings highlight open challenges in SemiSWEET research and give future researchers a foundation for exploring these uncharted areas. PMID:29872447
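    Clustering homologs at a fixed identity threshold is often done greedily, as in CD-HIT: each sequence joins the first cluster whose representative it matches, otherwise it founds a new cluster. The abstract does not name the tool that was used, so the following is a hypothetical sketch; it assumes the sequences are already aligned to a common length and measures identity over ungapped columns.

```python
def identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def greedy_cluster(seqs: dict[str, str], threshold: float = 50.0) -> list[list[str]]:
    """Greedy clustering: the first member of each cluster is its representative."""
    # Longest (ungapped) sequences are considered first and so tend to become representatives.
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    clusters: list[list[str]] = []
    for sid in order:
        for cluster in clusters:
            if identity(seqs[cluster[0]], seqs[sid]) > threshold:   # strictly ">50%"
                cluster.append(sid)
                break
        else:
            clusters.append([sid])   # no representative matched: start a new cluster
    return clusters
```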

  4. New Insight Into the Diversity of SemiSWEET Sugar Transporters and the Homologs in Prokaryotes.

    PubMed

    Jia, Baolei; Hao, Lujiang; Xuan, Yuan Hu; Jeon, Che Ok

    2018-01-01

    Sugars Will Eventually be Exported Transporters (SWEETs) and SemiSWEETs represent a family of sugar transporters in eukaryotes and prokaryotes, respectively. SWEETs contain seven transmembrane helices (TMHs), while SemiSWEETs contain three. The functions of SemiSWEETs have been less studied. In this perspective article, we analyzed the diversity and conservation of SemiSWEETs and propose their possible functions. A total of 1,922 SemiSWEET homologs were retrieved from the UniProt database, a number that is not proportional to the number of sequenced prokaryotic genomes. These proteins are, however, highly diverse in sequence and can be classified into 19 clusters when >50% sequence identity is required. Moreover, a gene context analysis indicated that several SemiSWEETs are located in operons related to diverse carbohydrate metabolism. Several proteins with seven TMHs can be found in bacteria, and sequence alignment suggested that these bacterial proteins may have arisen by duplication and fusion. Multiple sequence alignments showed that the amino acids involved in sugar translocation remain conserved and have coevolved, despite the diversity of the sequences. The functions of a few of these amino acids are still unclear. These findings highlight open challenges in SemiSWEET research and give future researchers a foundation for exploring these uncharted areas.

  5. The Center for Regenerative Biology and Medicine at Mount Desert Island Biological Laboratory

    DTIC Science & Technology

    2015-08-01

    Subject terms: limb regeneration, positional memory code, axolotl, microRNAs, zebrafish, Polypterus. ...injury-induced regeneration of limbs/appendage tissues in Ambystoma (axolotl) and Polypterus animals. The defining feature of limb/appendage... ...downregulated in... Figure 1 (panels: upregulated, downregulated): Venn diagram of UniProt protein sequence IDs among Axolotl and Polypterus contigs that...

  6. Insight into stereochemistry of a new IMP allelic variant (IMP-55) metallo-β-lactamase identified in a clinical strain of Acinetobacter baumannii.

    PubMed

    Shakibaie, Mohammad Reza; Azizi, Omid; Shahcheraghi, Fereshteh

    2017-07-01

    Metallo-β-lactamases (MBLs) such as IMPs are broad-spectrum β-lactamases that inactivate virtually all β-lactam antibiotics, including carbapenems. In this study, we investigated the hydrolytic activity, phylogenetic relationships and three-dimensional (3D) structure, including the zinc-binding motif, of a new IMP variant (IMP-55) identified in a clinical strain of Acinetobacter baumannii (AB). AB strain 56 was isolated from an adult ICU of a teaching hospital in Kerman, Iran. It exhibited an imipenem MIC of 32 μg/ml and showed MBL activity. The hydrolytic properties of the MBL enzyme were measured phenotypically. The presence of the blaIMP gene, carried by class 1 integrons, was detected by PCR and sequencing. A phylogenetic tree of IMP proteins was constructed using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), and a 3D model including the zinc-binding motif was predicted with bioinformatics software. Analysis of the IMP sequence led to the identification of a novel IMP type designated IMP-55 (GenBank: KU299753.1; UniProtKB: A0A0S2MTX2). Comparison of its hydrolytic activity with that of the closest variants suggested efficient imipenem hydrolysis by this enzyme. Evolutionary distance matrix assessment indicated that the IMP-55 protein is not closely related to other A. baumannii IMPs; however, it shared 98% homology with Escherichia coli IMP-30 (UniProtKB: A0A0C5PJR0) and Pseudomonas aeruginosa IMP-1 (UniProtKB: Q19KT1). It consists of five α-helices, ten β-sheets and six loops. A monovalent zinc ion is attached to the core of the enzyme via His95, His97, His157 and Cys176. Multiple amino acid sequence alignments and mutational trajectory analysis with reported IMPs showed 4 amino acid substitutions, at positions 12 (Phe→Ile), 31 (Asp→Glu), 172 (Leu→Phe) and 185 (Asn→Lys). We suggest that the pleiotropic effect of mutations driven by frequent administration of imipenem is responsible for the emergence of this new IMP variant in our hospitals. Copyright © 2017 Elsevier B.V. All rights reserved.
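    UPGMA builds a rooted, ultrametric tree by repeatedly merging the two closest clusters and replacing their distances with size-weighted averages. A minimal sketch, assuming a precomputed pairwise distance matrix; any values used with it would be hypothetical, not the IMP distances from this study, and the function name is illustrative.

```python
def upgma(labels: list[str], dist: dict[tuple[str, str], float]) -> str:
    """UPGMA (average linkage) on a symmetric distance matrix; returns a rooted Newick tree."""
    # Active clusters keyed by their Newick text -> (leaf count, node height).
    clusters = {label: (1, 0.0) for label in labels}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        closest = min(d, key=d.get)      # pair of active clusters with the smallest distance
        a, b = sorted(closest)           # fixed child order so the output is deterministic
        height = d[closest] / 2.0        # ultrametric tree: node sits at half the pair distance
        (size_a, h_a), (size_b, h_b) = clusters.pop(a), clusters.pop(b)
        node = f"({a}:{height - h_a:g},{b}:{height - h_b:g})"
        # Arithmetic-mean update: distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((node, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[node] = (size_a + size_b, height)
    (root,) = clusters
    return root + ";"
```

    For example, three taxa with d(A,B) = 2 and d(A,C) = d(B,C) = 6 give `((A:1,B:1):2,C:3);`.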

  7. Mapping proteins to disease terminologies: from UniProt to MeSH

    PubMed Central

    Mottaz, Anaïs; Yip, Yum L; Ruch, Patrick; Veuthey, Anne-Lise

    2008-01-01

    Background Although the UniProt KnowledgeBase is not a medically oriented database, it contains information on more than 2,000 human proteins involved in pathologies. However, these annotations are not standardized, which impairs interoperability between biological and clinical resources. To make these data easily accessible to clinical researchers, we have developed a procedure to link diseases described in UniProtKB/Swiss-Prot entries to the MeSH disease terminology. Results We mapped disease names extracted either from the UniProtKB/Swiss-Prot entry comment lines or from the corresponding OMIM entry to MeSH. Different methods were assessed on a benchmark set of 200 disease names manually mapped to MeSH terms. The precision and recall of the retained procedure were 86% and 64%, respectively. Using the same procedure, more than 3,000 disease names in Swiss-Prot were mapped to MeSH with comparable efficiency. Conclusions This study is a first attempt to link proteins in UniProtKB to medical resources. The indexing we provide will help clinicians and researchers navigate efficiently from diseases to genes and from genes to diseases. The mapping is available at: . PMID:18460185

  8. SANSparallel: interactive homology search against Uniprot

    PubMed Central

    Somervuo, Panu; Holm, Liisa

    2015-01-01

    Proteins evolve by mutation and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. Many servers are available for browsing the network of homology relationships, but users may have to wait up to a minute for results. The SANSparallel web server provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from UniProt, UniRef90/50, Swiss-Prot or the Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive with previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful for interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811
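    The idea behind a suffix-array neighborhood search can be illustrated by locating each query suffix in a sorted suffix array of the database and letting the neighboring suffixes vote for the sequences they come from. This toy version is only in the spirit of SANS, as we understand the method from its description, and is not its implementation: construction is naive and quadratic in memory, every query suffix is used, there is no alignment step, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def build_index(db: dict[str, str]):
    """Toy suffix array over all database sequences joined with '$' separators."""
    names = list(db)
    starts, text = [], ""
    for name in names:
        starts.append(len(text))          # offset where this sequence begins in `text`
        text += db[name] + "$"
    # Naive construction by sorting materialized suffixes: acceptable for toy data only.
    suffixes = sorted((text[i:], i) for i in range(len(text)))
    return names, starts, suffixes


def neighborhood_search(query: str, index, window: int = 1) -> list[tuple[str, int]]:
    """Vote for database sequences whose suffixes sort next to each query suffix."""
    names, starts, suffixes = index
    votes = Counter()
    for j in range(len(query)):
        pos = bisect_left(suffixes, (query[j:],))   # insertion point of query[j:]
        for s in range(max(0, pos - window), min(len(suffixes), pos + window)):
            owner = names[bisect_right(starts, suffixes[s][1]) - 1]   # text offset -> sequence
            votes[owner] += 1
    return votes.most_common()
```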

  9. Matching the Diversity of Sulfated Biomolecules: Creation of a Classification Database for Sulfatases Reflecting Their Substrate Specificity

    PubMed Central

    Barbeyron, Tristan; Brillet-Guéguen, Loraine; Carré, Wilfrid; Carrière, Cathelène; Caron, Christophe; Czjzek, Mirjam; Hoebeke, Mark; Michel, Gurvan

    2016-01-01

    Sulfatases cleave sulfate groups from various molecules and constitute a biologically and industrially important group of enzymes. However, the number of sulfatases whose substrate has been characterized is limited in comparison to the huge diversity of sulfated compounds, making functional annotations of sulfatases particularly prone to flaws and misinterpretations. In the context of the explosion of genomic data, a classification system that allows better prediction of substrate specificity and sets the limits of functional annotation is urgently needed for sulfatases. Here, after an overview of the diversity of sulfated compounds and of the known sulfatases, we propose a classification database, SulfAtlas (http://abims.sb-roscoff.fr/sulfatlas/), based on sequence homology and composed of four families of sulfatases. The formylglycine-dependent sulfatases, which constitute the largest family, are further divided by a phylogenetic approach into 73 subfamilies, each corresponding either to a known specificity or to an uncharacterized substrate. SulfAtlas summarizes information about the different families of sulfatases. For each family, a web page displays the list of its subfamilies (when they exist) and the list of EC numbers. Each family or subfamily page shows several descriptors and a table of all UniProt accession numbers, linked to the UniProt, ExplorEnz and PDB databases. PMID:27749924

  10. SCUD: fast structure clustering of decoys using reference state to remove overall rotation.

    PubMed

    Li, Hongzhi; Zhou, Yaoqi

    2005-08-01

    We developed a method for fast decoy clustering that uses reference root-mean-squared distance (rRMSD) rather than the commonly used pairwise RMSD (pRMSD) values. For 41 proteins with 2000 decoys each, computation is nine times faster without a significant change in the accuracy of near-native selections. Tests on additional protein decoys based on different reference conformations confirmed this result. Further analysis indicates that pRMSD and rRMSD values are highly correlated (average correlation coefficient of 0.82) and that the clusters obtained from pRMSD and rRMSD values are highly similar (the representative structures of the five largest clusters from the two methods are 74% identical). SCUD (Structure ClUstering of Decoys), with an automatic cutoff value, is available at http://theory.med.buffalo.edu. (c) 2005 Wiley Periodicals, Inc.
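    The reference-state idea can be sketched as follows: superimpose every decoy once onto a common reference (N fits instead of N(N-1)/2 pairwise fits), then compute RMSDs between the already-superimposed coordinates without refitting. This sketch assumes NumPy and uses the Kabsch algorithm for superposition; it reflects our reading of the abstract rather than SCUD's code, and the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Return `mobile` (N x 3) rotated and translated onto `reference` by least squares (Kabsch)."""
    mob_centered = mobile - mobile.mean(axis=0)
    ref_centered = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_centered.T @ ref_centered)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # correct for a possible reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_centered @ rotation.T + reference.mean(axis=0)


def reference_rmsd_matrix(decoys: list[np.ndarray], reference: np.ndarray) -> np.ndarray:
    """rRMSD: superpose each decoy once onto the reference, then compare without refitting."""
    aligned = np.stack([kabsch_superpose(x, reference) for x in decoys])   # N fits, not N(N-1)/2
    diff = aligned[:, None, :, :] - aligned[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))
```

    Because each pair keeps the relative orientation fixed by the reference, an rRMSD value can never be smaller than the optimally fitted pairwise RMSD for the same pair; it is an upper bound that, per the abstract, correlates well with pRMSD.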

  11. Identification and growth characteristics of pink pigmented oxidative bacteria, Methylobacterium mesophilicum and biovars isolated from chlorinated and raw water supplies.

    PubMed

    O'Brien, J R; Murphy, J M

    1993-01-01

    Pink pigmented bacteria were isolated from a blood bank water purification unit, a municipal town water supply (tap water) and an untreated island groundwater source. A total of thirteen strains, including two reference strains of pink pigmented bacteria, were compared in a numerical phenotypic study using 119 binary characters. Three clusters were derived. One major cluster of eleven strains was subdivided into two subclusters on the basis of methanol utilization: five strains were facultative methylotrophs and were classified as Methylobacterium mesophilicum biovar 1, while the other six strains did not utilize methanol but, on the basis of a high phenotypic similarity of 83.6%, were classified as M. mesophilicum biovar 2. Cluster 2 comprised the single reference strain Pseudomonas extorquens NCIB 9399, which was assigned to the genus Methylobacterium and classified as M. extorquens. Cluster 3 consisted of the single reference strain Rhizobium CB 376.
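    Numerical phenotypic studies on binary characters commonly score strain similarity with the simple matching coefficient, the proportion of characters on which two strains agree. The abstract does not state which coefficient was used, so this is an assumption; the sketch computes the pairwise values that a clustering method would then take as input, and the function names are hypothetical.

```python
def simple_matching(a: str, b: str) -> float:
    """Share of binary characters (1 = positive, 0 = negative) on which two strains agree."""
    if len(a) != len(b):
        raise ValueError("both profiles must score the same characters")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_matrix(profiles: dict[str, str]) -> dict[tuple[str, str], float]:
    """Simple matching coefficient for every unordered pair of strains."""
    names = list(profiles)
    return {
        (p, q): simple_matching(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }
```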

  12. Comparison of methods for library construction and short read annotation of shellfish viral metagenomes.

    PubMed

    Wei, Hong-Ying; Huang, Sheng; Wang, Jiang-Yong; Gao, Fang; Jiang, Jing-Zhe

    2018-03-01

    The emergence and widespread use of high-throughput sequencing technologies have promoted metagenomic studies of environmental or animal samples. Library construction for metagenome sequencing and annotation of the resulting sequence reads are important steps in such studies and influence the quality of metagenomic data. In this study, we collected marine mollusk samples, such as Crassostrea hongkongensis, Chlamys farreri and Ruditapes philippinarum, from coastal areas in South China. These samples were divided into two batches to compare two library construction methods for the shellfish viral metagenome. Our analysis showed that reverse-transcribing RNA into cDNA and then amplifying it together with DNA by whole genome amplification (WGA) yielded more DNA than using only WGA or WTA (whole transcriptome amplification). Moreover, higher-quality libraries were obtained by agarose gel extraction than by AMPure bead size selection. However, the latter can also provide good results if combined with adjusted filter parameters; this, together with its simplicity, makes it a viable alternative. Finally, we compared three annotation tools (BLAST, DIAMOND and Taxonomer) and two reference databases (NCBI's NR and UniProt's UniRef). Considering the limitations of computing resources and data transfer speed, we propose using DIAMOND with UniRef for annotating metagenomic short reads, as its running speed makes a good annotation rate achievable under these constraints. This study may serve as a useful reference for selecting methods for shellfish viral metagenome library construction and read annotation.
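    The annotation rate used to compare tools can be computed from BLAST-style tabular output (format 6), which DIAMOND can also write. A minimal sketch, assuming the default 12 columns, a hypothetical e-value cutoff of 1e-5 and a known total read count; the function name is hypothetical and subject identifiers in any test data are placeholders.

```python
import csv
import io

# Column order of BLAST-style tabular output (the default field list of format 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def annotation_rate(tabular: str, total_reads: int, max_evalue: float = 1e-5) -> float:
    """Fraction of reads with at least one hit at or below the e-value cutoff."""
    annotated = set()
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) <= max_evalue:
            annotated.add(hit["qseqid"])   # a read counts once, however many hits it has
    return len(annotated) / total_reads
```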

  13. MultitaskProtDB: a database of multitasking proteins

    PubMed Central

    Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

    2014-01-01

    We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking, or moonlighting, is the capability of some proteins to execute two or more biological functions. Multitasking proteins are usually revealed experimentally by serendipity. This ability of proteins to perform multiple functions helps us to understand one of the ways cells perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins, such as the frequencies of different pairs of functions, phylogenetic conservation and so forth. PMID:24253302

  14. E-Learning for Rare Diseases: An Example Using Fabry Disease.

    PubMed

    Cimmaruta, Chiara; Liguori, Ludovica; Monticelli, Maria; Andreotti, Giuseppina; Citro, Valentina

    2017-09-24

    Rare diseases represent a challenge for physicians because patients are rarely seen, and they can manifest with symptoms similar to those of common diseases. In this work, genetic confirmation of the diagnosis is derived from DNA sequencing. We present a tutorial for the molecular analysis of a rare disease, using Fabry disease as an example. An exonic sequence derived from a hypothetical male patient was matched against human reference data using a genome browser. The missense mutation was identified by running BLASTX, and information on the affected protein was retrieved from the UniProt database. The pathogenic nature of the mutation was assessed with PolyPhen-2. Disease-specific databases were used to assess whether the missense mutation led to a severe phenotype and whether pharmacological therapy was an option. An inexpensive bioinformatics approach is presented to familiarize the reader with the diagnosis of Fabry disease. The reader is introduced to the field of pharmacological chaperones, a therapeutic approach that can be applied only to certain Fabry genotypes. The principle underlying the analysis of exome sequencing data can be explained in simple terms using web applications and databases that facilitate diagnosis and therapeutic choices.
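    The core of the missense check is translating the reference and patient coding sequences and comparing residues. This sketch assumes an in-frame, indel-free coding sequence already aligned to the reference (the tutorial itself used a genome browser and BLASTX for this step) and uses the standard genetic code; it is not tied to any particular gene, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons of an uppercase DNA coding sequence."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - len(cds) % 3, 3))


def missense_variants(ref_cds: str, patient_cds: str) -> list[str]:
    """Amino-acid substitutions in protein notation, e.g. 'W3C' (1-based residue numbers)."""
    ref_aa, pat_aa = translate(ref_cds), translate(patient_cds)
    return [
        f"{r}{pos}{p}"
        for pos, (r, p) in enumerate(zip(ref_aa, pat_aa), start=1)
        if r != p and "*" not in (r, p)   # nonsense and stop-loss changes are not missense
    ]
```

    For example, `missense_variants("ATGGCTTGGTAA", "ATGGCTTGTTAA")` returns `["W3C"]` (TGG for Trp becomes TGT for Cys).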

  15. TogoTable: cross-database annotation system using the Resource Description Framework (RDF) data model.

    PubMed

    Kawano, Shin; Watanabe, Tsutomu; Mizuguchi, Sohei; Araki, Norie; Katayama, Toshiaki; Yamaguchi, Atsuko

    2014-07-01

    TogoTable (http://togotable.dbcls.jp/) is a web tool that adds user-specified annotations to a table that a user uploads. Annotations are drawn from several biological databases that use the Resource Description Framework (RDF) data model. TogoTable uses the database identifiers (IDs) in the table as query keys for searching. RDF data, which form a network called Linked Open Data (LOD), can be searched from SPARQL endpoints using the SPARQL query language. Because TogoTable uses RDF, it can integrate annotations not only from the reference database to which the IDs originally belong but also from externally linked databases via the LOD network. For example, annotations in the Protein Data Bank can be retrieved using a GeneID through links provided by the UniProt RDF. Because RDF has been standardized by the World Wide Web Consortium, any database with annotations based on the RDF data model can easily be incorporated into this tool. We believe that TogoTable is a valuable web tool, particularly for experimental biologists who need to process huge amounts of data such as high-throughput experimental output. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
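    The cross-database step can be pictured as following a chain of identifier mappings starting from the ID in the user's table. TogoTable does this with SPARQL queries over RDF; the offline sketch below replaces the endpoints with in-memory dictionaries, and all identifiers, table names and function names are hypothetical placeholders.

```python
# Hypothetical link tables standing in for RDF triples (subject -> linked objects).
GENEID_TO_UNIPROT = {"geneid:0001": ["uniprot:HYPO01"]}
UNIPROT_TO_PDB = {"uniprot:HYPO01": ["pdb:HYP1", "pdb:HYP2"]}


def follow_links(start: str, hops: list[dict[str, list[str]]]) -> list[str]:
    """Follow a chain of link tables (e.g. GeneID -> UniProt -> PDB), like a SPARQL property path."""
    frontier = [start]
    for table in hops:
        frontier = [target for source in frontier for target in table.get(source, [])]
    return frontier


def annotate_table(rows: list[dict[str, str]], id_column: str,
                   hops: list[dict[str, list[str]]], new_column: str) -> list[dict[str, str]]:
    """Return copies of the user's rows with one extra column of linked identifiers."""
    return [{**row, new_column: ";".join(follow_links(row[id_column], hops))} for row in rows]
```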

  16. Database citation in full text biomedical articles.

    PubMed

    Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R

    2013-01-01

    Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires structured cross-references at the level of individual articles and biological records. Here, we describe current patterns in how database entries are cited in research articles, based on an analysis of the full-text Open Access articles available from Europe PMC. Focusing on citations of entries in the European Nucleotide Archive (ENA), UniProt and the Protein Data Bank in Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also absent from the set of articles cited by database records. We recommend that structured annotation of database records in articles be extended to other databases, such as ArrayExpress and Pfam, whose entries are also widely cited in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services.

  17. Database Citation in Full Text Biomedical Articles

    PubMed Central

    Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R.

    2013-01-01

    Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires structured cross-references at the level of individual articles and biological records. Here, we describe current patterns in how database entries are cited in research articles, based on an analysis of the full-text Open Access articles available from Europe PMC. Focusing on citations of entries in the European Nucleotide Archive (ENA), UniProt and the Protein Data Bank in Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also absent from the set of articles cited by database records. We recommend that structured annotation of database records in articles be extended to other databases, such as ArrayExpress and Pfam, whose entries are also widely cited in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176
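    Text mining for database citations typically starts with pattern matching on accession formats. The sketch below uses the accession-number regular expression given in the UniProt documentation. A shape match is only a candidate, since other identifiers can share the pattern, so a real pipeline would add context checks and validation against the database; the function name is hypothetical.

```python
import re

# UniProt accession format (6 or 10 characters), as published in the UniProt documentation.
UNIPROT_AC = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def find_uniprot_accessions(text: str) -> list[str]:
    """Candidate UniProt accessions in order of first appearance (shape match only)."""
    seen: dict[str, None] = {}
    for match in UNIPROT_AC.finditer(text):
        seen.setdefault(match.group(), None)   # dict keeps insertion order and drops repeats
    return list(seen)
```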

  18. LocSigDB: a database of protein localization signals

    PubMed Central

    Negi, Simarjeet; Pandey, Sanjit; Srinivasan, Satish M.; Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    LocSigDB (http://genome.unmc.edu/LocSigDB/) is a manually curated database of experimental protein localization signals for eight distinct subcellular locations, primarily in eukaryotic cells, with brief coverage of bacterial proteins. Proteins must be localized to their appropriate subcellular compartment to perform their intended function. Mislocalization of proteins to unintended locations is a causative factor in many human diseases; therefore, a collection of known sorting signals will help support many important areas of biomedical research. Through an extensive literature study, we compiled a collection of 533 experimentally determined localization signals, along with the proteins that harbor them. Each signal in LocSigDB is annotated with its localization, source and PubMed references, and is linked, together with organism information, to the proteins in the UniProt database that contain the same amino acid pattern as the given signal. From the LocSigDB web server, users can download the whole database or browse and search the data using an intuitive query interface. To date, LocSigDB is the most comprehensive compendium of protein localization signals for eight distinct subcellular locations. Database URL: http://genome.unmc.edu/LocSigDB/ PMID:25725059
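    Linking a signal to every protein that carries the same amino acid pattern amounts to a pattern scan over sequences. The sketch below writes two widely cited example signals as regular expressions: a C-terminal KDEL ER-retention motif and the SV40 large T antigen nuclear localization signal PKKKRKV. They are illustrations, not entries copied from LocSigDB, and the catalog layout and function name are hypothetical.

```python
import re

# Hypothetical mini-catalog: signal name -> regular expression over one-letter residue codes.
SIGNALS = {
    "ER retention (KDEL, C-terminal)": r"KDEL$",
    "SV40-type nuclear localization signal": r"PKKKRKV",
}


def scan(sequence: str) -> list[tuple[str, int, str]]:
    """Return (signal name, 1-based start, matched residues) for every catalog match."""
    hits = []
    for name, pattern in SIGNALS.items():
        for match in re.finditer(pattern, sequence):
            hits.append((name, match.start() + 1, match.group()))
    return hits
```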

  19. Hepatic SILAC proteomic data from PANDER transgenic model.

    PubMed

    Athanason, Mark G; Stevens, Stanley M; Burkhardt, Brant R

    2016-12-01

    This article contains raw and processed data related to research published in "Quantitative Proteomic Profiling Reveals Hepatic Lipogenesis and Liver X Receptor Activation in the PANDER Transgenic Model" (M.G. Athanason, W.A. Ratliff, D. Chaput, C.B. MarElia, M.N. Kuehl, S.M. Stevens Jr., B.R. Burkhardt (2016)) [1], and was generated by "spike-in" SILAC-based proteomic analysis of livers obtained from the PANcreatic-Derived factor (PANDER) transgenic mouse (PANTG) under various metabolic conditions [1]. The mass spectrometry output from the liver tissue of PANTG and wild-type B6SJLF mice, and the resulting proteome search from MaxQuant 1.2.2.5 employing the Andromeda search algorithm against the UniProtKB reference database for Mus musculus, have been deposited to the ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository with the dataset identifiers PRIDE: PXD004171 and doi:10.6019/PXD004171. Protein ratio values representing PANTG/wild-type obtained by MaxQuant analysis were input into the Perseus processing suite to determine statistical significance using the Significance A outlier test (p<0.05). Differentially expressed proteins identified with this approach were input into Ingenuity Pathway Analysis to determine the pathways and upstream regulators altered in PANTG mice.
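    Significance A scores each protein ratio against the spread of all ratios, using robust, asymmetric estimates of that spread. A sketch following the definition published with MaxQuant (Cox and Mann, 2008), as we understand it: the 15.87th, 50th and 84.13th percentiles of the log ratios give a median plus separate left and right widths, z is the distance from the median in units of the relevant width, and p = 0.5 * erfc(z / sqrt(2)). Perseus may differ in details such as percentile interpolation; the function names are hypothetical.

```python
import math


def percentile(sorted_vals: list[float], q: float) -> float:
    """Percentile (q in [0, 100]) with linear interpolation between closest ranks."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def significance_a(log_ratios: list[float]) -> list[float]:
    """One-sided outlier p-values using robust, asymmetric spread estimates."""
    vals = sorted(log_ratios)
    r_low, r_mid, r_high = (percentile(vals, q) for q in (15.87, 50.0, 84.13))
    p_values = []
    for r in log_ratios:
        # Right-hand width for ratios above the median, left-hand width for those below.
        z = (r - r_mid) / (r_high - r_mid) if r >= r_mid else (r_mid - r) / (r_mid - r_low)
        p_values.append(0.5 * math.erfc(z / math.sqrt(2.0)))
    return p_values
```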

  20. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…

  1. Estimation of satellite position, clock and phase bias corrections

    NASA Astrophysics Data System (ADS)

    Henkel, Patrick; Psychas, Dimitrios; Günther, Christoph; Hugentobler, Urs

    2018-05-01

    Precise point positioning with integer ambiguity resolution requires precise knowledge of satellite position, clock and phase bias corrections. In this paper, a method for estimating these parameters with a global network of reference stations is presented. The method processes uncombined and undifferenced measurements on an arbitrary number of frequencies, so that the obtained satellite position, clock and bias corrections can be used with any type of differenced and/or combined measurements. We cluster the reference stations. The clustering provides common satellite visibility within each cluster and enables efficient fixing of the double-difference ambiguities within each cluster. Additionally, the double-difference ambiguities between the reference stations of different clusters are fixed. We use an integer decorrelation for ambiguity fixing in dense global networks. The performance of the proposed method is analysed with both simulated Galileo measurements on E1 and E5a and real GPS measurements from the IGS network. We defined 16 clusters and obtained satellite position, clock and phase bias corrections with a precision better than 2 cm.
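    Double-difference ambiguities can be fixed because differencing between two stations and two satellites cancels both receiver and satellite clock terms. The sketch demonstrates this under a deliberately simplified phase model (clocks expressed in metres; no atmosphere, phase biases, multipath or noise) with hypothetical station and satellite names; it is not the estimator described in the paper.

```python
C = 299_792_458.0              # speed of light, m/s
WAVELENGTH_L1 = C / 1575.42e6  # GPS L1 / Galileo E1 carrier wavelength, about 0.19 m


def carrier_phase(rho: float, rx_clock: float, sat_clock: float, ambiguity: int) -> float:
    """Simplified undifferenced phase in metres: range + receiver clock - satellite clock + lambda*N."""
    return rho + rx_clock - sat_clock + WAVELENGTH_L1 * ambiguity


def simulate(rho, amb, rx_clocks, sat_clocks):
    """Phase observations obs[station][satellite] under the simplified model."""
    return {
        r: {s: carrier_phase(rho[r][s], rx_clocks[r], sat_clocks[s], amb[r][s]) for s in rho[r]}
        for r in rho
    }


def double_difference(obs, a: str, b: str, i: str, j: str) -> float:
    """(station a - station b) on satellite i, minus the same on satellite j: clocks cancel."""
    return (obs[a][i] - obs[b][i]) - (obs[a][j] - obs[b][j])
```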

  2. OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software.

    PubMed

    Kriventseva, Evgenia V; Tegenfeldt, Fredrik; Petty, Tom J; Waterhouse, Robert M; Simão, Felipe A; Pozdnyakov, Igor A; Ioannidis, Panagiotis; Zdobnov, Evgeny M

    2015-01-01

    Orthology, refining the concept of homology, is the cornerstone of evolutionary comparative studies. With the ever-increasing availability of genomic data, inference of orthology has become instrumental for generating hypotheses about gene functions crucial to many studies. This update of the OrthoDB hierarchical catalog of orthologs (http://www.orthodb.org) covers 3027 complete genomes, including the most comprehensive set of 87 arthropods, 61 vertebrates, 227 fungi and 2627 bacteria (sampling the most complete and representative genomes from over 11,000 available). In addition to the most extensive integration of functional annotations from UniProt, InterPro, GO, OMIM, model organism phenotypes and COG functional categories, OrthoDB uniquely provides evolutionary annotations, including rates of ortholog sequence divergence, copy-number profiles, sibling groups and gene architectures. We redesigned the entire OrthoDB website, from the underlying technology to the user interface, enabling the user to specify species of interest and to select the relevant orthology level by the NCBI taxonomy. The text searches allow the use of complex logic with various identifiers of genes, proteins, domains, ontologies or annotation keywords and phrases. Gene copy-number profiles can also be queried. This release comes with the freely available underlying ortholog clustering pipeline (http://www.orthodb.org/software). © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures.

    PubMed

    Mazandu, Gaston K; Mulder, Nicola J

    2013-09-25

    The use of Gene Ontology (GO) data in protein analyses has contributed substantially to improving the outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years; they provide tools that allow the biological knowledge embedded in the GO structure to be integrated into different biological analyses. There is a need for a unified tool that gives the scientific community the opportunity to explore these different GO similarity measures and their biological applications. We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations, as provided by the Gene Ontology Annotation (GOA) project, to precompute GO term information content (IC), enabling rapid responses to user queries. The DaGO-Fun online tool has the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches, to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
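    Annotation-based IC measures start from term frequencies in an annotation corpus: IC(t) = -log p(t), where p(t) is the fraction of annotated proteins annotated to t or any of its descendants, and the Resnik similarity of two terms is the IC of their most informative common ancestor. A minimal sketch over a hypothetical toy ontology and annotation set (not GO or GOA data); DaGO-Fun implements many more measures, and these function names are hypothetical.

```python
import math

# Hypothetical toy ontology (child -> parents): a DAG like GO, but not GO itself.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "zinc_binding": ["ion_binding"],
    "protein_binding": ["binding"],
}
# Hypothetical protein -> directly annotated terms.
ANNOTATIONS = {"P1": ["zinc_binding"], "P2": ["protein_binding"], "P3": ["catalysis"], "P4": ["ion_binding"]}


def ancestors(term: str) -> set[str]:
    """The term itself plus all of its ancestors (true-path rule)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(annotations: dict[str, list[str]]) -> dict[str, float]:
    """IC(t) = -log p(t), with p(t) the fraction of proteins annotated to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in annotations.values():
        for t in set().union(*(ancestors(x) for x in terms)):   # count each protein once per term
            counts[t] += 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(t1: str, t2: str, ic: dict[str, float]) -> float:
    """IC of the most informative common ancestor of two terms."""
    return max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))
```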

  4. Discontinuities-free complete-active-space state–specific multi–reference coupled cluster theory for describing bond stretching and dissociation

    DOE PAGES

    Zaporozhets, Irina A.; Ivanov, Vladimir V.; Lyakh, Dmitry I.; ...

    2015-07-13

    The previously proposed multi-reference state-specific coupled-cluster theory with a complete active space reference suffered from energy discontinuities when the formal reference state changed during the calculation of the potential energy curve (PEC). A simple remedy for the discontinuity problem is found and presented in this work. It involves using natural complete active space self-consistent field active orbitals in the complete active space coupled-cluster calculations. As a result, the approach gives smooth PECs for different types of dissociation problems, as illustrated by calculations of the dissociation of the single bond in the hydrogen fluoride molecule and of the symmetric double-bond dissociation in the water molecule.

  5. Biochemical and structural characterization of CYP109A2, a vitamin D3 25-hydroxylase from Bacillus megaterium.

    PubMed

    Abdulmughni, Ammar; Jóźwik, Ilona K; Brill, Elisa; Hannemann, Frank; Thunnissen, Andy-Mark W H; Bernhardt, Rita

    2017-11-01

    Cytochrome P450 enzymes are increasingly investigated because of their potential application as biocatalysts with high regio- and/or stereo-selectivity under mild conditions. Vitamin D3 (VD3) metabolites are of pharmaceutical importance and are used to treat VD3 deficiency and other disorders. However, the chemical synthesis of VD3 derivatives shows low specificity and low yields. In this study, cytochrome P450 CYP109A2 from Bacillus megaterium DSM319 was expressed, purified, and shown to oxidize VD3 with high regio-selectivity. The in vitro conversion, using cytochrome P450 reductase (BmCPR) and ferredoxin (Fdx2) from the same strain, showed typical Michaelis-Menten reaction kinetics. A whole-cell system in B. megaterium overexpressing CYP109A2 reached 76 ± 5% conversion after 24 h and allowed the main product to be identified by NMR analysis as 25-hydroxylated VD3. Product yield amounted to 54.9 mg·L⁻¹·day⁻¹, making the established whole-cell system a highly promising biocatalytic route for the production of this valuable metabolite. The crystal structure of substrate-free CYP109A2 was determined at 2.7 Å resolution, displaying an open conformation. Structural analysis predicts that CYP109A2 uses a highly similar set of residues for VD3 binding as the related VD3 hydroxylases CYP109E1 from B. megaterium and CYP107BR1 (Vdh) from Pseudonocardia autotrophica. However, the folds and sequences of the BC loops in these three P450s are highly divergent, leading to differences in the shape and apolar/polar surface distribution of their active site pockets, which may account for the observed differences in substrate specificity and in the regio-selectivity of VD3 hydroxylation. The atomic coordinates and structure factors have been deposited in the Protein Data Bank with accession code 5OFQ (substrate-free CYP109A2).
Cytochrome P450 monooxygenase CYP109A2, EC 1.14.14.1, UniProt ID: D5DF88, Ferredoxin, UniProt ID: D5DFQ0, cytochrome P450 reductase, EC 1.8.1.2, UniProt ID: D5DGX1. © 2017 Federation of European Biochemical Societies.
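    The abstract reports that the in vitro CYP109A2 conversion follows Michaelis-Menten kinetics, v = Vmax·[S]/(Km + [S]). As a minimal sketch, the code below estimates Vmax and Km from rate data with a Lineweaver-Burk (double-reciprocal) linear fit. The parameter values are hypothetical and chosen only for illustration; they are not the CYP109A2 constants, and the fitting method is an assumption, not the authors' procedure.

```python
"""Michaelis-Menten sketch: v = Vmax * S / (Km + S).

Hypothetical parameters only; these are NOT the CYP109A2 values.
"""


def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate at substrate concentration s."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(substrate: list[float], rates: list[float]) -> tuple[float, float]:
    """Estimate (Vmax, Km) from 1/v = (Km/Vmax) * (1/S) + 1/Vmax.

    Ordinary least squares on (1/S, 1/v). This transform amplifies noise at
    low S, so real analyses usually prefer direct nonlinear regression.
    """
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # slope = Km / Vmax
    intercept = mean_y - slope * mean_x  # intercept = 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


if __name__ == "__main__":
    true_vmax, true_km = 10.0, 25.0  # hypothetical values
    s_values = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
    v_values = [michaelis_menten(s, true_vmax, true_km) for s in s_values]
    vmax_hat, km_hat = fit_lineweaver_burk(s_values, v_values)
    print(f"Vmax ~ {vmax_hat:.3f}, Km ~ {km_hat:.3f}")
```

    With noise-free synthetic data the fit recovers the generating parameters; with real measurements the double-reciprocal form is mainly useful as a quick visual check.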

  6. Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences.

    PubMed

    Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory

    2014-01-01

    We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and over closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel than "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate this by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than one-fifth of the time that "classic" open-reference OTU picking would require. Through comparisons on three well-studied data sets, we show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms.
Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
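    The workflow can be sketched in four steps: closed-reference clustering against the reference collection; de novo clustering of a random subsample of the failures, whose centroids become new references; closed-reference clustering of all failures against those new references (the parallelizable step); and de novo clustering of whatever still fails. The sketch below is a toy under stated assumptions: reads are equal-length and pre-aligned, identity is the fraction of matching positions, and a simple greedy centroid clusterer stands in for uclust. The OTU id prefixes "NewRef" and "CleanUp" and the default parameters are hypothetical, not QIIME's.

```python
import random


def identity(a: str, b: str) -> float:
    # Toy identity: fraction of matching positions in equal-length,
    # pre-aligned reads (a stand-in for uclust's alignment-based identity).
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, refs, threshold):
    """Assign each read to its best reference at >= threshold.

    Returns (otus, failures); otus maps reference id -> member reads.
    """
    otus, failures = {}, []
    for read in reads:
        best_score, best_ref = max(
            ((identity(read, seq), ref_id) for ref_id, seq in refs.items()),
            default=(0.0, None),
        )
        if best_ref is not None and best_score >= threshold:
            otus.setdefault(best_ref, []).append(read)
        else:
            failures.append(read)
    return otus, failures


def greedy_de_novo(reads, threshold, prefix):
    """Greedy centroid clustering: join the first centroid within threshold, else found a new OTU."""
    centroids, otus = {}, {}
    for read in reads:
        for otu_id, centroid in centroids.items():
            if identity(read, centroid) >= threshold:
                otus[otu_id].append(read)
                break
        else:
            otu_id = f"{prefix}{len(centroids)}"
            centroids[otu_id] = read
            otus[otu_id] = [read]
    return centroids, otus


def subsampled_open_reference(reads, refs, threshold=0.97, subsample=0.1, seed=0):
    # Step 1: closed-reference clustering against the reference collection.
    otus, failures = closed_reference(reads, refs, threshold)
    # Step 2: de novo cluster a random subsample of failures; keep centroids as new references.
    rng = random.Random(seed)
    k = max(1, int(len(failures) * subsample)) if failures else 0
    new_refs, _ = greedy_de_novo(rng.sample(failures, k), threshold, prefix="NewRef")
    # Step 3: closed-reference clustering of ALL failures against the new references.
    step3, remaining = closed_reference(failures, new_refs, threshold)
    # Step 4: de novo cluster the (usually small) set that still failed.
    _, step4 = greedy_de_novo(remaining, threshold, prefix="CleanUp")
    for part in (step3, step4):
        for otu_id, members in part.items():
            otus.setdefault(otu_id, []).extend(members)
    return otus
```

    Only steps 2 and 4 are order-dependent de novo work; step 3 assigns each read independently, which is why most of the run can be spread across workers.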

  7. SANSparallel: interactive homology search against Uniprot.

    PubMed

    Somervuo, Panu; Holm, Liisa

    2015-07-01

    Proteins evolve by mutation and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. Many servers are available for browsing the network of homology relationships, but users may have to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from UniProt, UniRef90/50, Swiss-Prot or the Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server application, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive with previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful for interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
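    The core idea of a suffix array neighborhood search can be illustrated with a toy: sort all suffixes of the concatenated database, locate each query suffix by binary search, and let the database sequences whose suffixes sit in a small window around that position "vote" as candidate homologs. This is a simplified sketch, not the SANSparallel implementation. The window size, the `min_match` common-prefix filter, and the naive suffix-array construction are assumptions made for clarity.

```python
from collections import Counter


def build_suffix_array(db):
    """Concatenate sequences with '$' separators.

    Returns (text, sa, owner): sa is the sorted list of suffix start positions,
    owner[i] is the index of the database sequence containing text position i.
    """
    text = "".join(seq + "$" for seq in db)
    owner = [idx for idx, seq in enumerate(db) for _ in range(len(seq) + 1)]
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive O(n^2 log n); fine for a toy
    return text, sa, owner


def locate(text, sa, pattern):
    """Binary search: index of the first suffix that is >= pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def sans_search(query, text, sa, owner, window=2, min_match=3, top=5):
    """Rank database sequences by votes from suffix-array neighbors of each query suffix."""
    votes = Counter()
    for j in range(len(query)):
        suffix = query[j:]
        pos = locate(text, sa, suffix)
        for k in range(max(0, pos - window), min(len(sa), pos + window)):
            start = sa[k]
            # Count a neighbor only if it shares a sufficiently long prefix with the query suffix.
            if common_prefix_len(suffix, text[start:start + len(suffix)]) >= min_match:
                votes[owner[start]] += 1
    return votes.most_common(top)
```

    Because '$' never occurs in a protein query, a shared prefix cannot run across a sequence boundary. In a real system, the ranked candidates would then be aligned to produce the reported hits.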

  8. Label-Free Quantitative Proteomic Analysis of Puccinia psidii Uredospores Reveals Differences of Fungal Populations Infecting Eucalyptus and Guava

    PubMed Central

    Bini, Andressa Peres; Regiani, Thais; Franceschini, Lívia Maria; Budzinski, Ilara Gabriela Frasson; Marques, Felipe Garbelini; Labate, Mônica Teresa Veneziano; Guidetti-Gonzalez, Simone; Moon, David Henry; Labate, Carlos Alberto

    2016-01-01

    Puccinia psidii sensu lato (s.l.) is the causal agent of eucalyptus and guava rust, but it also attacks a wide range of plant species in the myrtle family, resulting in significant genetic and physiological variability among populations sampled from different hosts. Uredospores are crucial for P. psidii dissemination in the field. Although they are important for fungal pathogenesis, their molecular characterization has been poorly studied. In this work, we report the first in-depth proteomic analysis of P. psidii s.l. uredospores from two contrasting populations: guava fruits (PpGuava) and eucalyptus leaves (PpEucalyptus). NanoUPLC-MSE was used to generate peptide spectra, which were matched to UniProt sequences for the genus Puccinia, resulting in the first proteomic analysis of the phytopathogenic fungus P. psidii. Three hundred and forty proteins were detected and quantified using label-free proteomics. A significant number of proteins were unique to each sample, and others were significantly more or less abundant depending on the fungal population. In the PpGuava population, many proteins correlated with fungal virulence, such as malate dehydrogenase, proteasome subunits and enolases, were increased. In contrast, in PpEucalyptus, proteins involved in biogenesis, protein folding and translocation were increased, supporting the physiological variability of the fungal populations according to their protein reservoirs and specific host-interaction strategies. PMID:26731728

  9. Label-Free Quantitative Proteomic Analysis of Puccinia psidii Uredospores Reveals Differences of Fungal Populations Infecting Eucalyptus and Guava.

    PubMed

    Quecine, Maria Carolina; Leite, Thiago Falda; Bini, Andressa Peres; Regiani, Thais; Franceschini, Lívia Maria; Budzinski, Ilara Gabriela Frasson; Marques, Felipe Garbelini; Labate, Mônica Teresa Veneziano; Guidetti-Gonzalez, Simone; Moon, David Henry; Labate, Carlos Alberto

    2016-01-01

    Puccinia psidii sensu lato (s.l.) is the causal agent of eucalyptus and guava rust, but it also attacks a wide range of plant species in the myrtle family, resulting in significant genetic and physiological variability among populations sampled from different hosts. Uredospores are crucial for P. psidii dissemination in the field. Although they are important for fungal pathogenesis, their molecular characterization has been poorly studied. In this work, we report the first in-depth proteomic analysis of P. psidii s.l. uredospores from two contrasting populations: guava fruits (PpGuava) and eucalyptus leaves (PpEucalyptus). NanoUPLC-MSE was used to generate peptide spectra, which were matched to UniProt sequences for the genus Puccinia, resulting in the first proteomic analysis of the phytopathogenic fungus P. psidii. Three hundred and forty proteins were detected and quantified using label-free proteomics. A significant number of proteins were unique to each sample, and others were significantly more or less abundant depending on the fungal population. In the PpGuava population, many proteins correlated with fungal virulence, such as malate dehydrogenase, proteasome subunits and enolases, were increased. In contrast, in PpEucalyptus, proteins involved in biogenesis, protein folding and translocation were increased, supporting the physiological variability of the fungal populations according to their protein reservoirs and specific host-interaction strategies.

  10. Recent advances in spin-free state-specific and state-universal multi-reference coupled cluster formalisms: A unitary group adapted approach

    NASA Astrophysics Data System (ADS)

    Maitra, Rahul; Sinha, Debalina; Sen, Sangita; Shee, Avijit; Mukherjee, Debashis

    2012-06-01

    We present here the formulations and implementations of Mukherjee's State-Specific and State-Universal Multi-reference Coupled Cluster theories, which are explicitly spin-free because they are obtained via the Unitary Group Adapted (UGA) approach, and therefore do not suffer from spin contamination. We refer to them as UGA-SSMRCC and UGA-SUMRCC, respectively. We propose a new multi-exponential cluster Ansatz analogous to, but different from, the one suggested by Jeziorski and Monkhorst (JM). Unlike the JM Ansatz, our choice involves spin-free unitary generators for the cluster operators, and we replace the traditional exponential structure of the wave operator by a suitable normal-ordered exponential. We sketch the consequences of choosing our Ansatz, which leads to a fully spin-free, finite power-series structure of the direct term of the MRCC equations. The UGA-SUMRCC follows from a suitable hierarchical generation of cluster amplitudes of increasing rank, while the UGA-SSMRCC requires suitable sufficiency conditions to arrive at a well-defined set of equations for the cluster amplitudes. We discuss two distinct and inequivalent sufficiency conditions and their pros and cons. We also discuss a variant of the UGA-SSMRCC in which the number of cluster amplitudes can be drastically reduced by internal contraction of the two-body inactive cluster amplitudes. Because these amplitudes are the most numerous, a spin-free internally contracted description will lead to a large speed-up. We refer to this variant as ICID-UGA-SSMRCC. Essentially the same mathematical manipulations provide us with the UGA-SUMRCC theory as well. Pilot numerical results are presented to indicate the promise and efficacy of all three methods.
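    Schematically, and using standard MRCC notation that the abstract itself does not define (model functions φ_μ, cluster operators T^μ, expansion coefficients c_μ), the contrast between the JM Ansatz and the normal-ordered variant described above can be written as follows. This is an illustrative summary, not the paper's full working equations.

```latex
% Jeziorski-Monkhorst (JM) Ansatz: one exponential per model function
|\Psi\rangle = \sum_{\mu} c_{\mu}\, e^{T^{\mu}}\, |\phi_{\mu}\rangle

% Normal-ordered variant: the ordinary exponential is replaced by a
% normal-ordered one, written \{ \cdots \}, with T^{\mu} built from
% spin-free unitary-group generators
|\Psi\rangle = \sum_{\mu} c_{\mu}\, \{ e^{T^{\mu}} \}\, |\phi_{\mu}\rangle
```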

  11. Summary Diagrams for Coupled Hydrodynamic-Ecosystem Model Skill Assessment

    DTIC Science & Technology

    2009-01-01

    reference point have the smallest unbiased RMSD value (Fig. 3). It would appear that the cluster of model points closest to the reference point may...total RMSD values. This is particularly the case for phytoplankton absorption (Fig. 3B) where the cluster of points closest to the reference...pattern statistics and the bias (difference of mean values) each magnitude of the total Root-Mean-Square Difference (RMSD). An alternative skill score and
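    The snippet refers to splitting the total root-mean-square difference into a bias term (difference of mean values) and an unbiased, or centered, RMSD. These satisfy total RMSD² = bias² + unbiased RMSD², which is what lets skill diagrams of this kind place each model point by its bias and unbiased RMSD. A minimal sketch, assuming paired model and reference values of equal length; the function name is illustrative and not taken from the report:

```python
import math


def rmsd_components(model, ref):
    """Return (bias, unbiased_rmsd, total_rmsd) for paired model/reference values.

    bias          = mean(model) - mean(ref)
    unbiased_rmsd = RMSD after removing each series' mean (pattern error)
    total_rmsd    = plain RMSD; total**2 == bias**2 + unbiased**2
    """
    n = len(model)
    if n == 0 or n != len(ref):
        raise ValueError("model and ref must be non-empty and the same length")
    mean_m = sum(model) / n
    mean_r = sum(ref) / n
    bias = mean_m - mean_r
    total = math.sqrt(sum((m - r) ** 2 for m, r in zip(model, ref)) / n)
    unbiased = math.sqrt(
        sum(((m - mean_m) - (r - mean_r)) ** 2 for m, r in zip(model, ref)) / n
    )
    return bias, unbiased, total
```

    The identity follows because the mean of the differences is the bias, so the mean squared difference equals the bias squared plus the variance of the differences.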

  12. Perturbative universal state-selective correction for state-specific multi-reference coupled cluster methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brabec, Jiri; Banik, Subrata; Kowalski, Karol

    2016-10-28

    The implementation details of the universal state-selective (USS) multi-reference coupled cluster (MRCC) formalism with singles and doubles (USS(2)) are discussed using several benchmark systems as examples. We demonstrate that the USS(2) formalism can improve the accuracy of state-specific multi-reference coupled-cluster (MRCC) methods based on the Brillouin-Wigner and Mukherjee's sufficiency conditions. Additionally, we show that the USS(2) approach significantly alleviates problems associated with the lack of invariance of MRCC theories under rotation of the active orbitals. We also discuss perturbative USS(2) formulations that significantly reduce the numerical overhead of the full USS(2) method.

  13. Patterns of Dysmorphic Features in Schizophrenia

    PubMed Central

    Scutt, L.E.; Chow, E.W.C.; Weksberg, R.; Honer, W.G.; Bassett, Anne S.

    2011-01-01

    Congenital dysmorphic features are prevalent in schizophrenia and may reflect underlying neurodevelopmental abnormalities. A cluster analysis approach delineating patterns of dysmorphic features has been used in genetics to classify individuals into more etiologically homogeneous subgroups. In the present study, this approach was applied to schizophrenia, using a sample with a suspected genetic syndrome as a testable model. Subjects (n = 159) with schizophrenia or schizoaffective disorder were ascertained from chronic patient populations (random, n = 123) or referred with possible 22q11 deletion syndrome (referred, n = 36). All subjects were evaluated for presence or absence of 70 reliably assessed dysmorphic features, which were used in a three-step cluster analysis. The analysis produced four major clusters with different patterns of dysmorphic features. Significant between-cluster differences were found for rates of 37 dysmorphic features (P < 0.05), median number of dysmorphic features (P = 0.0001), and validating features not used in the cluster analysis: mild mental retardation (P = 0.001) and congenital heart defects (P = 0.002). Two clusters (1 and 4) appeared to represent more developmental subgroups of schizophrenia with elevated rates of dysmorphic features and validating features. Cluster 1 (n = 27) comprised mostly referred subjects. Cluster 4 (n = 18) had a different pattern of dysmorphic features; one subject had a mosaic Turner syndrome variant. Two other clusters had lower rates and patterns of features consistent with those found in previous studies of schizophrenia. Delineating patterns of dysmorphic features may help identify subgroups that could represent neurodevelopmental forms of schizophrenia with more homogeneous origins. PMID:11803519

  14. Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

    DTIC Science & Technology

    1982-02-26

    UPGMA), and Ward’s method. Ling’s papers describe a (k,r) clustering method. Each of these methods has individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an
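    The snippet is truncated, but UPGMA (unweighted pair group method with arithmetic mean) is a standard agglomerative method: repeatedly merge the two closest clusters, and define the distance from the merged cluster to any other cluster as the average over all pairs of original points. The sketch below is the textbook algorithm, not the report's tracking code; it assumes a non-empty label list and a symmetric distance matrix, and returns a nested tuple (left, right, height) with height equal to half the merge distance.

```python
import itertools


def upgma(labels, matrix):
    """UPGMA clustering of a symmetric distance matrix (list of lists)."""
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i, j in itertools.combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted mean of the old distances equals the mean distance
        # over all original leaf pairs (the "unweighted" in UPGMA).
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop distances involving the two clusters that no longer exist.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    m = [[0, 2, 6, 10],
         [2, 0, 6, 10],
         [6, 6, 0, 10],
         [10, 10, 10, 0]]
    print(upgma(["A", "B", "C", "D"], m))
```

    This naive version rescans all pairwise distances at every merge, which is fine for illustration but scales poorly; production implementations keep the matrix in a priority structure.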

  15. Genetics Home Reference: myopathy with deficiency of iron-sulfur cluster assembly enzyme

    MedlinePlus

    ... Myopathy with deficiency of iron-sulfur cluster assembly enzyme is an inherited disorder that primarily affects muscles ...

  16. MitoMiner: a data warehouse for mitochondrial proteomics data

    PubMed Central

    Smith, Anthony C.; Blackshaw, James A.; Robinson, Alan J.

    2012-01-01

    MitoMiner (http://mitominer.mrc-mbu.cam.ac.uk/) is a data warehouse for the storage and analysis of mitochondrial proteomics data gathered from publications of mass spectrometry and green fluorescent protein tagging studies. In MitoMiner, these data are integrated with data from UniProt, Gene Ontology, Online Mendelian Inheritance in Man, HomoloGene, Kyoto Encyclopaedia of Genes and Genomes and PubMed. The latest release of MitoMiner stores proteomics data sets from 46 studies covering 11 different species from eumetazoa, viridiplantae, fungi and protista. MitoMiner is implemented using the open-source InterMine data warehouse system, which provides a user interface that allows users to upload data for analysis, offers personal accounts for storing queries and results, and enables queries of any data in the data model. MitoMiner also provides lists of proteins for use in analyses, including the new MitoMiner mitochondrial proteome reference sets that specify proteins with substantial experimental evidence for mitochondrial localization. As further mitochondrial proteomics data sets from normal and diseased tissue are published, MitoMiner can be used to characterize the variability of the mitochondrial proteome between tissues and to investigate how changes in the proteome may contribute to mitochondrial dysfunction and mitochondrial-associated diseases such as cancer, neurodegenerative diseases, obesity, diabetes, heart failure and the ageing process. PMID:22121219

  17. Single nucleotide polymorphism discovery via genotyping by sequencing to assess population genetic structure and recurrent polyploidization in Andropogon gerardii.

    PubMed

    McAllister, Christine A; Miller, Allison J

    2016-07-01

    Autopolyploidy, genome duplication within a single lineage, can result in multiple cytotypes within a species. Geographic distributions of cytotypes may reflect the evolutionary history of autopolyploid formation and subsequent population dynamics including stochastic (drift) and deterministic (differential selection among cytotypes) processes. Here, we used a population genomic approach to investigate whether autopolyploidy occurred once or multiple times in Andropogon gerardii, a widespread, North American grass with two predominant cytotypes. Genotyping by sequencing was used to identify single nucleotide polymorphisms (SNPs) in individuals collected from across the geographic range of A. gerardii. Two independent approaches to SNP calling were used: the reference-free UNEAK pipeline and a reference-guided approach based on the sequenced Sorghum bicolor genome. SNPs generated using these pipelines were analyzed independently with genetic distance and clustering. Analyses of the two SNP data sets showed very similar patterns of population-level clustering of A. gerardii individuals: a cluster of A. gerardii individuals from the southern Plains, a northern Plains cluster, and a western cluster. Groupings of individuals corresponded to geographic localities regardless of cytotype: 6x and 9x individuals from the same geographic area clustered together. SNPs generated using reference-guided and reference-free pipelines in A. gerardii yielded unique subsets of genomic data. Both data sets suggest that the 9x cytotype in A. gerardii likely evolved multiple times from 6x progenitors across the range of the species. Genomic approaches like GBS and diverse bioinformatics pipelines used here facilitate evolutionary analyses of complex systems with multiple ploidy levels. © 2016 Botanical Society of America.

  18. A quasiparticle-based multi-reference coupled-cluster method.

    PubMed

    Rolik, Zoltán; Kállay, Mihály

    2014-10-07

    The purpose of this paper is to introduce a quasiparticle-based multi-reference coupled-cluster (MRCC) approach. The quasiparticles are introduced via a unitary transformation which allows us to represent a complete active space reference function and other elements of an orthonormal multi-reference (MR) basis in a determinant-like form. The quasiparticle creation and annihilation operators satisfy the fermion anti-commutation relations. On the basis of these quasiparticles, a generalization of the normal-ordered operator products for the MR case can be introduced as an alternative to the approach of Mukherjee and Kutzelnigg [Recent Prog. Many-Body Theor. 4, 127 (1995); Mukherjee and Kutzelnigg, J. Chem. Phys. 107, 432 (1997)]. Based on the new normal ordering any quasiparticle-based theory can be formulated using the well-known diagram techniques. Beyond the general quasiparticle framework we also present a possible realization of the unitary transformation. The suggested transformation has an exponential form where the parameters, holding exclusively active indices, are defined in a form similar to the wave operator of the unitary coupled-cluster approach. The definition of our quasiparticle-based MRCC approach strictly follows the form of the single-reference coupled-cluster method and retains several of its beneficial properties. Test results for small systems are presented using a pilot implementation of the new approach and compared to those obtained by other MR methods.

  19. 7 CFR 52.1842 - Product description of Layer or (Cluster) raisins with seeds.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Product description of Layer or (Cluster) raisins with... Raisins 1 § 52.1842 Product description of Layer or (Cluster) raisins with seeds. Raisins with Seeds that are referred to as Layer or Cluster raisins means that the raisins have not been detached from the...

  20. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures

    PubMed Central

    2013-01-01

    Background The use of Gene Ontology (GO) data in protein analyses has contributed substantially to improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. Results We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid responses to user queries. Conclusions The DaGO-Fun online tool has the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches, to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis. PMID:24067102
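    Annotation-based information content is commonly defined as IC(t) = -log p(t), where p(t) is the fraction of annotated proteins carrying term t or any of its descendants (annotations propagate up the ontology). One widely used IC-based similarity, Resnik's, scores two terms by the IC of their most informative common ancestor. The sketch below shows both on a tiny hypothetical ontology; the term names, the per-protein counting scheme and the function names are assumptions for illustration, not DaGO-Fun's code.

```python
import math


def ancestors(term, parents):
    """The term plus all its ancestors, following parent links in a DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(annotations, parents):
    """annotations: protein -> set of directly annotated terms.

    Each protein is counted once per term after propagating to ancestors,
    so the root gets p = 1 and IC = 0.
    """
    counts = {}
    for terms in annotations.values():
        closure = set()
        for t in terms:
            closure |= ancestors(t, parents)
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}


def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common if t in ic), default=0.0)


if __name__ == "__main__":
    # Hypothetical toy ontology: root -> {x, y}; x -> {x1, x2}
    parents = {"x": ["root"], "y": ["root"], "x1": ["x"], "x2": ["x"]}
    annotations = {"p1": {"x1"}, "p2": {"x2"}, "p3": {"y"}, "p4": {"x1"}}
    ic = information_content(annotations, parents)
    print(resnik("x1", "x2", parents, ic), resnik("x1", "y", parents, ic))
```

    Precomputing the IC table once, as the abstract describes, turns each similarity query into a few set intersections and dictionary lookups.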

  1. dbSUPER: a database of super-enhancers in mouse and human genome

    PubMed Central

    Khan, Aziz; Zhang, Xuegong

    2016-01-01

    Super-enhancers are clusters of transcriptional enhancers that drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers in more than 100 cell types and demonstrated their functional importance. However, a centralized resource to integrate all these findings is not currently available. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource to assist further studies related to transcriptional control of cell identity and disease. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive search and browsing. The data can be easily sent to Galaxy instances, GREAT and Cistrome web servers for downstream analysis, and can also be visualized in the UCSC genome browser, where custom tracks can be added automatically. The data can be downloaded and exported in a variety of formats. Furthermore, dbSUPER lists genes associated with super-enhancers and also links to external databases such as GeneCards, UniProt and Entrez. dbSUPER also provides an overlap analysis tool to annotate user-defined regions. We believe dbSUPER is a valuable resource for the biology and genetic research communities. PMID:26438538

  2. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases.

    PubMed

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-10-18

    Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. 
The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.
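    The mapping principle, cross-references based on 100% sequence identity, can be sketched as an index keyed by a digest of each normalized sequence, with an equality check so that a digest collision can never produce a false match. This is an illustration only: it does not reproduce PICR's SOAP or REST interfaces, the use of an MD5 digest as the grouping key is an assumption, and the database names and accessions in the usage are hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(seq: str) -> str:
    """Remove whitespace and uppercase, so trivially different strings compare equal."""
    return "".join(seq.split()).upper()


def digest(seq: str) -> str:
    return hashlib.md5(normalize(seq).encode("ascii")).hexdigest()


def build_index(records):
    """records: iterable of (database, accession, sequence) tuples."""
    index = defaultdict(list)
    for database, accession, seq in records:
        index[digest(seq)].append((normalize(seq), database, accession))
    return index


def cross_references(index, seq, databases=None):
    """Identifiers whose sequence is 100% identical to seq, optionally limited to some databases."""
    target = normalize(seq)
    hits = [
        (database, accession)
        for stored, database, accession in index.get(digest(seq), [])
        if stored == target and (databases is None or database in databases)
    ]
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical records from three source databases.
    records = [("DB1", "A1", "MKV"), ("DB2", "B7", "mkv"), ("DB3", "C2", "MKVL")]
    index = build_index(records)
    print(cross_references(index, "MKV"))
    print(cross_references(index, "MKV", databases={"DB2"}))
```

    Filtering by `databases` mirrors the abstract's option to limit mappings by source database; filters on taxonomy or activity status would need those fields stored alongside each record.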

  3. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases

    PubMed Central

    Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning

    2007-01-01

    Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. 
The PICR interface, documentation and code examples are available at . PMID:17945017

  4. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

    PubMed

    He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

    2015-01-01

    The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies among clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise among these factors, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
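    The merge form of instability described above can be reproduced with a toy greedy centroid clusterer, where a read joins the first existing centroid within the identity threshold. Two reads that fall into separate OTUs become co-members once a third read, processed first (for example because it is more abundant), becomes a centroid between them. Closed-reference assignment, by contrast, depends only on the read and the fixed reference set. This is a minimal sketch assuming equal-length, pre-aligned reads and a simple position-match identity; it is not any specific published clustering tool.

```python
def identity(a: str, b: str) -> float:
    # Toy identity: fraction of matching positions in equal-length, pre-aligned reads.
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_de_novo(reads, threshold):
    """Map each read to an OTU index; a read joins the first centroid within threshold."""
    centroids, membership = [], {}
    for read in reads:
        for idx, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                membership[read] = idx
                break
        else:
            centroids.append(read)
            membership[read] = len(centroids) - 1
    return membership


def closed_reference(read, refs, threshold):
    """Best reference id for one read, or None; independent of any other reads."""
    score, ref_id = max(((identity(read, s), r) for r, s in refs.items()), default=(0.0, None))
    return ref_id if ref_id is not None and score >= threshold else None


if __name__ == "__main__":
    s0, s1, s2 = "AAAAAAAAAC", "AAAAAAAAAA", "AAAAAAAACC"
    before = greedy_de_novo([s1, s2], 0.9)      # s1 and s2 differ at 2/10 positions
    after = greedy_de_novo([s0, s1, s2], 0.9)   # s0 is within 1 mismatch of both
    print(before[s1] == before[s2], after[s1] == after[s2])  # False True
```

    Running `closed_reference` on s1 gives the same answer whether or not s0 is in the data set, which is the stability property the abstract attributes to closed-reference picking.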

  5. Combinations of Personal Responsibility: Differences on Pre-service and Practicing Teachers’ Efficacy, Engagement, Classroom Goal Structures and Wellbeing

    PubMed Central

    Daniels, Lia M.; Radil, Amanda I.; Goegan, Lauren D.

    2017-01-01

    Pre-service and practicing teachers feel responsible for a range of educational activities. Four domains of personal responsibility emerging in the literature are: student achievement, student motivation, relationships with students, and responsibility for one's own teaching. To date, most research has used variable-centered approaches to examining responsibilities even though the domains appear related. In two separate samples we used cluster analysis to explore how pre-service (n = 130) and practicing (n = 105) teachers combined personal responsibilities and their impact on three professional cognitions and their wellbeing. Both groups had low and high responsibility clusters, but the third cluster differed: pre-service teachers combined responsibilities for relationships and their own teaching in a cluster we refer to as teacher-based responsibility, whereas practicing teachers combined achievement and motivation in a cluster we refer to as student-outcome focused responsibility. These combinations affected outcomes for pre-service but not practicing teachers. Pre-service teachers in the low responsibility cluster reported less engagement, fewer mastery approaches to instruction, and more performance goal structures than the other two clusters. PMID:28620332

  6. Combinations of Personal Responsibility: Differences on Pre-service and Practicing Teachers' Efficacy, Engagement, Classroom Goal Structures and Wellbeing.

    PubMed

    Daniels, Lia M; Radil, Amanda I; Goegan, Lauren D

    2017-01-01

    Pre-service and practicing teachers feel responsible for a range of educational activities. The four domains of personal responsibility emerging in the literature are student achievement, student motivation, relationships with students, and responsibility for one's own teaching. To date, most research has used variable-centered approaches to examining responsibilities, even though the domains appear related. In two separate samples, we used cluster analysis to explore how pre-service (n = 130) and practicing (n = 105) teachers combined personal responsibilities and how these combinations affected three professional cognitions and their wellbeing. Both groups had low and high responsibility clusters, but the third cluster differed: pre-service teachers combined responsibilities for relationships and their own teaching in a cluster we refer to as teacher-based responsibility, whereas practicing teachers combined achievement and motivation in a cluster we refer to as student-outcome focused responsibility. These combinations affected outcomes for pre-service but not practicing teachers. Pre-service teachers in the low responsibility cluster reported less engagement, fewer mastery approaches to instruction, and more performance goal structures than the other two clusters.

  7. Catalog of open clusters and associated interstellar matter

    NASA Technical Reports Server (NTRS)

    Leisawitz, David

    1988-01-01

    The Catalog of Open Clusters and Associated Interstellar Matter summarizes observations of 128 open clusters and their associated ionized, atomic, and molecular interstellar matter. Cluster sizes, distances, radial velocities, ages, and masses, and the radial velocities and masses of associated interstellar medium components, are given. The database contains information from approximately 400 references published in the scientific literature before 1988.

  8. On Ion Clusters in the Interstellar Gas

    NASA Technical Reports Server (NTRS)

    Donn, Bertram

    1960-01-01

    In a recent paper, V.I. Krassovsky (1958) predicts the occurrence of clusters of large numbers of atoms and molecules around ions in the interstellar gas. He then proposes a number of physicochemical processes that would be considerably enhanced by the high particle density in such clusters. In particular, he suggests that absorption by negative ions formed in the clusters would account for the interstellar extinction without any need for the presence of grains. Because of the important consequences that ion clusters could have, it is necessary to examine their occurrence more fully. This note re-examines the formation of ion clusters in space and shows that even ion-molecule pairs are essentially non-existent. Ion clusters have been considered by Bloom and Margenau (1952) from the same point of view as that used by Krassovsky, whose basic reference (Joffe and Semenov 1933) unfortunately is not available. A different approach has been used by Eyring, Hirschfelder, and Taylor (1936), following the methods of chemical equilibrium. Both of the references cited here lead to the conclusion that clustering is negligible. However, the treatment of Eyring et al. is more appropriate than the method of Bloom and Margenau, which depends on the statistical equilibrium of an atmosphere in a force field.

  9. The Profile-Query Relationship.

    ERIC Educational Resources Information Center

    Shepherd, Michael A.; Phillips, W. J.

    1986-01-01

    Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…

  10. A PRIORI EVALUATION OF TWO-STAGE CLUSTER SAMPLING FOR ACCURACY ASSESSMENT OF LARGE-AREA LAND-COVER MAPS

    EPA Science Inventory

    Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, withi...

  11. Assessing the distinguishable cluster approximation based on the triple bond-breaking in the nitrogen molecule

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rishi, Varun; Perera, Ajith; Bartlett, Rodney J., E-mail: bartlett@qtp.ufl.edu

    2016-03-28

    Obtaining the correct potential energy curves for the dissociation of multiple bonds is a challenging problem for ab initio methods which are affected by the choice of a spin-restricted reference function. Coupled cluster (CC) methods such as CCSD (coupled cluster singles and doubles model) and CCSD(T) (CCSD + perturbative triples) correctly predict the geometry and properties at equilibrium, but the process of bond dissociation, particularly when more than one bond is broken simultaneously, is much more complicated. New modifications of CC theory suggest that the deleterious role of the reference function can be diminished, provided a particular subset of terms is retained in the CC equations. The Distinguishable Cluster (DC) approach of Kats and Manby [J. Chem. Phys. 139, 021102 (2013)] seemingly overcomes the deficiencies for some bond-dissociation problems and might be of use in quasi-degenerate situations in general. DC, along with other approximate coupled cluster methods such as ACCD (approximate coupled cluster doubles), ACP-D45, ACP-D14, 2CC, and pCCSD(α, β) (all defined in text), falls under a category of methods that are basically obtained by deleting some quadratic terms in the double excitation amplitude equation for CCD/CCSD (coupled cluster doubles model/coupled cluster singles and doubles model). Here these approximate methods, particularly those based on the DC approach, are studied in detail for bond breaking in the nitrogen molecule. The N₂ problem is further addressed with conventional single-reference methods, but based on spatial symmetry-broken restricted Hartree–Fock (HF) solutions, to assess the use of these references for correlated calculations in the situation where CC methods using fully symmetry-adapted SCF solutions fail. The distinguishable cluster method is generalized: 1) to different orbitals for different spins (unrestricted HF based DCD and DCSD), 2) by adding triples correction perturbatively (DCSD(T)) and iteratively (DCSDT-n), and 3) via an excited state approximation through the equation of motion (EOM) approach (EOM-DCD, EOM-DCSD). The EOM-CC method is used to identify lower-energy CC solutions to overcome singularities in the CC potential energy curves. It is also shown that UHF-based CC and DC methods behave very similarly in the bond breaking of N₂, and that using spatially broken but spin-preserving SCF references makes the CCSD solutions better than those for DCSD.

  12. Cluster Analysis of Minnesota School Districts. A Research Report.

    ERIC Educational Resources Information Center

    Cleary, James

    The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…

  13. Temperamental reactivity and negative emotionality in uncooperative children referred to specialized paediatric dentistry compared to children in ordinary dental care.

    PubMed

    Arnrup, Kristina; Broberg, Anders G; Berggren, Ulf; Bodin, Lennart

    2007-11-01

    Current treatment of children with dental behaviour management problems (DBMP) is based on the presupposition that their difficulties are caused by dental fear, but is this always the case? The aim of this study was to examine temperamental reactivity, negative emotionality, and other personal characteristics in relation to DBMP in 8- to 12-year-old children. Forty-six children referred because of DBMP (study group) and 110 children in ordinary dental care (reference group) participated. The EASI temperament survey assessed temperamental reactivity and negative emotionality, the Child Behaviour Questionnaire assessed internalizing and externalizing behaviour problems, and the Children's Fear Survey Schedule assessed general and dental fears. Cluster analyses and tree-based modelling were used for data analysis. Among the five clusters identified, one could be characterized as 'balanced temperament'. Thirty-five per cent of the reference group compared to only 7% of the study group belonged to this cluster. Negative emotionality was the most important sorting variable. Children referred because of DBMP differed from children in ordinary dental care not only in dental fear level but also in personal characteristics. Few of the referred children were characterized by a balanced temperament profile. It is important to consider the dual impact of emotion dysregulation and emotional reactivity in the development of DBMP.

  14. Cluster Beam Deposition Technology for Optical Coatings. Phase 1

    DTIC Science & Technology

    1987-05-01

    COP TR-407/5-87. …approach, based on growth and transport of ultrafine particles or clusters in a quenching gas, appears more promising in our view and has been proposed for… Growth of Ultrafine Particles or Clusters by Gas Quenching: The apparent difficulty of making metal clusters with a Takagi-type source led us to explore other…

  15. Psychological Factors Predict Local and Referred Experimental Muscle Pain: A Cluster Analysis in Healthy Adults

    PubMed Central

    Lee, Jennifer E.; Watson, David; Frey-Law, Laura A.

    2012-01-01

    Background: Recent studies suggest an underlying three- or four-factor structure explains the conceptual overlap and distinctiveness of several negative emotionality and pain-related constructs. However, the validity of these latent factors for predicting pain has not been examined. Methods: A cohort of 189 (99 F; 90 M) healthy volunteers completed eight self-report negative emotionality and pain-related measures (Eysenck Personality Questionnaire-Revised; Positive and Negative Affect Schedule; State-Trait Anxiety Inventory; Pain Catastrophizing Scale; Fear of Pain Questionnaire; Somatosensory Amplification Scale; Anxiety Sensitivity Index; Whiteley Index). Using principal axis factoring, three primary latent factors were extracted: General Distress; Catastrophic Thinking; and Pain-Related Fear. Using these factors, individuals clustered into three subgroups of high, moderate, and low negative emotionality responses. Experimental pain was induced via intramuscular acidic infusion into the anterior tibialis muscle, producing local (infusion site) and/or referred (anterior ankle) pain and hyperalgesia. Results: Pain outcomes differed between clusters (multivariate analysis of variance and multinomial regression), with individuals in the highest negative emotionality cluster reporting the greatest local pain (p = 0.05), mechanical hyperalgesia (pressure pain thresholds; p = 0.009) and greater odds (OR = 2.21) of experiencing referred pain compared to the lowest negative emotionality cluster. Conclusion: Our results support three latent psychological factors explaining the majority of the variance between several pain-related psychological measures, and indicate that individuals in the high negative emotionality subgroup are at increased risk for (1) acute local muscle pain; (2) local hyperalgesia; and (3) referred pain using a standardized nociceptive input. PMID:23165778

  16. Map-based trigonometric parallaxes of open clusters - The Pleiades

    NASA Technical Reports Server (NTRS)

    Gatewood, George; Castelaz, Michael; Han, Inwoo; Persinger, Timothy; Stein, John

    1990-01-01

    The multichannel astrometric photometer and Thaw refractor of the University of Pittsburgh's Allegheny Observatory have been used to determine the trigonometric parallax of the Pleiades star cluster. The distance determined, 150 parsecs with a standard error of 18 parsecs, places the cluster slightly farther away than generally accepted. This suggests that the basis of many estimations of the cosmic distance scale is approximately 20 percent short. The accuracy of the determination is limited by the number and choice of reference stars. With careful attention to the selection of reference stars in several Pleiades regions, it should be possible to examine differences in the photometric and trigonometric modulus at a precision of 0.1 magnitudes.

  17. Dynamical evolution of globular-cluster systems in clusters of galaxies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muzzio, J.C.

    1987-04-01

    The dynamical processes that affect globular-cluster systems in clusters of galaxies are analyzed. Two-body and impulsive approximations are utilized to study dynamical friction, drag force, tidal stripping, tidal radii, globular-cluster swapping, tidal accretion, and galactic cannibalism. The evolution of galaxies and the collision of galaxies are simulated numerically; the steps involved in the simulation are described. The simulated data are compared with observations. Consideration is given to the number of galaxies, halo extension, location of the galaxies, distribution of the missing mass, nonequilibrium initial conditions, mass dependence, massive central galaxies, globular-cluster distribution, and lost globular clusters. 116 references.

  18. MultitaskProtDB-II: an update of a database of multitasking/moonlighting proteins

    PubMed Central

    Franco-Serrano, Luís; Hernández, Sergio; Calvo, Alejandra; Severi, María A; Ferragut, Gabriela; Pérez-Pons, Josep Antoni; Piñol, Jaume; Pich, Òscar; Mozo-Villarias, Ángel; Amela, Isaac

    2018-01-01

    Multitasking, or moonlighting, is the capability of some proteins to execute two or more biological functions. MultitaskProtDB-II is an updated database of multifunctional proteins. In the previous version, the information contained was: NCBI and UniProt accession numbers, canonical and additional biological functions, organism, monomeric/oligomeric states, PDB codes and bibliographic references. In the present update, the number of entries has been increased from 288 to 694 moonlighting proteins. MultitaskProtDB-II is continually being curated and updated. The new database also contains the following information: GO descriptors for the canonical and moonlighting functions, three-dimensional structure (for proteins lacking a PDB structure, a model was made using I-TASSER and Phyre), the involvement of the proteins in human diseases (78% of human moonlighting proteins) and whether the protein is a target of a current drug (48% of human moonlighting proteins). These numbers highlight the importance of these proteins for the analysis and explanation of human diseases and for target-directed drug design. Moreover, 25% of the proteins in the database are involved in the virulence of pathogenic microorganisms, largely in the mechanism of adhesion to the host. This highlights their importance for the mechanism of microorganism infection and for vaccine design. MultitaskProtDB-II is available at http://wallace.uab.es/multitaskII. PMID:29136215

  19. Consolidation of proteomics data in the Cancer Proteomics database.

    PubMed

    Arntzen, Magnus Ø; Boddie, Paul; Frick, Rahel; Koehler, Christian J; Thiede, Bernd

    2015-11-01

    Cancer is a class of diseases characterized by abnormal cell growth and is one of the major causes of human death. Proteins are involved in the molecular mechanisms leading to cancer, they are affected by anti-cancer drugs, and protein biomarkers can be used to diagnose certain cancer types. Therefore, it is important to explore the proteomics background of cancer. In this report, we developed the Cancer Proteomics database to re-interrogate published proteome studies investigating cancer. The database is divided into three sections related to cancer processes, cancer types, and anti-cancer drugs. Currently, the Cancer Proteomics database contains 9778 entries of 4118 proteins extracted from 143 scientific articles covering all three sections: cell death (cancer process), prostate cancer (cancer type) and platinum-based anti-cancer drugs including carboplatin, cisplatin, and oxaliplatin (anti-cancer drugs). The detailed information extracted from the literature includes basic information about the articles (e.g., PubMed ID, authors, journal name, publication year), information about the samples (type, study/reference, prognosis factor), and the proteomics workflow (subcellular fractionation, protein and peptide separation, mass spectrometry, quantification). Useful annotations such as hyperlinks to UniProt and PubMed were included. In addition, many filtering options and export functions were established. The database is freely available at http://cancerproteomics.uio.no. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Evaluating the quality of Marfan genotype-phenotype correlations in existing FBN1 databases.

    PubMed

    Groth, Kristian A; Von Kodolitsch, Yskert; Kutsche, Kerstin; Gaustadnes, Mette; Thorsen, Kasper; Andersen, Niels H; Gravholt, Claus H

    2017-07-01

    Genetic FBN1 testing is pivotal for confirming the clinical diagnosis of Marfan syndrome. In an effort to evaluate variant causality, FBN1 databases are often used. We evaluated the current databases regarding FBN1 variants and validated associated phenotype records with a new Marfan syndrome geno-phenotyping tool called the Marfan score. We evaluated four databases (UMD-FBN1, ClinVar, the Human Gene Mutation Database (HGMD), and UniProt) containing 2,250 FBN1 variants supported by 4,904 records presented in 307 references. The Marfan score calculated for phenotype data from the records quantified variant associations with Marfan syndrome phenotype. We calculated a Marfan score for 1,283 variants, of which we confirmed the database diagnosis of Marfan syndrome in 77.1%. This represented only 35.8% of the total registered variants; 18.5-33.3% (UMD-FBN1 versus HGMD) of variants associated with Marfan syndrome in the databases could not be confirmed by the recorded phenotype. FBN1 databases can be imprecise and incomplete. Data should be used with caution when evaluating FBN1 variants. At present, the UMD-FBN1 database seems to be the biggest and best curated; therefore, it is the most comprehensive database. However, the need for better genotype-phenotype curated databases is evident, and we hereby present such a database. Genet Med advance online publication 01 December 2016.

  1. Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation.

    PubMed

    Ruffier, Magali; Kähäri, Andreas; Komorowska, Monika; Keenan, Stephen; Laird, Matthew; Longden, Ian; Proctor, Glenn; Searle, Steve; Staines, Daniel; Taylor, Kieron; Vullo, Alessandro; Yates, Andrew; Zerbino, Daniel; Flicek, Paul

    2017-01-01

    The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) were our first major pieces of software infrastructure and remain at the centre of all of our genome resources. Since their initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Although the framework was initially intended to provide annotation for the reference human genome, we have extended it to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework, which stores a large diversity of genome annotations in one location, serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list (http://www.ensembl.org/info/about/contact/index.html). http://www.ensembl.org. © The Author(s) 2017. Published by Oxford University Press.

  2. Data file of a deep proteome analysis of the prefrontal cortex in aged mice with progranulin deficiency or neuronal overexpression of progranulin.

    PubMed

    Heidler, Juliana; Hardt, Stefanie; Wittig, Ilka; Tegeder, Irmgard

    2016-12-01

    Progranulin deficiency is associated with neurodegeneration in humans and in mice. The mechanisms likely involve progranulin-promoted removal of protein waste via autophagy. We performed a deep proteomic screen of the prefrontal cortex in aged (13-15 months) female progranulin-deficient mice (GRN-/-) and mice with inducible neuron-specific overexpression of progranulin (SLICK-GRN-OE) versus the respective control mice. Proteins were extracted and analyzed by liquid chromatography/mass spectrometry (LC/MS) on a Thermo Scientific™ Q Exactive Plus equipped with an ultra-high performance liquid chromatography unit and a Nanospray Flex Ion-Source. Full-scan MS data were acquired using Xcalibur, and raw files were analyzed using the proteomics software MaxQuant. The mouse reference proteome set from UniProt (June 2015) was used to identify peptides and proteins. The DiB data file is a reduced MaxQuant output and includes peptide and protein identification, accession numbers, protein and gene names, sequence coverage and label-free quantification (LFQ) values of each sample. Differences in protein expression between genotypes are presented in "Progranulin overexpression in sensory neurons attenuates neuropathic pain in mice: Role of autophagy" (C. Altmann, S. Hardt, C. Fischer, J. Heidler, H.Y. Lim, A. Haussler, B. Albuquerque, B. Zimmer, C. Moser, C. Behrends, F. Koentgen, I. Wittig, M.H. Schmidt, A.M. Clement, T. Deller, I. Tegeder, 2016) [1].

  3. Confirmation of translatability and functionality certifies the dual endothelin1/VEGFsp receptor (DEspR) protein.

    PubMed

    Herrera, Victoria L M; Steffen, Martin; Moran, Ann Marie; Tan, Glaiza A; Pasion, Khristine A; Rivera, Keith; Pappin, Darryl J; Ruiz-Opazo, Nelson

    2016-06-14

    In contrast to rat and mouse databases, the NCBI gene database lists the human dual-endothelin1/VEGFsp receptor (DEspR, formerly Dear) as a unitary transcribed pseudogene due to a stop [TGA] codon at codon #14 in automated DNA and RNA sequences. However, re-analysis is needed: prior single-gene studies detected a tryptophan [TGG] codon #14 by manual Sanger sequencing and demonstrated DEspR translatability and functionality, and the demonstration of actual non-translatability through expression studies, the standard of excellence for pseudogene designation, has not been performed. Re-analysis must meet UniProt criteria for demonstrating a protein's existence at the highest (protein) level, which would a priori override DNA- or RNA-based deductions. To dissect the nucleotide sequence discrepancy, we performed Maxam-Gilbert sequencing and reviewed 727 RNA-seq entries. To comply with multiple highest-level UniProt criteria for determining DEspR's existence, we performed various experiments using multiple anti-DEspR monoclonal antibodies (mAbs) targeting distinct DEspR epitopes, one of them spanning the contested tryptophan [TGG] codon #14, assessing: (a) DEspR protein expression, (b) predicted full-length protein size, (c) sequence-predicted protein-specific properties beyond codon #14 (receptor glycosylation and internalization), (d) protein-partner interactions, and (e) DEspR functionality via DEspR-inhibition effects. Maxam-Gilbert sequencing and some RNA-seq entries demonstrate two guanines, hence a tryptophan [TGG] codon #14, within a compression site spanning an error-prone compression sequence motif. Western blot analysis using anti-DEspR mAbs targeting distinct DEspR epitopes detects the identical glycosylated 17.5 kDa pull-down protein. The decrease in DEspR protein size after PNGase-F digestion demonstrates post-translational glycosylation, concordant with the consensus glycosylation site beyond codon #14. As with other small single-transmembrane proteins, mass spectrometry analysis of anti-DEspR mAb pull-down proteins does not detect DEspR itself, but it does detect DEspR-protein interactions with proteins implicated in intracellular trafficking and cancer. FACS analyses also detect DEspR protein in different human cancer stem-like cells (CSCs). DEspR-inhibition studies identify DEspR roles in CSC survival and growth. Live cell imaging detects internalization of the receptor targeted by fluorescently labeled anti-DEspR mAb, concordant with the single internalization-recognition sequence also located beyond codon #14. The data confirm the translatability of DEspR and the existence of the full-length DEspR protein beyond codon #14, and elucidate DEspR-specific functionality. Together with detection of the tryptophan [TGG] codon #14 within an error-prone compression site, the cumulative data demonstrating DEspR protein existence fulfill multiple UniProt criteria, thus refuting its pseudogene designation.

  4. EnzML: multi-label prediction of enzyme classes using InterPro signatures

    PubMed Central

    2012-01-01

    Background: Manual annotation of enzymatic functions cannot keep up with automatic genome sequencing. In this work we explore the capacity of InterPro sequence signatures to automatically predict enzymatic function. Results: We present EnzML, a multi-label classification method that can also efficiently account for proteins with multiple enzymatic functions (50,000 in UniProt). EnzML was evaluated using a standard set of 300,747 proteins for which the manually curated Swiss-Prot and KEGG databases have agreeing Enzyme Commission (EC) annotations. EnzML achieved more than 98% subset accuracy (exact match of all correct Enzyme Commission classes of a protein) for the entire dataset and between 87 and 97% subset accuracy in reannotating eight entire proteomes: human, mouse, rat, mouse-ear cress, fruit fly, the S. pombe yeast, the E. coli bacterium and the M. jannaschii archaebacterium. To understand the role played by dataset size, we compared the cross-evaluation results of smaller datasets, constructed either at random or from specific taxonomic domains such as archaea, bacteria, fungi, invertebrates, plants and vertebrates. The results were confirmed even when the redundancy in the dataset was reduced using UniRef100, UniRef90 or UniRef50 clusters. Conclusions: InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function. This representation makes multi-label machine learning feasible in reasonable time (30 minutes to train on 300,747 instances with 10,852 attributes and 2,201 class values) using the Mulan implementation of the Binary Relevance Nearest Neighbours algorithm (BR-kNN). PMID:22533924
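    Subset accuracy, the headline metric in this abstract, counts a protein as correct only when its entire set of predicted EC classes equals the true set; a partially correct prediction scores zero. A minimal sketch, assuming labels are given as sets of EC-number strings (the function name and the example EC numbers are illustrative and are not taken from EnzML or Mulan):

```python
def subset_accuracy(true_labels: list[set[str]], predicted_labels: list[set[str]]) -> float:
    """Fraction of proteins whose predicted EC set exactly equals the true EC set."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one protein")
    # Each protein contributes 1 only on an exact set match, otherwise 0.
    exact = sum(set(t) == set(p) for t, p in zip(true_labels, predicted_labels))
    return exact / len(true_labels)


# Illustrative labels: the second protein is multifunctional (two EC classes).
truth = [{"1.1.1.1"}, {"2.7.11.1", "3.1.3.16"}, {"4.2.1.1"}]
predicted = [{"1.1.1.1"}, {"2.7.11.1"}, {"4.2.1.1"}]
acc = subset_accuracy(truth, predicted)
print(f"{acc:.3f}")  # 0.667
```

    The second protein has one of its two EC classes predicted correctly yet still counts as a miss, which is why subset accuracy is a stricter criterion for multifunctional enzymes than per-label measures such as Hamming loss.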

  5. Proteomic Analysis of Matched Formalin-Fixed, Paraffin-Embedded Specimens in Patients with Advanced Serous Ovarian Carcinoma

    PubMed Central

    Smith, Ashlee L.; Sun, Mai; Bhargava, Rohit; Stewart, Nicolas A.; Flint, Melanie S.; Bigbee, William L.; Krivak, Thomas C.; Strange, Mary A.; Cooper, Kristine L.; Zorn, Kristin K.

    2013-01-01

    Objective: The biology of high grade serous ovarian carcinoma (HGSOC) is poorly understood. Little has been reported on intratumoral homogeneity or heterogeneity of primary HGSOC tumors and their metastases. We evaluated the global protein expression profiles of paired primary and metastatic HGSOC from formalin-fixed, paraffin-embedded (FFPE) tissue samples. Methods: After IRB approval, six patients with advanced HGSOC were identified with tumor in both ovaries at initial surgery. Laser capture microdissection (LCM) was used to extract tumor for protein digestion. Peptides were extracted and analyzed by reversed-phase liquid chromatography coupled to a linear ion trap mass spectrometer. Tandem mass spectra were searched against the UniProt human protein database. Differences in protein abundance between samples were assessed and analyzed by Ingenuity Pathway Analysis software. Immunohistochemistry (IHC) for select proteins from the original and an additional validation set of five patients was performed. Results: Unsupervised clustering of the abundance profiles placed the paired specimens adjacent to each other. IHC H-score analysis of the validation set revealed a strong correlation between paired samples for all proteins. For the similarly expressed proteins, the estimated correlation coefficients in two of three experimental samples and all validation samples were statistically significant (p < 0.05). The estimated correlation coefficients in the experimental sample proteins classified as differentially expressed were not statistically significant. Conclusion: A global proteomic screen of primary HGSOC tumors and their metastatic lesions identifies tumoral homogeneity and heterogeneity and provides preliminary insight into these protein profiles and the cellular pathways they constitute. PMID:28250404

  6. RNA-seq analysis and de novo transcriptome assembly of Jerusalem artichoke (Helianthus tuberosus Linne).

    PubMed

    Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun

    2014-01-01

    Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total, 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Existing genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.

  7. Sensory analysis of characterising flavours: evaluating tobacco product odours using an expert panel.

    PubMed

    Krüsemann, Erna J Z; Lasschuijt, Marlou P; de Graaf, C; de Wijk, René A; Punter, Pieter H; van Tiel, Loes; Cremers, Johannes W J M; van de Nobelen, Suzanne; Boesveldt, Sanne; Talhout, Reinskje

    2018-05-23

    Tobacco flavours are an important regulatory concept in several jurisdictions, for example in the USA, Canada and Europe. The European Tobacco Products Directive 2014/40/EU prohibits cigarettes and roll-your-own tobacco having a characterising flavour. This directive defines characterising flavour as 'a clearly noticeable smell or taste other than one of tobacco […]'. To distinguish between products with and without a characterising flavour, we trained an expert panel to identify characterising flavours by smelling. An expert panel (n=18) evaluated the smell of 20 tobacco products using self-defined odour attributes, following Quantitative Descriptive Analysis. The panel was trained during 14 sessions of attribute training, consensus training and performance monitoring. Products were assessed during six test sessions. Principal component analysis, hierarchical clustering (four and six clusters) and Hotelling's T-tests (95% and 99% CIs) were used to determine differences and similarities between tobacco products based on odour attributes. The final attribute list contained 13 odour descriptors. Panel performance was sufficient after 14 training sessions. Products marketed as unflavoured that formed a cluster were considered reference products. A four-cluster method distinguished cherry-flavoured, vanilla-flavoured and menthol-flavoured products from reference products. Six clusters subdivided reference products into tobacco leaves, roll-your-own and commercial products. An expert panel was successfully trained to assess characterising odours in cigarettes and roll-your-own tobacco. This method could be applied to other product types such as e-cigarettes. Regulatory decisions are needed on the choice of reference products and significance level, since these directly influence which products are assessed as having a characterising odour.

  8. Novel approach to characterising individuals with low back-related leg pain: cluster identification with latent class analysis and 12-month follow-up.

    PubMed

    Stynes, Siobhán; Konstantinou, Kika; Ogollah, Reuben; Hay, Elaine M; Dunn, Kate M

    2018-04-01

    Traditionally, low back-related leg pain (LBLP) is diagnosed clinically as referred leg pain or sciatica (nerve root involvement). However, within the spectrum of LBLP, we hypothesised that there may be other unrecognised patient subgroups. This study aimed to identify clusters of patients with LBLP using latent class analysis and describe their clinical course. The study population was 609 LBLP primary care consulters. Variables from clinical assessment were included in the latent class analysis. Characteristics of the statistically identified clusters were compared, and their clinical course over 1 year was described. A 5-cluster solution was optimal. Cluster 1 (n = 104) had mild leg pain severity and was considered to represent a referred leg pain group, with no clinical signs suggesting nerve root involvement (sciatica). Cluster 2 (n = 122), cluster 3 (n = 188), and cluster 4 (n = 69) had mild, moderate, and severe pain and disability, respectively, and their responses to clinical assessment items suggested categories of mild, moderate, and severe sciatica. Cluster 5 (n = 126) had high pain and disability, longer pain duration, and more comorbidities and was difficult to map to a clinical diagnosis. Most improvement in pain and disability was seen in the first 4 months for all clusters. At 12 months, the proportion of patients reporting recovery ranged from 27% for cluster 5 to 45% for cluster 2 (mild sciatica). This is the first study that empirically shows the variability in profile and clinical course of patients with LBLP including sciatica. More homogeneous groups were identified, which could be considered in future clinical and research settings.

  9. Comment on "An Evaluation of Query Expansion by the Addition of Clustered Terms for a Document Retrieval System"

    ERIC Educational Resources Information Center

    Salton, G.

    1972-01-01

    The author emphasized that one cannot conclude from the experiments reported upon that term clusters (or equivalently, keyword classifications or thesauruses) are not useful in retrieval. (2 references) (Author)

  10. A Technique of Two-Stage Clustering Applied to Environmental and Civil Engineering and Related Methods of Citation Analysis.

    ERIC Educational Resources Information Center

    Miyamoto, S.; Nakayama, K.

    1983-01-01

    A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to the same set of articles are compared. Ten references are…

  11. Structure, stability, and properties of the trans peroxo nitrate radical: the importance of nondynamic correlation.

    PubMed

    Dutta, Achintya Kumar; Dar, Manzoor; Vaval, Nayana; Pal, Sourav

    2014-02-27

    We report a comparative single-reference and multireference coupled-cluster investigation of the structure, potential energy surface, and IR spectroscopic properties of the trans peroxo nitrate radical, one of the key intermediates in stratospheric NOx chemistry. Previous single-reference ab initio studies predicted an unbound structure for the trans peroxo nitrate radical. However, our Fock space multireference coupled-cluster (FSMRCC) calculation confirms a bound structure for the trans peroxo nitrate radical, in accordance with experimental results reported earlier. Further, analysis of the potential energy surface with the FSMRCC method indicates a well-behaved minimum, contrary to the shallow minimum predicted by the single-reference coupled-cluster method. The harmonic force field analysis of various possible isomers of peroxo nitrate also reveals that only the trans structure leads to the experimentally observed IR peak at 1840 cm⁻¹. The present study highlights the critical importance of nondynamic correlation in predicting the structure and properties of high-energy stratospheric NOx radicals.

  12. Genetic characterization of Vibrio vulnificus strains isolated from oyster samples in Mexico.

    PubMed

    Guerrero, Abraham; Gómez Gil Rodríguez, Bruno; Wong-Chang, Irma; Lizárraga-Partida, Marcial Leonardo

    2015-01-01

    Vibrio vulnificus strains were isolated from oysters that were collected at the main seafood market in Mexico City. Strains were characterized with regard to vvhA, vcg genotype, PFGE, multilocus sequence typing (MLST), and rtxA1. Analyses included a comparison with rtxA1 reference sequences. Environmental (vcgE) and clinical (vcgC) genotypes were isolated at nearly equal percentages. PFGE had high heterogeneity, but the strains clustered by vcgE or vcgC genotype. Select housekeeping genes for MLST and primers that were designed for rtxA1 domains divided the strains into two clusters according to the E or C genotype. Reference rtxA1 sequences and those from this study were also clustered according to genotype. These results confirm that this genetic dimorphism is not limited to vcg genotyping, as other studies have reported. Some environmental C genotype strains had high similarity to reference strains, which have been reported to be virulent, indicating a potential risk for oyster consumers in Mexico City.

  13. The effect of the subprime crisis on the credit risk in global scale

    NASA Astrophysics Data System (ADS)

    Lee, Sangwook; Kim, Min Jae; Lee, Sun Young; Kim, Soo Yong; Ban, Joon Hwa

    2013-05-01

    Credit default swaps (CDS) have become one of the most actively traded credit derivatives, and their importance in financial markets has increased after the subprime crisis. In this study, we analyzed the correlation structure of credit risks embedded in CDS and the influence of the subprime crisis on this topological space. We found that the correlation was stronger in the cluster constructed according to the location of the CDS reference companies than in the one constructed according to their industries. The correlation both within a given cluster and between different clusters became significantly stronger after the subprime crisis. The causality test shows that the lead-lag effect between the portfolios (into which reference companies are grouped by the continent where each of them is located) is reversed in direction because the proportion of non-investable and investable reference companies in each portfolio has changed since then. The effect of a single impulse has increased and the response time relaxation has become prolonged after the crisis as well.
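    The within-cluster versus between-cluster comparison described in this abstract can be illustrated with a minimal sketch. It assumes that a correlation matrix of CDS spread changes has already been computed (for example with NumPy's `np.corrcoef`) and that each reference company carries a cluster label such as its continent or industry. The function name, the labels and the toy numbers are hypothetical illustrations, not the authors' method.

    ```python
    import numpy as np

    def within_between_correlation(C, labels):
        """Return (mean within-cluster, mean between-cluster) correlation.

        C      : (n, n) symmetric correlation matrix.
        labels : length-n sequence of cluster labels (e.g. continent codes).
        Diagonal entries (self-correlations of 1) are excluded from both means.
        """
        C = np.asarray(C, dtype=float)
        labels = np.asarray(labels)
        same = labels[:, None] == labels[None, :]      # pair shares a cluster
        off_diag = ~np.eye(len(labels), dtype=bool)    # drop i == j pairs
        within = C[same & off_diag].mean()
        between = C[~same].mean()                      # ~same never contains i == j
        return within, between

    # Hypothetical 3-company example: two companies in "EU", one in "US".
    C = np.array([[1.0, 0.8, 0.1],
                  [0.8, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
    within, between = within_between_correlation(C, ["EU", "EU", "US"])
    print(within, between)
    ```

    Here `within` is 0.8 and `between` is about 0.15. Computing the matrix separately for pre-crisis and post-crisis windows (for a time-by-company array `X`, `np.corrcoef(X, rowvar=False)`) and comparing the two pairs of means is one simple way to check whether correlation within and between clusters has strengthened.
    
    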

  14. Computational clustering for viral reference proteomes

    PubMed Central

    Chen, Chuming; Huang, Hongzhan; Mazumder, Raja; Natale, Darren A.; McGarvey, Peter B.; Zhang, Jian; Polson, Shawn W.; Wang, Yuqi; Wu, Cathy H.

    2016-01-01

    Motivation: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. Results: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome-similarity-based clusters are provided. Comparison of our computational Viral RPs with UniProt’s curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that is consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. Availability and implementation: http://proteininformationresource.org/rps/viruses/ Contact: chenc@udel.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153712

  15. Phylogenetic relationship of Ornithobacterium rhinotracheale strains.

    PubMed

    DE Oca-Jimenez, Roberto Montes; Vega-Sanchez, Vicente; Morales-Erasto, Vladimir; Salgado-Miranda, Celene; Blackall, Patrick J; Soriano-Vargas, Edgardo

    2018-04-10

    The bacterium Ornithobacterium rhinotracheale is associated with respiratory disease in wild birds and poultry. In this study, the phylogenetic analysis of nine reference strains of O. rhinotracheale belonging to serovars A to I, and eight Mexican isolates belonging to serovar A, was performed. The analysis was extended to include sequences from another 23 strains available in the public domain. The analysis showed that the 40 sequences formed six clusters, I to VI. All eight Mexican field isolates were placed in cluster I. One of the reference strains appears to present genetic diversity not previously recognized and was placed in a new genetic cluster. In conclusion, the phylogenetic analysis of O. rhinotracheale strains, based on the 16S rRNA gene, is a suitable tool for epidemiologic studies.

  16. Reference pricing for drugs: is it compatible with U.S. health care?

    PubMed

    Kanavos, Panos; Reinhardt, Uwe

    2003-01-01

    To control spending on prescription drugs, health insurance systems abroad have experimented in recent years with a novel form of patient cost sharing called "reference pricing." Under this approach, the insurer covers only the prices of low-cost, benchmark drugs in therapeutic clusters that are deemed to be close substitutes for one another in treating specific illnesses. Patients who desire a higher-priced substitute in a cluster must then pay the full difference between the retail price of that drug and the reference price covered by the insurer. This paper explores the difficult trade-offs that policymakers must make in designing such a system, drawing where relevant from experience abroad.
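    The cost-sharing rule in this abstract reduces to simple arithmetic. The sketch below assumes, for illustration only, that the reference price is the lowest retail price in the therapeutic cluster, and it ignores any ordinary co-payments; actual systems set the benchmark in different ways. The function names, drug names and prices are hypothetical.

    ```python
    def reference_price(cluster_prices):
        """Benchmark price for a therapeutic cluster.

        Assumption: the lowest retail price among the close substitutes.
        """
        return min(cluster_prices.values())

    def split_cost(retail_price, ref_price):
        """Return (insurer pays, patient pays) under reference pricing.

        The insurer covers up to the reference price; the patient pays
        the full difference above it (ordinary co-payments ignored).
        """
        insurer = min(retail_price, ref_price)
        patient = max(0.0, retail_price - ref_price)
        return insurer, patient

    # Hypothetical cluster of three interchangeable drugs.
    cluster = {"drug_a": 10.0, "drug_b": 14.5, "drug_c": 25.0}
    ref = reference_price(cluster)
    for name, price in cluster.items():
        print(name, split_cost(price, ref))
    ```

    With these numbers, a patient who chooses `drug_c` pays 15.0 out of pocket while the insurer pays 10.0, and the benchmark `drug_a` costs the patient nothing extra.
    
    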

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zaporozhets, Irina A.; Ivanov, Vladimir V.; Lyakh, Dmitry I.

    The previously proposed multi-reference state-specific coupled-cluster theory with a complete active space reference suffered from energy discontinuities when the formal reference state changed during the calculation of the potential energy curve (PEC). A simple remedy to the discontinuity problem is found and presented in this work. It involves using natural complete active space self-consistent field active orbitals in the complete active space coupled-cluster calculations. As a result, the approach gives smooth PECs for different types of dissociation problems, as illustrated by calculations of the single-bond dissociation in the hydrogen fluoride molecule and of the symmetric double-bond dissociation in the water molecule.

  18. Yellow evolved stars in open clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sowell, J.R.

    1987-05-01

    This paper describes a program in which Galactic cluster post-AGB candidates were first identified and then analyzed for cluster membership via radial velocities, monitored for possible photometric variations, examined for evidence of mass loss, and classified as completely as possible in terms of their basic stellar parameters. The intrinsically brightest supergiants are found in the youngest clusters. With increasing cluster age, the absolute luminosities attained by the supergiants decline. It appears that the evolutionary tracks of luminosity class II stars are more similar to those of class I than of class III. Only two superluminous giant star candidates are found in open clusters. 154 references.

  19. An update on the Enzyme Portal: an integrative approach for exploring enzyme knowledge

    PubMed Central

    Onwubiko, J.; Zaru, R.; Rosanoff, S.; Antunes, R.; Bingley, M.; Watkins, X.; O'Donovan, C.; Martin, M. J.

    2017-01-01

    Enzymes are a key part of life processes and are increasingly important for various areas of research such as medicine, biotechnology, bioprocessing and drug research. The goal of the Enzyme Portal is to provide an interface to all European Bioinformatics Institute (EMBL-EBI) data about enzymes (de Matos, P., et al., (2013), BMC Bioinformatics, 14 (1), 103). These data include enzyme function, sequence features and family classification, protein structure, reactions, pathways, small molecules, diseases and the associated literature. The sources of enzyme data are: the UniProt Knowledgebase (UniProtKB) (UniProt Consortium, 2015); the Protein Data Bank in Europe (PDBe) (Velankar, S., et al., Nucleic Acids Res. 2016; 44, D385–D395); Rhea, a database of enzyme-catalysed reactions (Morgat, A., et al., Nucleic Acids Res. 2015; 43, D459–D464); Reactome, a database of biochemical pathways (Fabregat, A., et al., Nucleic Acids Res. 2016; 44, D481–D487); IntEnz, a resource with enzyme nomenclature information (Fleischmann, A., et al., Nucleic Acids Res. 2004; 32, D434–D437); and ChEBI (Hastings, J., et al., Nucleic Acids Res. 2013) and ChEMBL (Bento, A. P., et al., Nucleic Acids Res. 2014; 42, 1083–1090), resources which contain information about small-molecule chemistry and bioactivity. This article describes the redesign of the Enzyme Portal and the increased functionality added to maximise integration and interpretation of these data. Use case examples of the Enzyme Portal and the versatile workflows it supports are illustrated. We welcome the suggestion of new resources for integration. PMID:28158609

  20. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case

    PubMed Central

    Gabella, Chiara; Durinx, Christine; Appel, Ron

    2018-01-01

    Millions of life scientists across the world rely on bioinformatics data resources for their research projects. Data resources can be very expensive, especially those with high added value such as expert-curated knowledgebases. Despite the increasing need for such highly accurate and reliable sources of scientific information, most of them do not have secured funding for the near future and often depend on grants that are much shorter than their planning horizon. Additionally, they are often evaluated as research projects rather than as research infrastructure components. In this work, twelve funding models for data resources are described and applied to the case study of the Universal Protein Resource (UniProt), a key resource for protein sequence and functional information. We show that most of the models present inconsistencies with open access or equity policies, and that while some models do not allow the total costs to be covered, they could potentially be used as a complementary income source. We propose the Infrastructure Model as a sustainable and equitable model for all core data resources in the life sciences. With this model, funding agencies would set aside a fixed percentage of their research grant volumes, which would subsequently be redistributed to core data resources according to well-defined selection criteria. This model, compatible with the principles of open science, is in agreement with several international initiatives such as the Human Frontiers Science Program Organisation (HFSPO) and the OECD Global Science Forum (GSF) project. Here, we estimate that less than 1% of the total amount dedicated to research grants in the life sciences would be sufficient to cover the costs of the core data resources worldwide, including both knowledgebases and deposition databases. PMID:29333230

  1. Processing SPARQL queries with regular expressions in RDF databases

    PubMed Central

    2011-01-01

    Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources such as UniProt (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL, the W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' requests for extracting information from RDF data are diverse and users often do not know the exact value of each fact in an RDF database, it is desirable to use SPARQL queries with regular expression patterns to query RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or only support matching over paths in an RDF graph. Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework to existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225
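    To make the query pattern concrete, the sketch below evaluates a SPARQL-style `FILTER regex(...)` by brute force: it scans every triple with a given predicate and tests the object literal with Python's `re` module. This is the naive baseline that an optimized framework would aim to beat, not the authors' method. The `ex:` identifiers and the function name are hypothetical, and Python's regular-expression dialect only approximates the XPath syntax used by SPARQL's `regex` function (only the `i`, `s` and `m` flags are mapped here).

    ```python
    import re
    from typing import List, Tuple

    Triple = Tuple[str, str, str]  # (subject, predicate, object literal)

    # Subset of SPARQL regex flags mapped to Python re flags (assumption).
    _FLAGS = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE}

    def regex_filter(triples: List[Triple], predicate: str,
                     pattern: str, flags: str = "") -> List[Tuple[str, str]]:
        """Naively evaluate
            SELECT ?s ?o WHERE { ?s <predicate> ?o . FILTER regex(?o, pattern, flags) }
        with a full scan. As in SPARQL, the pattern may match anywhere in the
        literal, so re.search (not re.fullmatch) is used."""
        re_flags = 0
        for f in flags:
            re_flags |= _FLAGS[f]
        compiled = re.compile(pattern, re_flags)
        return [(s, o) for s, p, o in triples
                if p == predicate and compiled.search(o)]

    # Hypothetical protein-name triples.
    triples = [
        ("ex:P1", "ex:name", "Hemoglobin subunit alpha"),
        ("ex:P2", "ex:name", "Hemoglobin subunit beta"),
        ("ex:P3", "ex:name", "Myoglobin"),
        ("ex:P1", "ex:organism", "Homo sapiens"),
    ]
    print(regex_filter(triples, "ex:name", "^hemoglobin", "i"))
    ```

    The case-insensitive query returns the two hemoglobin entries. Because every candidate literal is tested, the cost of this approach grows with the number of triples bound to the predicate regardless of how selective the pattern is, which is what makes naive regex filtering expensive on large RDF stores.
    
    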

  2. Processing SPARQL queries with regular expressions in RDF databases.

    PubMed

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources such as UniProt (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL, the W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' requests for extracting information from RDF data are diverse and users often do not know the exact value of each fact in an RDF database, it is desirable to use SPARQL queries with regular expression patterns to query RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or only support matching over paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework to existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.

  3. An update on the Enzyme Portal: an integrative approach for exploring enzyme knowledge.

    PubMed

    Pundir, S; Onwubiko, J; Zaru, R; Rosanoff, S; Antunes, R; Bingley, M; Watkins, X; O'Donovan, C; Martin, M J

    2017-03-01

    Enzymes are a key part of life processes and are increasingly important for various areas of research such as medicine, biotechnology, bioprocessing and drug research. The goal of the Enzyme Portal is to provide an interface to all European Bioinformatics Institute (EMBL-EBI) data about enzymes (de Matos, P., et al., (2013), BMC Bioinformatics (1), 103). These data include enzyme function, sequence features and family classification, protein structure, reactions, pathways, small molecules, diseases and the associated literature. The sources of enzyme data are: the UniProt Knowledgebase (UniProtKB) (UniProt Consortium, 2015); the Protein Data Bank in Europe (PDBe) (Velankar, S., et al., Nucleic Acids Res. 2016, D385-D395); Rhea, a database of enzyme-catalysed reactions (Morgat, A., et al., Nucleic Acids Res. 2015, D459-D464); Reactome, a database of biochemical pathways (Fabregat, A., et al., Nucleic Acids Res. 2016, D481-D487); IntEnz, a resource with enzyme nomenclature information (Fleischmann, A., et al., Nucleic Acids Res. 2004, D434-D437); and ChEBI (Hastings, J., et al., Nucleic Acids Res. 2013) and ChEMBL (Bento, A. P., et al., Nucleic Acids Res. 2014, 1083-1090), resources which contain information about small-molecule chemistry and bioactivity. This article describes the redesign of the Enzyme Portal and the increased functionality added to maximise integration and interpretation of these data. Use case examples of the Enzyme Portal and the versatile workflows it supports are illustrated. We welcome the suggestion of new resources for integration.

  4. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case.

    PubMed

    Gabella, Chiara; Durinx, Christine; Appel, Ron

    2017-01-01

    Millions of life scientists across the world rely on bioinformatics data resources for their research projects. Data resources can be very expensive, especially those with high added value such as expert-curated knowledgebases. Despite the increasing need for such highly accurate and reliable sources of scientific information, most of them do not have secured funding for the near future and often depend on grants that are much shorter than their planning horizon. Additionally, they are often evaluated as research projects rather than as research infrastructure components. In this work, twelve funding models for data resources are described and applied to the case study of the Universal Protein Resource (UniProt), a key resource for protein sequence and functional information. We show that most of the models present inconsistencies with open access or equity policies, and that while some models do not allow the total costs to be covered, they could potentially be used as a complementary income source. We propose the Infrastructure Model as a sustainable and equitable model for all core data resources in the life sciences. With this model, funding agencies would set aside a fixed percentage of their research grant volumes, which would subsequently be redistributed to core data resources according to well-defined selection criteria. This model, compatible with the principles of open science, is in agreement with several international initiatives such as the Human Frontiers Science Program Organisation (HFSPO) and the OECD Global Science Forum (GSF) project. Here, we estimate that less than 1% of the total amount dedicated to research grants in the life sciences would be sufficient to cover the costs of the core data resources worldwide, including both knowledgebases and deposition databases.

  5. A coupled cluster theory with iterative inclusion of triple excitations and associated equation of motion formulation for excitation energy and ionization potential

    NASA Astrophysics Data System (ADS)

    Maitra, Rahul; Akinaga, Yoshinobu; Nakajima, Takahito

    2017-08-01

    A single-reference coupled cluster theory that is capable of including the effect of connected triple excitations has been developed and implemented. This is achieved by regrouping the terms appearing in perturbation theory and parametrizing through two different sets of exponential operators: one of the exponentials, involving general substitution operators, annihilates the ground state but has a non-vanishing effect when it acts on an excited determinant, while the other is the regular single and double excitation operator of conventional coupled cluster theory, which acts on the Hartree-Fock ground state. The two sets of operators are solved as coupled non-linear equations in an iterative manner, without a significant increase in computational cost relative to conventional coupled cluster theory with single and double excitations. A number of physically motivated and computationally advantageous sufficiency conditions are invoked to arrive at the working equations, which have been applied to determine the ground state energies of a number of small prototypical systems with weak multi-reference character. From the correlated ground state, we have reconstructed the triple excitation operator and performed equation-of-motion coupled cluster calculations with singles, doubles, and triples to obtain the ionization potentials and excitation energies of these molecules as well. Our results suggest that this is quite a reasonable scheme to capture the effect of connected triple excitations as long as the ground state remains weakly multi-reference.

  6. Designing Web-based Telemedicine Training for Military Health Care Providers.

    ERIC Educational Resources Information Center

    Bangert, David; Doktor, Boert; Johnson, Erik

    2001-01-01

    Interviews with 48 military health care professionals identified 20 objectives and 4 learning clusters for a telemedicine training curriculum. From these clusters, web-based modules were developed addressing clinical learning, technology, organizational issues, and introduction to telemedicine. (Contains 19 references.) (SK)

  7. Clustering by soft-constraint affinity propagation: applications to gene-expression data.

    PubMed

    Leone, Michele; Sumedha; Weigt, Martin

    2007-10-15

    Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g. in analyzing gene expression data. This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows sparse gene expression signatures to be extracted for each cluster.
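    For readers unfamiliar with the message-passing scheme, the sketch below implements the original hard-constraint Affinity Propagation of Frey and Dueck with NumPy, exchanging responsibilities r(i, k) and availabilities a(i, k) until exemplars emerge. It does not implement SCAP: the abstract does not give the soft-constraint update or its extra parameter, so that relaxation is left out. The damping factor, fixed iteration count, median preference and toy data are conventional choices assumed here for illustration.

    ```python
    import numpy as np

    def affinity_propagation(S, damping=0.5, max_iter=200):
        """Original (hard-constraint) Affinity Propagation.

        S : (n, n) similarity matrix; the diagonal holds the preferences s(k, k).
        Returns (exemplar indices, cluster label for each point).
        """
        n = S.shape[0]
        R = np.zeros((n, n))  # responsibilities r(i, k)
        A = np.zeros((n, n))  # availabilities a(i, k)
        rows = np.arange(n)
        for _ in range(max_iter):
            # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
            AS = A + S
            best = np.argmax(AS, axis=1)
            first = AS[rows, best]
            AS[rows, best] = -np.inf
            second = AS.max(axis=1)
            R_new = S - first[:, None]
            R_new[rows, best] = S[rows, best] - second
            R = damping * R + (1 - damping) * R_new

            # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
            # a(k,k) = sum_{i' != k} max(0, r(i',k))
            Rp = np.maximum(R, 0)
            Rp[rows, rows] = R[rows, rows]
            A_new = Rp.sum(axis=0)[None, :] - Rp
            diag = A_new[rows, rows].copy()
            A_new = np.minimum(A_new, 0)
            A_new[rows, rows] = diag
            A = damping * A + (1 - damping) * A_new

        # Points whose self-evidence a(k,k) + r(k,k) is positive are exemplars.
        exemplars = np.flatnonzero(np.diag(A + R) > 0)
        if exemplars.size == 0:
            return exemplars, np.full(n, -1)
        labels = np.argmax(S[:, exemplars], axis=1)   # nearest exemplar
        labels[exemplars] = np.arange(exemplars.size)  # exemplars refer to themselves
        return exemplars, labels

    # Toy 1-D data with two well-separated groups.
    x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
    S = -(x[:, None] - x[None, :]) ** 2            # negative squared distance
    off = ~np.eye(len(x), dtype=bool)
    np.fill_diagonal(S, np.median(S[off]))         # median preference
    exemplars, labels = affinity_propagation(S)
    print(exemplars, labels)
    ```

    On this toy data the two groups should each be represented by a single exemplar. Lowering the diagonal preferences tends to produce fewer clusters and raising them more; the exactly-one-exemplar-per-cluster constraint enforced here is what SCAP softens.
    
    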

  8. A stellar tracking reference system

    NASA Technical Reports Server (NTRS)

    Klestadt, B.

    1971-01-01

    A stellar attitude reference system concept for satellites was studied which promises to permit continuous precision pointing of payloads with accuracies of 0.001 degree without the use of gyroscopes. It is accomplished with the use of a single, clustered star tracker assembly mounted on a non-orthogonal, two gimbal mechanism, driven so as to unwind satellite orbital and orbit precession rates. A set of eight stars was found which assures the presence of an adequate inertial reference on a continuous basis in an arbitrary orbit. Acquisition and operational considerations were investigated and inherent reference redundancy/reliability was established. Preliminary designs for the gimbal mechanism, its servo drive, and the star tracker cluster with its associated signal processing were developed for a baseline sun-synchronous, noon-midnight orbit. The functions required of the onboard computer were determined and the equations to be solved were found. In addition detailed error analyses were carried out, based on structural, thermal and other operational considerations.

  9. De novo assembly of the transcriptome of the non-model plant Streptocarpus rexii employing a novel heuristic to recover locus-specific transcript clusters.

    PubMed

    Chiara, Matteo; Horner, David S; Spada, Alberto

    2013-01-01

    De novo transcriptome characterization from Next Generation Sequencing data has become an important approach in the study of non-model plants. Despite notable advances in the assembly of short reads, the clustering of transcripts into unigene-like (locus-specific) clusters remains a somewhat neglected subject. Indeed, closely related paralogous transcripts are often merged into single clusters by current approaches. Here, a novel heuristic method for locus-specific clustering is compared to that implemented in the de novo assembler Oases, using the same initial transcript collections, derived from Arabidopsis thaliana and the developmental model Streptocarpus rexii. We show that the proposed approach improves cluster specificity in the A. thaliana dataset, for which the reference genome is available. Furthermore, for the S. rexii data our filtered transcript collection matches a larger number of distinct annotated loci in reference genomes than the Oases set, while containing a reduced overall number of loci. A detailed discussion of advantages and limitations of our approach in processing de novo transcriptome reconstructions is presented. The proposed method should be widely applicable to other organisms, irrespective of the transcript assembly method employed. The S. rexii transcriptome is publicly available as a sophisticated, augmented online database.

  10. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

    PubMed Central

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes as the signal-to-noise ratio (SNR) of the image data decreases, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which substantially improves ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. PMID:28786986

  11. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

    PubMed

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes as the signal-to-noise ratio (SNR) of the image data decreases, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which substantially improves ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.

  12. Robust fluoroscopic respiratory gating for lung cancer radiotherapy without implanted fiducial markers

    NASA Astrophysics Data System (ADS)

    Cui, Ying; Dy, Jennifer G.; Sharp, Greg C.; Alexander, Brian; Jiang, Steve B.

    2007-02-01

    For gated lung cancer radiotherapy, it is difficult to generate accurate gating signals due to the large uncertainties when using external surrogates and the risk of pneumothorax when using implanted fiducial markers. We have previously investigated and demonstrated the feasibility of generating gating signals using the correlation scores between the reference template image and the fluoroscopic images acquired during the treatment. In this paper, we present an in-depth study aimed at improving the robustness of the algorithm and validating it on multiple sets of patient data. Three different template generating and matching methods have been developed and evaluated: (1) single template method, (2) multiple template method, and (3) template clustering method. Using the fluoroscopic data acquired during patient setup before each fraction of treatment, reference templates are built that represent the tumour position and shape in the gating window, which is assumed to be at the end-of-exhale phase. For the single template method, all the setup images within the gating window are averaged to generate a composite template. For the multiple template method, each setup image in the gating window is considered as a reference template and used to generate an ensemble of correlation scores. All the scores are then combined to generate the gating signal. For the template clustering method, clustering (grouping of similar objects together) is performed to reduce the large number of reference templates into a few representative ones. Each of these methods has been evaluated against the reference gating signal as manually determined by a radiation oncologist. Five patient datasets were used for evaluation. In each case, gated treatments were simulated at both 35% and 50% duty cycles. False-positive, false-negative and total error rates were computed. Experiments show that the single template method is sensitive to noise; the multiple template and clustering methods are more robust to noise due to the smoothing effect of aggregation of correlation scores; and the clustering method results in the best performance in terms of computational efficiency and accuracy.
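
    The abstract describes the single and multiple template methods only in words. The sketch below is a minimal Python/NumPy illustration, not the authors' algorithm: it assumes zero-normalised cross-correlation as the correlation score, a plain mean to combine the ensemble of scores, and a hypothetical threshold of 0.8 for switching the beam on. The names `ncc`, `single_template_score`, `multiple_template_score` and `gating_signal` are illustrative. The template clustering method would pass a few representative templates (for example, centroids of clustered setup images) in place of the full set; that step is not shown.

```python
import numpy as np
from typing import Callable, List, Sequence


def ncc(image: np.ndarray, template: np.ndarray) -> float:
    """Zero-normalised cross-correlation of two equally sized images, in [-1, 1]."""
    a = image.astype(float).ravel()
    b = template.astype(float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def single_template_score(frame: np.ndarray, setup_images: Sequence[np.ndarray]) -> float:
    # Single template method: average all in-window setup images into one composite template.
    composite = np.mean(np.stack(setup_images), axis=0)
    return ncc(frame, composite)


def multiple_template_score(frame: np.ndarray, setup_images: Sequence[np.ndarray]) -> float:
    # Multiple template method: each setup image is its own template; the ensemble
    # of scores is combined here by a plain mean (an assumption).
    return float(np.mean([ncc(frame, t) for t in setup_images]))


def gating_signal(
    frames: Sequence[np.ndarray],
    setup_images: Sequence[np.ndarray],
    threshold: float = 0.8,  # hypothetical beam-on threshold
    scorer: Callable[[np.ndarray, Sequence[np.ndarray]], float] = multiple_template_score,
) -> List[bool]:
    # Beam on (True) for frames whose score reaches the threshold.
    return [scorer(f, setup_images) >= threshold for f in frames]
```

    Because `scorer` is a parameter, the same gating loop can compare the two template strategies on one sequence of fluoroscopic frames.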

  13. Validation of the (GTG)(5)-rep-PCR fingerprinting technique for rapid classification and identification of acetic acid bacteria, with a focus on isolates from Ghanaian fermented cocoa beans.

    PubMed

    De Vuyst, Luc; Camu, Nicholas; De Winter, Tom; Vandemeulebroecke, Katrien; Van de Perre, Vincent; Vancanneyt, Marc; De Vos, Paul; Cleenwerck, Ilse

    2008-06-30

    Amplification of repetitive bacterial DNA elements through the polymerase chain reaction (rep-PCR fingerprinting) using the (GTG)(5) primer, referred to as (GTG)(5)-PCR fingerprinting, was found to be a promising genotypic tool for rapid and reliable speciation of acetic acid bacteria (AAB). The method was evaluated with 64 AAB reference strains, including 31 type strains, and 132 isolates from Ghanaian fermented cocoa beans, and was validated with DNA:DNA hybridization data. Most reference strains grouped according to their species designation, indicating the usefulness of this technique for identification to the species level; exceptions included all Acetobacter indonesiensis strains and Gluconacetobacter liquefaciens LMG 1509. Moreover, exclusive patterns were obtained for most strains, suggesting that the technique can also be used for characterization below the species level or for typing of AAB strains. (GTG)(5)-PCR fingerprinting allowed us to differentiate four major clusters among the fermented cocoa bean isolates, namely A. pasteurianus (cluster I, 100 isolates), A. syzygii- or A. lovaniensis-like (cluster II, 23 isolates), and A. tropicalis-like (clusters III and IV, containing 4 and 5 isolates, respectively). This is the first report of A. syzygii-like and A. tropicalis-like strains from cocoa bean fermentations. Validation of the method, indications for reclassification of AAB species, and indications of the existence of new Acetobacter species were obtained through 16S rRNA sequencing analyses and DNA:DNA hybridizations. The proposed reclassifications concern A. aceti LMG 1531, Ga. xylinus LMG 1518, and Ga. xylinus subsp. sucrofermentans LMG 18788(T).

  14. Alternative definition of excitation amplitudes in multi-reference state-specific coupled cluster

    NASA Astrophysics Data System (ADS)

    Garniron, Yann; Giner, Emmanuel; Malrieu, Jean-Paul; Scemama, Anthony

    2017-04-01

    A central difficulty of state-specific Multi-Reference Coupled Cluster (MR-CC) in the multi-exponential Jeziorski-Monkhorst formalism concerns the definition of the amplitudes of the single and double excitation operators appearing in the exponential wave operators. If the reference space is a complete active space (CAS), the number of these amplitudes is larger than the number of singly and doubly excited determinants on which one may project the eigenequation, and one must impose additional conditions. The present work first defines a state-specific reference-independent operator $\hat{\tilde{T}}^m$ which, acting on the CAS component of the wave function $|\Psi_0^m\rangle$, maximizes the overlap between $(1+\hat{\tilde{T}}^m)|\Psi_0^m\rangle$ and the eigenvector $|\Psi_{\text{CAS-SD}}^m\rangle$ of the CAS-SD (Singles and Doubles) Configuration Interaction (CI) matrix. This operator may be used to generate approximate coefficients of the triples and quadruples, and a dressing of the CAS-SD CI matrix, according to the intermediate Hamiltonian formalism. The process may be iterated to convergence. As a refinement towards a strict coupled cluster formalism, one may exploit the reference-independent amplitudes provided by $(1+\hat{\tilde{T}}^m)|\Psi_0^m\rangle$ to define a reference-dependent operator $\hat{T}^m$ by fitting the eigenvector of the (dressed) CAS-SD CI matrix. The two variants, which are internally uncontracted, give rather similar results. The new MR-CC version has been tested on the ground-state potential energy curves of six molecules (up to triple-bond breaking) and two excited states. The non-parallelism error with respect to the full-CI curves is of the order of 1 m$E_\mathrm{h}$ (millihartree).

  15. Efficient clustering aggregation based on data fragments.

    PubMed

    Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing

    2012-06-01

    Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. These algorithms become inefficient when the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical basis of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch), without sacrificing accuracy.
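
    A data fragment, as defined above, is a set of points that every input clustering keeps together, so fragments can be found by grouping points on their vector of labels. The Python sketch below assumes the pairwise-disagreement objective (the fraction of input clusterings that separate two points) as the goodness measure; it illustrates why working on fragments loses nothing, and it does not reproduce the three algorithms in the paper. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Signature = Tuple[int, ...]  # one label per input clustering


def data_fragments(clusterings: Sequence[Sequence[int]]) -> Dict[Signature, List[int]]:
    """Group point indices by their label vector across all input clusterings.

    Points with identical label vectors are never separated by any clustering,
    so each group is a data fragment.
    """
    n = len(clusterings[0])
    if any(len(labels) != n for labels in clusterings):
        raise ValueError("every clustering must label the same points")
    fragments: Dict[Signature, List[int]] = defaultdict(list)
    for i in range(n):
        fragments[tuple(labels[i] for labels in clusterings)].append(i)
    return dict(fragments)


def fragment_distance(sig_a: Signature, sig_b: Signature) -> float:
    """Fraction of input clusterings that put the two fragments in different clusters."""
    return sum(x != y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def aggregation_cost(fragments: Dict[Signature, List[int]], assignment: Dict[Signature, int]) -> float:
    """Total pairwise disagreement of a candidate aggregate clustering.

    `assignment` maps each fragment to a final cluster id. Point pairs inside a
    fragment contribute nothing (distance 0, same cluster), so only pairs of
    fragments are visited, each weighted by the product of the fragment sizes.
    """
    sigs = list(fragments)
    cost = 0.0
    for i, a in enumerate(sigs):
        for b in sigs[i + 1:]:
            d = fragment_distance(a, b)
            weight = len(fragments[a]) * len(fragments[b])
            cost += weight * (d if assignment[a] == assignment[b] else 1.0 - d)
    return cost
```

    A candidate aggregate clustering assigns whole fragments, so its cost needs one term per pair of fragments rather than one term per pair of points.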

  16. Development of New Open-Shell Perturbation and Coupled-Cluster Theories Based on Symmetric Spin Orbitals

    NASA Technical Reports Server (NTRS)

    Lee, Timothy J.; Arnold, James O. (Technical Monitor)

    1994-01-01

    A new spin orbital basis is employed in the development of efficient open-shell coupled-cluster and perturbation theories that are based on a restricted Hartree-Fock (RHF) reference function. The spin orbital basis differs from the standard one in the spin functions that are associated with the singly occupied spatial orbital. The occupied orbital (in the spin orbital basis) is assigned the $\delta^{+} = \frac{1}{\sqrt{2}}(\alpha+\beta)$ spin function, while the unoccupied orbital is assigned the $\delta^{-} = \frac{1}{\sqrt{2}}(\alpha-\beta)$ spin function. The doubly occupied and unoccupied orbitals (in the reference function) are assigned the standard $\alpha$ and $\beta$ spin functions. The coupled-cluster and perturbation theory wave functions based on this set of "symmetric spin orbitals" exhibit much more symmetry than those based on the standard spin orbital basis. This, together with interacting space arguments, leads to a dramatic reduction in the computational cost for both coupled-cluster and perturbation theory. Additionally, perturbation theory based on "symmetric spin orbitals" obeys Brillouin's theorem provided that spin and spatial excitations are both considered. Other properties of the coupled-cluster and perturbation theory wave functions and models will be discussed.
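
    As a quick check of the spin-function definitions in this abstract, assuming real, orthonormal spin functions $\alpha$ and $\beta$, the symmetric spin functions $\delta^{\pm}$ are themselves orthonormal and are related to $\alpha$ and $\beta$ by a simple rotation:

```latex
\begin{aligned}
\delta^{\pm} &= \tfrac{1}{\sqrt{2}}\,(\alpha \pm \beta),
  \qquad \langle\alpha|\alpha\rangle = \langle\beta|\beta\rangle = 1,\quad \langle\alpha|\beta\rangle = 0,\\
\langle\delta^{\pm}|\delta^{\pm}\rangle &= \tfrac12\left(\langle\alpha|\alpha\rangle \pm \langle\alpha|\beta\rangle \pm \langle\beta|\alpha\rangle + \langle\beta|\beta\rangle\right) = 1,\\
\langle\delta^{+}|\delta^{-}\rangle &= \tfrac12\left(\langle\alpha|\alpha\rangle - \langle\alpha|\beta\rangle + \langle\beta|\alpha\rangle - \langle\beta|\beta\rangle\right) = 0,\\
\alpha &= \tfrac{1}{\sqrt{2}}\,(\delta^{+} + \delta^{-}), \qquad \beta = \tfrac{1}{\sqrt{2}}\,(\delta^{+} - \delta^{-}).
\end{aligned}
```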

  17. On-Line Pattern Analysis and Recognition System. OLPARS VI. Software Reference Manual

    DTIC Science & Technology

    1982-06-18

    Discriminant Analysis, Data Transformation, Feature Extraction, Feature Evaluation, Cluster Analysis, Classification, Computer Software. ABSTRACT... cluster/scatter cut-off value, (2) change the one-space bin factor, (3) change from long prompts to short prompts or vice versa, (4) change the...value, a cluster plot is displayed, otherwise a scatter plot is shown. If option 1 is selected, the program requests that a new value be input

  18. A Cluster Analytic Study of Clinical Orientations among Chemical Dependency Counselors.

    ERIC Educational Resources Information Center

    Thombs, Dennis L.; Osborn, Cynthia J.

    2001-01-01

    Three distinct clinical orientations were identified in a sample of chemical dependency counselors (N=406). Based on cluster analysis, the largest group, identified and labeled as "uniform counselors," endorsed a simple, moral-disease model with little interest in psychosocial interventions. (Contains 50 references and 4 tables.) (GCP)

  19. A Unified Approach to Electron Counting in Main-Group Clusters

    ERIC Educational Resources Information Center

    McGrady, John E.

    2004-01-01

    An extensive review of traditional approaches to teaching electron counting is presented. Electron-precise clusters are usually taken as a reference point for rationalizing the structures of their electron-rich counterparts, which are characterized by valence electron counts greater than 5n.

  20. Hypervelocity Inflight Trajectory Scatter (HITS) Code. User’s Manual

    DTIC Science & Technology

    1976-04-01

    referred to as skewness. The fourth central moment is referred to as kurtosis and provides an additional measure of the clustering of the distribution... clustering of data points in the resulting "scatter diagram" would indicate correlation. The correlation can be quantified by fitting a straight line...
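
    The statistics named in this excerpt can be written down directly. The Python sketch below assumes population moments (dividing by n) and the standardised, non-excess definitions of skewness and kurtosis; the HITS manual may use different estimators, so treat it as an illustration rather than the program's code. The hypothetical `fit_line` quantifies the correlation in a scatter diagram with an ordinary least-squares line and the Pearson coefficient.

```python
import math
from typing import Sequence, Tuple


def central_moment(xs: Sequence[float], k: int) -> float:
    """k-th central moment (population form: divide by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs: Sequence[float]) -> float:
    # Third central moment standardised by the variance to the power 3/2.
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs: Sequence[float]) -> float:
    # Fourth central moment standardised by the squared variance
    # (non-excess convention: 3 for a normal distribution).
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept, plus Pearson correlation r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

    For the symmetric sample 1, 2, 3, 4, 5 the skewness is 0 and this kurtosis is 6.8 / 2² = 1.7.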

  1. Statistical analysis of activation and reaction energies with quasi-variational coupled-cluster theory

    NASA Astrophysics Data System (ADS)

    Black, Joshua A.; Knowles, Peter J.

    2018-06-01

    The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.

  2. Coupled-cluster computations of atomic nuclei

    NASA Astrophysics Data System (ADS)

    Hagen, G.; Papenbrock, T.; Hjorth-Jensen, M.; Dean, D. J.

    2014-09-01

    In the past decade, coupled-cluster theory has seen a renaissance in nuclear physics, with computations of neutron-rich and medium-mass nuclei. The method is efficient for nuclei with product-state references, and it describes many aspects of weakly bound and unbound nuclei. This report reviews the technical and conceptual developments of this method in nuclear physics, and the results of coupled-cluster calculations for nucleonic matter, and for exotic isotopes of helium, oxygen, calcium, and some of their neighbors.

  3. Chronology of the halo globular cluster system formation.

    NASA Astrophysics Data System (ADS)

    Salaris, M.; Weiss, A.

    1997-11-01

    Using up-to-date stellar models and isochrones, we determine the ages of 25 galactic halo clusters. The clusters are divided into four groups according to metallicity. We measure the absolute age of a reference cluster in each group and then determine the ages of the other clusters relative to it; this combination yields the most reliable results. We find that the oldest cluster group is on average 11.8 ± 0.9 Gyr or 12.3 ± 0.3 Gyr old, depending on whether Arp 2 and Rup 106 are included. The average age of all clusters is about 10.5 Gyr. Questions concerning a common age for all clusters and a relation between metallicity and age are addressed. The groups of lower metallicity appear to be coeval, but our results indicate that the sample as a whole has an age spread, and that age and metallicity are correlated, though not by a simple linear relation.

  4. GDPC: Gravitation-based Density Peaks Clustering algorithm

    NASA Astrophysics Data System (ADS)

    Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin

    2018-07-01

    The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach that was published in Science in 2014. DPC can discover clusters of varying sizes and densities, but it has limitations in determining the number of clusters and in identifying anomalies. We develop an enhanced algorithm with an alternative decision graph, based on gravitation theory and nearby distance, to identify centroids and anomalies accurately. We apply our method to several UCI and synthetic data sets and report comparative clustering performance using the F-Measure and 2-D visualization. We also compare our method with other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC, and present F-Measure scores and clustering accuracies of GDPC against K-Means, AP and DPC on different data sets. We show that GDPC performs better in (1) clearly identifying the number of clusters; (2) efficiently aggregating clusters of varying sizes and densities; and (3) accurately identifying anomalies.
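
    For reference, the sketch below implements the original density-peaks idea that GDPC builds on, not GDPC itself: each point gets a local density $\rho$ (here a cut-off kernel counting neighbours within `d_c`) and a distance $\delta$ to its nearest denser point, and points with large $\rho\delta$ become cluster centres. It assumes the number of centres is supplied by the user rather than read off a decision graph, and it omits the gravitation-based decision graph and anomaly detection described above. The function name `density_peaks` is hypothetical.

```python
import numpy as np


def density_peaks(X: np.ndarray, d_c: float, n_clusters: int) -> np.ndarray:
    """Density-peaks clustering with a cut-off kernel; returns one label per row of X."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    rho = (D < d_c).sum(axis=1) - 1  # neighbours within d_c, excluding the point itself
    order = np.argsort(-rho, kind="stable")  # decreasing density; ties broken by index

    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()  # densest point: distance to the farthest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points ranked as denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        nearest_higher[i] = j

    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a centre
    centres = np.argsort(-gamma, kind="stable")[:n_clusters]

    labels = np.full(n, -1)
    labels[centres] = np.arange(n_clusters)
    for i in order:  # propagate labels from denser to sparser points
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

    Visiting points in decreasing density guarantees that each point's nearest denser neighbour already has a label when the point is reached.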

  5. A cluster analytic study of the Wechsler Intelligence Test for Children-IV in children referred for psychoeducational assessment due to persistent academic difficulties.

    PubMed

    Hale, Corinne R; Casey, Joseph E; Ricciardi, Philip W R

    2014-02-01

    Wechsler Intelligence Test for Children-IV core subtest scores of 472 children were cluster analyzed to determine if reliable and valid subgroups would emerge. Three subgroups were identified. Clusters were reliable across different stages of the analysis as well as across algorithms and samples. With respect to external validity, the Globally Low cluster differed from the other two clusters on Wechsler Individual Achievement Test-II Word Reading, Numerical Operations, and Spelling subtests, whereas the latter two clusters did not differ from one another. The clusters derived have been identified in studies using previous WISC editions. Clusters characterized by poor performance on subtests historically associated with the VIQ (i.e., VCI + WMI) and PIQ (i.e., POI + PSI) did not emerge, nor did a cluster characterized by low scores on PRI subtests. Picture Concepts represented the highest subtest score in every cluster, failing to vary in a predictable manner with the other PRI subtests.

  6. The Comparison of Iranian Normative Reference Data with Five Countries Across Variables in Eight Rorschach Comprehensive System (CS) Clusters

    PubMed Central

    Hosseininasab, Abufazel; Mohammadi, Mohammadreza; Jouzi, Samira; Esmaeilinasab, Maryam; Delavar, Ali

    2016-01-01

    Objective: This study aimed to provide normative data documenting how 114 non-patient Iranian children aged 5 to 7 years respond to the Rorschach test. We compared this sample to international normative reference values for the Comprehensive System (CS). Method: One hundred fourteen non-patient Iranian children aged 5 to 7 years were recruited from public schools. Using five child and adolescent samples from five countries, we compared the Iranian normative reference data with each sample on the basis of reference means and standard deviations. Results: The findings show how the scores in each sample were distributed and how the samples compared across variables in eight Rorschach Comprehensive System (CS) clusters. We report all descriptive statistics, such as reference means and standard deviations, for all variables. Conclusion: Iranian clinicians could rely on country-specific or “local norms” when assessing children. We discourage Iranian clinicians from using many CS scores to make nomothetic, score-based inferences about psychopathology in children and adolescents. PMID:27928247

  7. Opportunities of Learning through the History of Mathematics: The Example of National Textbooks in Cyprus and Greece

    ERIC Educational Resources Information Center

    Xenofontos, Constantinos; Papadopoulos, Christos E.

    2015-01-01

    In this paper, we examine the ways the history of mathematics is integrated in the national textbooks of Cyprus and Greece. Our data-driven analyses suggest that the references identified can be clustered in four categories: (a) biographical references about mathematicians or historical references regarding the origins of a mathematical concept…

  8. Impact of the choice of reference genome on the ability of the core genome SNV methodology to distinguish strains of Salmonella enterica serovar Heidelberg.

    PubMed

    Usongo, Valentine; Berry, Chrystal; Yousfi, Khadidja; Doualla-Bell, Florence; Labbé, Genevieve; Johnson, Roger; Fournier, Eric; Nadon, Celine; Goodridge, Lawrence; Bekal, Sadjia

    2018-01-01

    Salmonella enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. The core genome single nucleotide variant pipeline (cgSNV) is one of several whole genome based sequence typing methods used for the laboratory investigation of foodborne pathogens. SNV detection using this method requires a reference genome. The purpose of this study was to investigate the impact of the choice of the reference genome on the cgSNV-informed phylogenetic clustering and inferred isolate relationships. We found that using a draft or closed genome of S. Heidelberg as the reference did not affect the ability of the cgSNV methodology to differentiate among 145 S. Heidelberg isolates involved in foodborne outbreaks. We also found that using a distantly related genome such as S. Dublin as the reference led to a loss of resolution, since some sporadic isolates clustered together with outbreak isolates. In addition, the genetic distances between outbreak isolates, as well as between outbreak and sporadic isolates, were reduced overall when S. Dublin rather than S. Heidelberg was used as the reference genome.
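
    Once reads have been mapped and variants called against the chosen reference, the phylogenetic clustering rests on pairwise SNV counts between isolates. The Python sketch below assumes each isolate has already been reduced to an aligned core-genome string, with uncalled positions written as `N` or `-` and skipped; it illustrates the distance step only, not the cgSNV pipeline itself. The name `snv_distances` is hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

CALLED = set("ACGT")


def snv_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNV counts between isolates over an aligned core genome."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    distances: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(alignment), 2):
        # Count sites where both isolates have a called base and the bases differ;
        # sites with N, gaps or other ambiguity codes in either isolate are skipped.
        distances[(a, b)] = sum(
            1
            for x, y in zip(alignment[a], alignment[b])
            if x in CALLED and y in CALLED and x != y
        )
    return distances
```

    The resulting pairwise counts can then be passed to a tree-building or threshold-based clustering step.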

  9. The Influence of the Phonological Neighborhood Clustering Coefficient on Spoken Word Recognition

    ERIC Educational Resources Information Center

    Chan, Kit Ying; Vitevitch, Michael S.

    2009-01-01

    Clustering coefficient (a measure derived from the new science of networks) refers to the proportion of phonological neighbors of a target word that are also neighbors of each other. Consider the words "bat", "hat", and "can", all of which are neighbors of the word "cat"; the words "bat" and…
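
    The definition in this abstract can be made concrete with a small Python sketch. It assumes the common one-phoneme rule for neighbours (a single substitution, addition or deletion) and, for readability, uses letters as stand-ins for phonemes; the study itself works with phonological transcriptions. Function names are hypothetical. The coefficient is the number of neighbour pairs that are also neighbours of each other, divided by the number of possible pairs.

```python
from itertools import combinations
from typing import Iterable


def is_neighbor(a: str, b: str) -> bool:
    """True if b differs from a by exactly one substitution, addition or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one symbol from the longer form must give the shorter form.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def clustering_coefficient(target: str, lexicon: Iterable[str]) -> float:
    """Proportion of the target's neighbour pairs that are also neighbours of each other."""
    neighbors = [w for w in lexicon if is_neighbor(target, w)]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; 0.0 is a convention here
    links = sum(is_neighbor(u, v) for u, v in combinations(neighbors, 2))
    return links / (k * (k - 1) / 2)
```

    With the three neighbours of "cat" from the example, only "bat" and "hat" are neighbours of each other, so one of the three possible pairs is linked and `clustering_coefficient("cat", ["bat", "hat", "can"])` gives 1/3.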

  10. Crossmaps: Visualization of overlapping relationships in collections of journal papers

    PubMed Central

    Morris, Steven A.; Yen, Gary G.

    2004-01-01

    A crossmapping technique is introduced for visualizing multiple and overlapping relations among entity types in collections of journal articles. Groups of entities from two entity types are crossplotted to show correspondence of relations. For example, author collaboration groups are plotted on the x axis against groups of papers (research fronts) on the y axis. At the intersection of each author group/research front pair, a circular symbol is plotted whose size is proportional to the number of times that authors in the group appear as authors of papers in the research front. Entity groups are found by agglomerative hierarchical clustering using conventional similarity measures. Crossmapping is a simple technique that is particularly suited to showing overlap in relations among entity groups. Particularly useful crossmaps are: research fronts against base reference clusters, research fronts against author collaboration groups, and research fronts against term co-occurrence clusters. When exploring the knowledge domain of a collection of journal papers, it is useful to have several crossmaps of different entity pairs, complemented by research front timelines and base reference cluster timelines. PMID:14762168
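
    The size of each crossmap symbol is a count, which can be tabulated as below once the entity groups are known. This Python sketch assumes the agglomerative clustering has already produced a mapping from authors to collaboration groups and from papers to research fronts; the input structures and the function name are hypothetical, and plotting is left out.

```python
from collections import Counter
from typing import Dict, List


def crossmap_counts(
    papers: Dict[str, List[str]],    # paper id -> list of author names
    author_group: Dict[str, int],    # author name -> collaboration-group id
    research_front: Dict[str, int],  # paper id -> research-front id
) -> Counter:
    """Count appearances of each author group's members on papers of each research front."""
    counts: Counter = Counter()
    for paper, authors in papers.items():
        front = research_front.get(paper)
        if front is None:  # paper not assigned to any research front
            continue
        for author in authors:
            group = author_group.get(author)
            if group is not None:
                counts[(group, front)] += 1
    return counts
```

    Each `(group, front)` key corresponds to one intersection on the crossmap, and its count sets the symbol size.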

  11. Attenuated coupled cluster: a heuristic polynomial similarity transformation incorporating spin symmetry projection into traditional coupled cluster theory

    NASA Astrophysics Data System (ADS)

    Gomez, John A.; Henderson, Thomas M.; Scuseria, Gustavo E.

    2017-11-01

    In electronic structure theory, restricted single-reference coupled cluster (CC) captures weak correlation but fails catastrophically under strong correlation. Spin-projected unrestricted Hartree-Fock (SUHF), on the other hand, misses weak correlation but captures a large portion of strong correlation. The theoretical description of many important processes, e.g. molecular dissociation, requires a method capable of accurately capturing both weak and strong correlation simultaneously, and would likely benefit from a combined CC-SUHF approach. Based on what we have recently learned about SUHF written as particle-hole excitations out of a symmetry-adapted reference determinant, we here propose a heuristic CC doubles model to attenuate the dominant spin collective channel of the quadratic terms in the CC equations. Proof-of-principle results presented here are encouraging and point to several paths forward for improving the method further.

  12. On the accuracy of density-functional theory exchange-correlation functionals for H bonds in small water clusters: Benchmarks approaching the complete basis set limit

    NASA Astrophysics Data System (ADS)

    Santra, Biswajit; Michaelides, Angelos; Scheffler, Matthias

    2007-11-01

    The ability of several density-functional theory (DFT) exchange-correlation functionals to describe hydrogen bonds in small water clusters (dimer to pentamer) in their global minimum energy structures is evaluated with reference to second order Møller-Plesset perturbation theory (MP2). Errors from basis set incompleteness have been minimized in both the MP2 reference data and the DFT calculations, thus enabling a consistent systematic evaluation of the true performance of the tested functionals. Among all the functionals considered, the hybrid X3LYP and PBE0 functionals offer the best performance and among the nonhybrid generalized gradient approximation functionals, mPWLYP and PBE1W perform best. The popular BLYP and B3LYP functionals consistently underbind and PBE and PW91 display rather variable performance with cluster size.

  13. On the accuracy of density-functional theory exchange-correlation functionals for H bonds in small water clusters: benchmarks approaching the complete basis set limit.

    PubMed

    Santra, Biswajit; Michaelides, Angelos; Scheffler, Matthias

    2007-11-14

    The ability of several density-functional theory (DFT) exchange-correlation functionals to describe hydrogen bonds in small water clusters (dimer to pentamer) in their global minimum energy structures is evaluated with reference to second order Møller-Plesset perturbation theory (MP2). Errors from basis set incompleteness have been minimized in both the MP2 reference data and the DFT calculations, thus enabling a consistent systematic evaluation of the true performance of the tested functionals. Among all the functionals considered, the hybrid X3LYP and PBE0 functionals offer the best performance and among the nonhybrid generalized gradient approximation functionals, mPWLYP and PBE1W perform best. The popular BLYP and B3LYP functionals consistently underbind and PBE and PW91 display rather variable performance with cluster size.

  14. A visual pathway links brain structures active during magnetic compass orientation in migratory birds.

    PubMed

    Heyers, Dominik; Manns, Martina; Luksch, Harald; Güntürkün, Onur; Mouritsen, Henrik

    2007-09-26

    The magnetic compass of migratory birds has been suggested to be light-dependent. Retinal cryptochrome-expressing neurons and a forebrain region, "Cluster N", show high neuronal activity when night-migratory songbirds perform magnetic compass orientation. By combining neuronal tracing with behavioral experiments leading to sensory-driven gene expression of the neuronal activity marker ZENK during magnetic compass orientation, we demonstrate a functional neuronal connection between the retinal neurons and Cluster N via the visual thalamus. Thus, the two areas of the central nervous system most active during magnetic compass orientation are part of an ascending visual processing stream, the thalamofugal pathway. Furthermore, Cluster N seems to be a specialized part of the visual wulst. These findings strongly support the hypothesis that migratory birds use their visual system to perceive the reference compass direction of the geomagnetic field, that is, that they "see" it.

  15. Molecular taxonomy of phytopathogenic fungi: a case study in Peronospora.

    PubMed

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M Teresa; Martín, María P

    2009-07-29

    Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, for this purpose clustering approaches to delineate molecular operational taxonomic units have been applied using arbitrary choices regarding the distance threshold values, and the clustering algorithms. Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. 
The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.
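
    The clustering-optimization idea above (choose the distance threshold and clustering setting that agree best with reference data) can be illustrated with a minimal sketch. It assumes a precomputed pairwise distance matrix, single-linkage clustering and the Rand index as the agreement score; these are illustrative choices, not necessarily the distance functions, clustering algorithms or agreement measure used in the study, and all function names and toy values are hypothetical.

```python
from itertools import combinations


def single_linkage(dist, threshold):
    """Group items 0..n-1 so that i and j share a cluster whenever they are
    connected by a chain of pairwise distances <= threshold (single linkage)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


def optimise_threshold(dist, reference, candidates):
    """Return (threshold, score) for the candidate threshold whose clustering
    agrees best with the reference labels (the first one wins on ties)."""
    best = None
    for t in candidates:
        score = rand_index(single_linkage(dist, t), reference)
        if best is None or score > best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: four sequences belonging to two reference species.
dist = [
    [0.00, 0.01, 0.08, 0.09],
    [0.01, 0.00, 0.07, 0.08],
    [0.08, 0.07, 0.00, 0.02],
    [0.09, 0.08, 0.02, 0.00],
]
reference = ["A", "A", "B", "B"]
print(optimise_threshold(dist, reference, [0.005, 0.03, 0.1]))
```

    Too low a threshold splits each species into singletons and too high a threshold merges everything; the search simply keeps the candidate in between that reproduces the reference grouping best.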

  16. Molecular Taxonomy of Phytopathogenic Fungi: A Case Study in Peronospora

    PubMed Central

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M. Teresa; Martín, María P.

    2009-01-01

    Background: Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging for organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even for organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences sampled directly from the environment. Typically, the clustering approaches used for this purpose to delineate molecular operational taxonomic units have relied on arbitrary choices of distance threshold values and clustering algorithms. Methodology: Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust to taxon sampling and to errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. Conclusions: A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy.
The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence. PMID:19641601

  17. BioServices: a common Python package to access biological Web Services programmatically.

    PubMed

    Cokelaer, Thomas; Pultz, Dennis; Harder, Lea M; Serra-Musach, Jordi; Saez-Rodriguez, Julio

    2013-12-15

    Web interfaces provide access to numerous biological databases. Many of these can also be accessed programmatically through Web Services. Applications that combine several of them would benefit from a single framework. BioServices is a comprehensive Python framework that provides programmatic access to major bioinformatics Web Services (e.g. KEGG, UniProt, BioModels, ChEMBLdb). Its object-oriented design makes it easier to wrap additional Web Services based on either Representational State Transfer (REST) or Simple Object Access Protocol/Web Services Description Language (SOAP/WSDL) technologies. BioServices releases and documentation are available at http://pypi.python.org/pypi/bioservices under a GPL-v3 license.

  18. PDBe: Protein Data Bank in Europe

    PubMed Central

    Velankar, S.; Alhroub, Y.; Best, C.; Caboche, S.; Conroy, M. J.; Dana, J. M.; Fernandez Montecelo, M. A.; van Ginkel, G.; Golovin, A.; Gore, S. P.; Gutmanas, A.; Haslam, P.; Hendrickx, P. M. S.; Heuson, E.; Hirshberg, M.; John, M.; Lagerstedt, I.; Mir, S.; Newman, L. E.; Oldfield, T. J.; Patwardhan, A.; Rinaldi, L.; Sahni, G.; Sanz-García, E.; Sen, S.; Slowley, R.; Suarez-Uruena, A.; Swaminathan, G. J.; Symmons, M. F.; Vranken, W. F.; Wainwright, M.; Kleywegt, G. J.

    2012-01-01

    The Protein Data Bank in Europe (PDBe; pdbe.org) is a partner in the Worldwide PDB organization (wwPDB; wwpdb.org) and as such actively involved in managing the single global archive of biomacromolecular structure data, the PDB. In addition, PDBe develops tools, services and resources to make structure-related data more accessible to the biomedical community. Here we describe recently developed, extended or improved services, including an animated structure-presentation widget (PDBportfolio), a widget to graphically display the coverage of any UniProt sequence in the PDB (UniPDB), chemistry- and taxonomy-based PDB-archive browsers (PDBeXplore), and a tool for interactive visualization of NMR structures, corresponding experimental data as well as validation and analysis results (Vivaldi). PMID:22110033

  19. An AERONET-Based Aerosol Classification Using the Mahalanobis Distance

    NASA Technical Reports Server (NTRS)

    Hamill, Patrick; Giordano, Marco; Ward, Carolyne; Giles, David; Holben, Brent

    2016-01-01

    We present an aerosol classification based on AERONET aerosol data from 1993 to 2012. We used the AERONET Level 2.0 almucantar aerosol retrieval products to define several reference aerosol clusters that are characteristic of the following general aerosol types: Urban-Industrial, Biomass Burning, Mixed Aerosol, Dust, and Maritime. An individual aerosol observation is classified as one of these types according to its five-dimensional Mahalanobis distance to each reference cluster. We have calculated the fractional aerosol type distribution at 190 AERONET sites, as well as the monthly variation in aerosol type at those locations. The results are presented on a global map and individually in the supplementary material. Our aerosol typing is based on recognizing that different geographic regions exhibit characteristic aerosol types. To generate reference clusters, we keep only data points that lie within a Mahalanobis distance of 2 from the centroid. Our aerosol characterization is based on the AERONET retrieved quantities; therefore, it does not include low optical depth values. The analysis is based on point sources (the AERONET sites) rather than globally distributed values. The classifications obtained will be useful in interpreting aerosol retrievals from satellite-borne instruments.
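
    The classification rule described above (trim each reference cluster to points within a Mahalanobis distance of 2 of its centroid, then assign an observation to the cluster with the smallest Mahalanobis distance) can be sketched in plain Python. The sketch assumes each cluster is summarised by its sample mean and covariance, and solves a linear system instead of inverting the covariance matrix; the two-dimensional toy points and cluster names are hypothetical stand-ins for the five AERONET-derived quantities and aerosol types.

```python
import math


def mean_vector(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def covariance(points, mu):
    """Sample covariance matrix (n - 1 denominator)."""
    n, d = len(points), len(mu)
    return [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^-1 (x - mu)), computed without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(cov, diff)
    return math.sqrt(max(0.0, sum(di * yi for di, yi in zip(diff, y))))


def trim_cluster(points, max_dist=2.0):
    """Keep only points within max_dist (Mahalanobis) of the cluster centroid."""
    mu = mean_vector(points)
    cov = covariance(points, mu)
    return [p for p in points if mahalanobis(p, mu, cov) <= max_dist]


def classify(x, clusters):
    """Return the name of the reference cluster closest to x in Mahalanobis distance."""
    best_name, best_dist = None, math.inf
    for name, points in clusters.items():
        mu = mean_vector(points)
        dist = mahalanobis(x, mu, covariance(points, mu))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```

    Unlike plain Euclidean distance, the Mahalanobis distance scales each direction by the spread of the reference cluster, so a tight cluster "claims" fewer distant observations than a diffuse one.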

  20. A compilation of redshifts and velocity dispersions for Abell clusters (Struble and Rood 1987): Documentation for the machine-readable version

    NASA Technical Reports Server (NTRS)

    Warren, Wayne H., Jr.

    1989-01-01

    The machine-readable version of the compilation, as currently distributed by the Astronomical Data Center, is described. The catalog contains redshifts and velocity dispersions for all Abell clusters for which these data had been published up to 1986 July. Also included are 1950 equatorial coordinates for the centers of the listed clusters, numbers of observations used to determine the redshifts, and bibliographical references citing the data sources.

  1. Clustering evolving proteins into homologous families.

    PubMed

    Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A

    2013-04-08

    Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
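
    The Markov clustering (MCL) algorithm compared above alternates expansion (squaring the column-stochastic flow matrix, i.e. taking random-walk steps) with inflation (raising entries to a power and renormalising columns), which strengthens flow within dense regions and starves it between them. A minimal pure-Python sketch of that textbook loop follows. It assumes an unweighted adjacency matrix with added self-loops, an inflation value of 2.0 and no pruning step, whereas practical MCL runs use similarity-weighted graphs (for example from alignment scores) and sparse matrices; UCLUST is not shown.

```python
def normalise_columns(m):
    """Scale each column of the square matrix m in place so it sums to 1."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Return clusters (sorted lists of node indices) found by Markov clustering."""
    n = len(adj)
    # Add self-loops, then turn the graph into a column-stochastic flow matrix.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalise_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                                       # expansion
        inflated = [[v ** inflation for v in row] for row in expanded]  # inflation
        normalise_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractor rows keep a positive diagonal; each one's nonzero columns form a cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > tol:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > tol))
    return sorted(sorted(c) for c in clusters)
```

    A higher inflation value sharpens the contrast between strong and weak flows and therefore tends to produce more, smaller clusters.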

  2. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study.

    PubMed

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers, employing bioinformatics, using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended-spectrum beta-lactamase-producing E. coli isolates and followed the same standard operating procedure (SOP), including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). The raw data from each laboratory allowed the technical and biological reproducibility between centers to be calculated using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify peaks specific to the outbreak clusters. Finally, we applied a classifier algorithm and a linear support vector machine to the selected peaks. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct, with a large contrast between the score for the correct cluster and that of the next best scoring cluster. Given the sufficient technical and biological reproducibility of MALDI-TOF MS spectra, specific clusters can be detected from spectra obtained at different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.

  3. Reconsidering Cluster Bias in Multilevel Data: A Monte Carlo Comparison of Free and Constrained Baseline Approaches.

    PubMed

    Guenole, Nigel

    2018-01-01

    The test for item-level cluster bias examines the improvement in model fit that results from freeing an item's between-level residual variance from a baseline model with equal within- and between-level factor loadings and between-level residual variances fixed at zero. A potential problem is that this approach may include a misspecified unrestricted model if any non-invariance is present, whereas the log-likelihood difference test requires the unrestricted model to be correctly specified. A free baseline approach, in which the unrestricted model includes only the restrictions needed for model identification, should lead to better decision accuracy, but no studies have examined this yet. We ran a Monte Carlo study to investigate this issue. When the referent item is unbiased, the constrained baseline approach led to true positive (power) rates similar to those of the free baseline approach but much higher false positive (Type I error) rates. The free baseline approach should be preferred when the referent indicator is unbiased. When the referent assumption is violated, the false positive rate was unacceptably high for both the free and constrained baseline approaches, and the true positive rate was poor regardless of which approach was used. Neither the free nor the constrained baseline approach can be recommended when the referent indicator is biased. We recommend paying close attention to ensuring that the referent indicator is unbiased in tests of cluster bias. All Mplus input and output files, R scripts, and short Python scripts used to execute this simulation study are available in an open access repository.
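
    The log-likelihood difference (likelihood-ratio) test mentioned above compares twice the gain in log-likelihood from freeing parameters against a chi-square distribution whose degrees of freedom equal the number of freed parameters; freeing a single between-level residual variance gives one degree of freedom. The following is a minimal standard-library sketch with made-up log-likelihood values. It assumes plain maximum-likelihood estimation and ignores the fact that a variance tested at its boundary value of zero makes the plain chi-square reference conservative; it is not the Mplus procedure used in the study.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    Uses X = Z**2 for standard normal Z, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lr_test_one_parameter(loglik_restricted, loglik_unrestricted):
    """Likelihood-ratio test for nested models that differ by one freed parameter."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, chi2_sf_df1(stat)


# Hypothetical log-likelihoods: baseline model vs. model with one residual variance freed.
stat, p = lr_test_one_parameter(-1000.0, -997.0)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

    The test is only valid if the unrestricted model is correctly specified, which is exactly the assumption the abstract questions for the constrained baseline approach.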

  4. Reconsidering Cluster Bias in Multilevel Data: A Monte Carlo Comparison of Free and Constrained Baseline Approaches

    PubMed Central

    Guenole, Nigel

    2018-01-01

    The test for item-level cluster bias examines the improvement in model fit that results from freeing an item's between-level residual variance from a baseline model with equal within- and between-level factor loadings and between-level residual variances fixed at zero. A potential problem is that this approach may include a misspecified unrestricted model if any non-invariance is present, whereas the log-likelihood difference test requires the unrestricted model to be correctly specified. A free baseline approach, in which the unrestricted model includes only the restrictions needed for model identification, should lead to better decision accuracy, but no studies have examined this yet. We ran a Monte Carlo study to investigate this issue. When the referent item is unbiased, the constrained baseline approach led to true positive (power) rates similar to those of the free baseline approach but much higher false positive (Type I error) rates. The free baseline approach should be preferred when the referent indicator is unbiased. When the referent assumption is violated, the false positive rate was unacceptably high for both the free and constrained baseline approaches, and the true positive rate was poor regardless of which approach was used. Neither the free nor the constrained baseline approach can be recommended when the referent indicator is biased. We recommend paying close attention to ensuring that the referent indicator is unbiased in tests of cluster bias. All Mplus input and output files, R scripts, and short Python scripts used to execute this simulation study are available in an open access repository. PMID:29551985

  5. Word-initial rhotic clusters in Spanish-speaking preschoolers in Chile and Granada, Spain.

    PubMed

    Perez, Denisse; Vivar, Pilar; Bernhardt, Barbara May; Mendoza, Elvira; Ávila, Carmen; Carballo, Gloria; Fresneda, Dolores; Muñoz, Juana; Vergara, Patricio

    2018-01-01

    The current paper describes Spanish acquisition of rhotic onset clusters. Data are also provided on related singleton taps/trills and /l/ as a singleton and in clusters. Participants included 9 typically developing (TD) toddlers and 30 TD preschoolers in Chile, and 30 TD preschoolers and 29 with protracted phonological development (PPD) in Granada, Spain. Results showed age and developmental group effects. Preservation of cluster timing units preceded segmental accuracy, especially in stressed syllables. Tap clusters versus singleton trills were variable in order of mastery, some children mastering clusters first, and others, the trill. Rhotics were acquired later than /l/. In early development, mismatches (errors) involved primarily deletion of taps; where substitutions occurred, [j] frequently replaced tap. In later development, [l] more frequently replaced tap; where taps did occur, vowel epenthesis sometimes occurred. The data serve as a criterion reference database for onset cluster acquisition in Chilean and Granada Spanish.

  6. A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.

    PubMed

    Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang

    2016-12-01

    This paper studies clustering methods for Chinese medicine (CM) medical cases. When processing prescriptions from CM medical cases, the traditional K-means clustering algorithm has shortcomings such as dependence of the results on the choice of initial values and trapping in local optima. Therefore, a new clustering method based on the collaboration of the firefly algorithm and the simulated annealing algorithm was proposed. This algorithm dynamically determined the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm from changes in fitness, and increased swarm diversity by expanding the scope of the sudden jump, thereby effectively avoiding premature convergence. Confirmatory experiments on CM medical cases suggested that, compared with the traditional K-means clustering algorithm, this method greatly improved individual diversity and the resulting clusters, and that its results have reference value for cluster analysis of CM prescriptions.

  7. Mathematical description and program documentation for CLASSY, an adaptive maximum likelihood clustering method

    NASA Technical Reports Server (NTRS)

    Lennington, R. K.; Rassbach, M. E.

    1979-01-01

    Discussed in this report is the clustering algorithm CLASSY, including detailed descriptions of its general structure and mathematical background and of the various major subroutines. The report provides a development of the logic and equations used with specific reference to program variables. Some comments on timing and proposed optimization techniques are included.

  8. Development and Application of Single-Referenced Perturbation and Coupled-Cluster Theories for Excited Electronic States

    NASA Technical Reports Server (NTRS)

    Lee, Timothy J.; Langhoff, Stephen R. (Technical Monitor)

    1997-01-01

    Recent work on the development of single-reference perturbation theories for the study of excited electronic states will be discussed. The utility of these methods will be demonstrated by comparison to linear-response coupled-cluster excitation energies. Results for some halogen molecules of interest in stratospheric chemistry will be presented.

  9. Titanium oxo-clusters: precursors for a Lego-like construction of nanostructured hybrid materials.

    PubMed

    Rozes, Laurence; Sanchez, Clément

    2011-02-01

    Titanium oxo-clusters, well-defined monodispersed nano-objects, are appropriate nano-building blocks for the preparation of organic-inorganic materials by a bottom-up approach. This critical review presents the different structures of titanium oxo-clusters reported in the literature and the different strategies followed to build hybrid materials from these versatile building units. In particular, this critical review cites and reports on the most important papers in the literature, concentrating on recent developments in the synthesis and characterization of titanium oxo-clusters and in their use for the construction of advanced hybrid materials (137 references).

  10. A comprehensive resource of drought- and salinity-responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.)

    PubMed Central

    2009-01-01

    Background: Chickpea (Cicer arietinum L.), an important grain legume crop worldwide, is seriously challenged by terminal drought and salinity stresses. However, only a very limited number of molecular markers and candidate genes are available for molecular breeding in chickpea to tackle these stresses. This study reports the generation and analysis of a comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers. Results: A total of 20,162 (18,435 high-quality) drought- and salinity-responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (E-value ≤ 1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences against ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights into genes conserved across legumes as well as novel transcripts for chickpea. Of the 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified, and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles, with an average of 4.6 alleles per marker and an average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high-confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs.
Hierarchical clustering of 105 selected contigs provided clues about stress-responsive candidate genes, and their expression profiles showed predominance in specific stress-challenged libraries. Conclusion: The generated set of chickpea ESTs serves as a resource of high-quality transcripts for gene discovery and for the development of functional markers associated with abiotic stress tolerance, which will help facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align the genomes of chickpea and other legume species. PMID:19912666

  11. Large-scale seismic waveform quality metric calculation using Hadoop

    NASA Astrophysics Data System (ADS)

    Magana-Zook, S.; Gaylord, J. M.; Knapp, D. R.; Dodge, D. A.; Ruppert, S. D.

    2016-09-01

    In this work we investigated the suitability of Hadoop MapReduce and Apache Spark for large-scale computation of seismic waveform quality metrics by comparing their performance with that of a traditional distributed implementation. The Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) provided 43 terabytes of broadband waveform data of which 5.1 TB of data were processed with the traditional architecture, and the full 43 TB were processed using MapReduce and Spark. Maximum performance of 0.56 terabytes per hour was achieved using all 5 nodes of the traditional implementation. We noted that I/O dominated processing, and that I/O performance was deteriorating with the addition of the 5th node. Data collected from this experiment provided the baseline against which the Hadoop results were compared. Next, we processed the full 43 TB dataset using both MapReduce and Apache Spark on our 18-node Hadoop cluster. These experiments were conducted multiple times with various subsets of the data so that we could build models to predict performance as a function of dataset size. We found that both MapReduce and Spark significantly outperformed the traditional reference implementation. At a dataset size of 5.1 terabytes, both Spark and MapReduce were about 15 times faster than the reference implementation. Furthermore, our performance models predict that for a dataset of 350 terabytes, Spark running on a 100-node cluster would be about 265 times faster than the reference implementation. We do not expect that the reference implementation deployed on a 100-node cluster would perform significantly better than on the 5-node cluster because the I/O performance cannot be made to scale. 
Finally, we note that although Big Data technologies clearly provide a way to process seismic waveform datasets in a high-performance and scalable manner, the technology is still rapidly changing, requires a high degree of investment in personnel, and will likely require significant changes in other parts of our infrastructure. Nevertheless, we anticipate that as the technology matures and third-party tool vendors make it easier to manage and operate clusters, Hadoop (or a successor) will play a large role in our seismic data processing.

  12. Applied anatomic site study of palatal anchorage implants using cone beam computed tomography.

    PubMed

    Lai, Ren-fa; Zou, Hui; Kong, Wei-dong; Lin, Wei

    2010-06-01

    The purpose of this study was to quantify bone height and bone mineral density at palatal implant sites and to identify reference sites for safe and stable palatal implants. Three-dimensional reformatted images were reconstructed from cone beam computed tomography (CBCT) scans of 34 patients, aged 18 to 35 years, using EZ Implant software. Bone height was measured at 20 sites of interest on the palate. Bone mineral density was measured at the 10 sites with the highest implantation rate, which were classified by K-means cluster analysis based on bone height and bone mineral density. The cluster analysis divided the 10 sites into three clusters. Significant differences in bone height and bone mineral density were detected between these three clusters (P<0.05). The greatest bone height was obtained in cluster 2, followed by cluster 1 and cluster 3. The highest bone mineral density was found in cluster 3, followed by cluster 1 and cluster 2. CBCT plays an important role in pre-surgical treatment planning and is helpful in identifying safe and stable implantation sites for palatal anchorage.
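
    K-means clustering, as applied above to bone height and bone mineral density, alternates between assigning each site to its nearest centroid and moving each centroid to the mean of its assigned sites (Lloyd's algorithm). A minimal sketch follows. It assumes, as a choice of this sketch rather than something stated in the study, that the two features are z-score standardised first because they are measured on very different scales; the initial centroids are passed in explicitly, and the site values in the usage lines are hypothetical, not data from the study.

```python
import math
import statistics


def standardise(points):
    """Z-score each feature (assumes no feature is constant, i.e. sd > 0)."""
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(p, means, sds)) for p in points]


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: returns (labels, centroids) once assignments stop changing."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid in Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical (bone height in mm, bone density in HU) values for six sites.
sites = [(8.0, 900.0), (8.5, 950.0), (4.0, 1200.0), (4.2, 1250.0), (2.0, 600.0), (2.3, 650.0)]
z = standardise(sites)
labels, _ = kmeans(z, init_centroids=[z[0], z[2], z[4]])
print(labels)
```

    Because the result depends on the starting centroids, K-means is usually rerun from several initialisations and the solution with the lowest within-cluster sum of squares is kept.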

  13. A monoclonal antibody against SV40 large T antigen (PAb416) does not label Merkel cell carcinoma.

    PubMed

    Pelletier, Daniel J; Czeczok, Thomas W; Bellizzi, Andrew M

    2018-07-01

    Merkel cell carcinoma represents poorly differentiated neuroendocrine carcinoma of cutaneous origin. In most studies, the vast majority of Merkel cell carcinomas are Merkel cell polyomavirus (MCPyV)-associated. SV40 polyomavirus immunohistochemistry is typically used in the diagnosis of other polyomavirus-associated diseases, including tubulointerstitial nephritis and progressive multifocal leukoencephalopathy, given cross-reactivity with BK and JC polyomaviruses. MCPyV-specific immunohistochemistry is commercially available, but, if antibodies against SV40 also cross-reacted with MCPyV, that would be advantageous from a resource-utilisation perspective. Tissue microarrays were constructed from 39 Merkel cell carcinomas, 24 small-cell lung carcinomas, and 18 extrapulmonary visceral small-cell carcinomas. SV40 large T antigen immunohistochemistry (clone PAb416) was performed; MCPyV large T antigen immunohistochemistry (clone CM2B4) had been previously performed. UniProt was used to compare the amino acid sequences of the SV40, BK, JC and MCPyV large T antigens, focusing on areas recognised by the PAb416 and CM2B4 clones. SV40 immunohistochemistry was negative in all tumours; MCPyV immunohistochemistry was positive in 38% of Merkel cell carcinomas and in 0% of non-cutaneous poorly differentiated neuroendocrine carcinomas. UniProt analysis revealed a high degree of similarity between SV40, BK, and JC viruses in the region recognised by PAb416. There was less homology between SV40 and MCPyV in this region, which was also interrupted by two long stretches of amino acids unique to MCPyV. The CM2B4 clone recognises a unique epitope in one of these stretches. The PAb416 antibody against the SV40 large T antigen does not cross-react with MCPyV large T antigen, and thus does not label Merkel cell carcinoma. © 2018 John Wiley & Sons Ltd.

  14. Biomineralization of Schlumbergerella floresiana, a significant carbonate-producing benthic foraminifer.

    PubMed

    Sabbatini, A; Bédouet, L; Marie, A; Bartolini, A; Landemarre, L; Weber, M X; Gusti Ngurah Kade Mahardika, I; Berland, S; Zito, F; Vénec-Peyré, M-T

    2014-07-01

    Most foraminifera that produce a shell are efficient biomineralizers. We analyzed the calcitic shell of the large tropical benthic foraminifer Schlumbergerella floresiana. We found a suite of macromolecules containing many charged and polar amino acids and glycine, which are also abundant in biomineralization proteins of other phyla. As neither genomic nor transcriptomic data are yet available for foraminiferal biomineralization, de novo-generated sequences, obtained from organic matrices submitted to an MS BLAST database search, led to the characterization of 156 peptides. Very few homologous proteins were matched in the proteomic database, implying that the peptides are derived from unknown proteins present in the foraminiferal organic matrices. The amino acid distribution of these peptides was queried against the UniProt database and, for comparison, the mollusk UniProt database. Mollusks are a well-studied phylum that yields a large variety of biomineralization proteins. These results showed that proteins extracted from S. floresiana shells contained sequences enriched in glycine, alanine, and proline, a set of residues providing a signature unique to foraminifera. Three of the de novo peptides exhibited sequence similarities to peptides found in proteins such as pre-collagen-P and a group of P-type ATPases, including a calcium-transporting ATPase. Surprisingly, the peptide most similar to the collagen-like protein was a glycine-rich peptide reported from the test and spine proteome of sea urchin. The molecules identified by matrix-assisted laser desorption ionization-time of flight mass spectrometry analyses included acid-soluble N-glycoproteins whose sugar moieties were represented by high-mannose-type glycans and carbohydrates.
Describing the nature of the proteins, and associated molecules in the skeletal structure of living foraminifera, can elucidate the biomineralization mechanisms of these major carbonate producers in marine ecosystems. As fossil foraminifera provide important paleoenvironmental and paleoclimatic information, a better understanding of biomineralization in these organisms will have far-reaching impacts. © 2014 John Wiley & Sons Ltd.

  15. Different equation-of-motion coupled cluster methods with different reference functions: The formyl radical

    NASA Astrophysics Data System (ADS)

    Kuś, Tomasz; Bartlett, Rodney J.

    2008-09-01

    The doublet and quartet excited states of the formyl radical have been studied by the equation-of-motion (EOM) coupled cluster (CC) method. The Sz spin-conserving singles and doubles (EOM-EE-CCSD) and singles, doubles, and triples (EOM-EE-CCSDT) approaches, as well as the spin-flip singles and doubles (EOM-SF-CCSD) method, have been applied with unrestricted Hartree-Fock (HF), restricted open-shell HF, and quasirestricted HF references. The structural parameters, vertical and adiabatic excitation energies, and harmonic vibrational frequencies have been calculated. The choice of reference function for the spin-flip (SF) method and its impact on the results are discussed using the available experimental data and theoretical results. The results show that if the reference function is chosen so that the target states differ from the reference by only single excitations, then the EOM-EE-CCSD and EOM-SF-CCSD methods give a very good description of the excited states. For states with a non-negligible contribution from doubly excited configurations, the SF method can be used with a reference function chosen such that, in most cases, EOM-SF-CCSD performs better than EOM-EE-CCSD.

  16. Thermodynamics of mixtures of patchy and spherical colloids of different sizes: A multi-body association theory with complete reference fluid information.

    PubMed

    Bansal, Artee; Valiya Parambathu, Arjun; Asthagiri, D; Cox, Kenneth R; Chapman, Walter G

    2017-04-28

    We present a theory to predict the structure and thermodynamics of mixtures of colloids of different diameters, building on our earlier work [A. Bansal et al., J. Chem. Phys. 145, 074904 (2016)] that considered mixtures with all particles constrained to have the same size. The patchy solvent particles have short-range directional interactions, while the solute particles have short-range isotropic interactions. The hard-sphere mixture without any association site forms the reference fluid. An important ingredient within the multi-body association theory is the description of clustering of the reference solvent around the reference solute. Here we account for the physical multi-body clusters of the reference solvent around the reference solute in terms of occupancy statistics in a defined observation volume. These occupancy probabilities are obtained from enhanced sampling simulations, but we also present statistical mechanical models to estimate these probabilities with limited simulation data. Relative to an approach that describes only up to three-body correlations in the reference, incorporating the complete reference information better predicts the bonding state and thermodynamics of the physical solute for a wide range of system conditions. Importantly, we analyzed the residual chemical potential of the infinitely dilute solute from molecular simulation and theory. The chemical potential is somewhat insensitive to the description of the structure of the reference fluid, but the energetic and entropic contributions are not, and the results from the complete reference approach agree better with particle simulations.

  17. Thermodynamics of mixtures of patchy and spherical colloids of different sizes: A multi-body association theory with complete reference fluid information

    NASA Astrophysics Data System (ADS)

    Bansal, Artee; Valiya Parambathu, Arjun; Asthagiri, D.; Cox, Kenneth R.; Chapman, Walter G.

    2017-04-01

    We present a theory to predict the structure and thermodynamics of mixtures of colloids of different diameters, building on our earlier work [A. Bansal et al., J. Chem. Phys. 145, 074904 (2016)] that considered mixtures with all particles constrained to have the same size. The patchy solvent particles have short-range directional interactions, while the solute particles have short-range isotropic interactions. The hard-sphere mixture without any association site forms the reference fluid. An important ingredient within the multi-body association theory is the description of clustering of the reference solvent around the reference solute. Here we account for the physical multi-body clusters of the reference solvent around the reference solute in terms of occupancy statistics in a defined observation volume. These occupancy probabilities are obtained from enhanced sampling simulations, but we also present statistical mechanical models to estimate these probabilities with limited simulation data. Relative to an approach that describes only up to three-body correlations in the reference, incorporating the complete reference information better predicts the bonding state and thermodynamics of the physical solute for a wide range of system conditions. Importantly, we analyzed the residual chemical potential of the infinitely dilute solute from molecular simulation and theory. The chemical potential is somewhat insensitive to the description of the structure of the reference fluid, but the energetic and entropic contributions are not, and the results from the complete reference approach agree better with particle simulations.
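    The occupancy statistics described above can be estimated from simulation output as a normalized histogram: for each frame, count how many solvent particles sit in the observation volume. Below is a minimal Python sketch under stated assumptions. The abstract does not give the estimator, and the frame counts are invented. The optional `weights` argument is a hypothetical stand-in for whatever reweighting an enhanced-sampling scheme would require:

```python
from collections import defaultdict


def occupancy_probabilities(counts, weights=None):
    """Estimate p_n, the probability of finding n solvent particles in the
    observation volume, from per-frame occupancy counts.

    `weights` are optional per-frame reweighting factors (hypothetical here);
    with no weights, every frame counts equally."""
    if weights is None:
        weights = [1.0] * len(counts)
    totals = defaultdict(float)
    for n, w in zip(counts, weights):
        totals[n] += w
    norm = sum(totals.values())
    return {n: totals[n] / norm for n in sorted(totals)}


# Hypothetical occupancy counts from eight simulation frames.
frames = [3, 4, 4, 5, 4, 3, 5, 4]
print(occupancy_probabilities(frames))  # {3: 0.25, 4: 0.5, 5: 0.25}
```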

  18. A 3D Voronoi+Gapper Galaxy Cluster Finder in Redshift Space to z ∼ 0.2 I: an Algorithm Optimized for the 2dFGRS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pereira, Sebastián; Campusano, Luis E.; Hitschfeld-Kahler, Nancy

    This paper is the first in a series presenting a new galaxy cluster finder (VoML+G). It is based on a three-dimensional Voronoi tessellation plus a maximum likelihood estimator, followed by gapping-filtering in radial velocity. The scientific aim of the series is a reassessment of the diversity of optical clusters in the local universe. A mock galaxy database is built from the Millennium Simulation of the ΛCDM cosmology. It mimics the southern strip of the blue-magnitude-limited 2dF Galaxy Redshift Survey (2dFGRS) over the redshift range 0.009 < z < 0.22. A reference catalog of “Millennium clusters” spanning the 1.0 × 10^12–1.0 × 10^15 M⊙ h^−1 dark matter (DM) halo mass range is also recorded. VoML+G is validated by applying it to the mock data and determining the completeness and purity of the cluster detections by comparison with the reference catalog. Running VoML+G over the 2dFGRS mock data identified 1614 clusters: 22% with N_g ≥ 10, 64% with 10 > N_g ≥ 5, and 14% with N_g < 5. The ensemble of VoML+G clusters has ∼59% completeness and ∼66% purity. The subsample with N_g ≥ 10, to z ∼ 0.14, has greatly improved mean rates of ∼75% and ∼90%, respectively. The VoML+G cluster velocity dispersions are compatible with those of the “Millennium clusters” over the 300–1000 km s^−1 interval, i.e., for cluster halo masses in excess of ∼3.0 × 10^13 M⊙ h^−1.
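    Completeness and purity, as used above, are commonly defined as follows. Completeness is the fraction of reference clusters recovered. Purity is the fraction of detections that correspond to a reference cluster. Below is a minimal Python sketch assuming those definitions. The abstract does not specify the matching criterion between detections and reference clusters, so it is left abstract here, and the IDs and counts are hypothetical:

```python
def completeness_and_purity(matches, n_reference, n_detected):
    """Completeness: fraction of reference clusters recovered by at least one detection.
    Purity: fraction of detections matched to at least one reference cluster.

    `matches` is an iterable of (detected_id, reference_id) pairs that some
    matching criterion (not specified here) has judged to be the same cluster."""
    matches = list(matches)
    matched_reference = {ref for _, ref in matches}
    matched_detected = {det for det, _ in matches}
    return len(matched_reference) / n_reference, len(matched_detected) / n_detected


# Hypothetical example: 4 reference clusters, 5 detections.
completeness, purity = completeness_and_purity([(1, "a"), (2, "b"), (3, "b")], 4, 5)
print(completeness, purity)  # 0.5 0.6
```

    Counting unique IDs means that a fragmented detection (two detections of reference "b") raises purity but does not inflate completeness.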

  19. VizieR Online Data Catalog: Variable stars in globular clusters (Figuera Jaimes+, 2016)

    NASA Astrophysics Data System (ADS)

    Figuera Jaimes, R.; Bramich, D. M.; Skottfelt, J.; Kains, N.; Jorgensen, U. G.; Horne, K.; Dominik, M.; Alsubai, K. A.; Bozza, V.; Calchi Novati, S.; Ciceri, S.; D'Ago, G.; Galianni, P.; Gu, S.-H.; Harpsoe, K. B. W.; Haugbolle, T.; Hinse, T. C.; Hundertmark, M.; Juncher, D.; Korhonen, H.; Mancini, L.; Popovas, A.; Rabus, M.; Rahvar, S.; Scarpetta, G.; Schmidt, R. W.; Snodgrass, C.; Southworth, J.; Starkey, D.; Street, R. A.; Surdej, J.; Wang, X.-B.; Wertz, O.

    2016-02-01

    Observations were taken during 2013 and 2014 as part of an ongoing program at the 1.54 m Danish telescope at the ESO observatory at La Silla in Chile, carried out from April to September each year. The table1.dat file contains the time-series I-band photometry for all the variables in the globular clusters studied in this work. We list standard and instrumental magnitudes and their uncertainties, together with the variable star identification, filter, and epoch of mid-exposure. For completeness, we also list the reference flux, difference flux, and photometric scale factor, along with the uncertainties on the reference and difference fluxes. (2 data files).
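    The reference flux, difference flux, and photometric scale factor listed above combine into a total flux. Below is a minimal Python sketch that rests on two assumptions, neither of which this catalog description states. The first is the convention, common in difference-image photometry, that the difference flux is divided by the scale factor before being added to the reference flux. The second is an instrumental zero point of 25 mag:

```python
import math


def total_flux(f_ref, f_diff, p):
    """Assumed difference-imaging convention: the difference flux is divided by
    the photometric scale factor p before being added to the reference flux."""
    return f_ref + f_diff / p


def instrumental_magnitude(f_ref, f_diff, p, zero_point=25.0):
    """Instrumental magnitude from the total flux; the zero point is an assumption."""
    return zero_point - 2.5 * math.log10(total_flux(f_ref, f_diff, p))


# Hypothetical values: a 1000-unit reference flux with no change gives 25 - 2.5 * 3.
print(instrumental_magnitude(1000.0, 0.0, 1.0))
```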

  20. Isothermality of the gas in the Coma cluster

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hughes, J.P.; Yamashita, K.; Okumura, Y.

    1988-04-01

    The high-quality X-ray spectrum of the Coma cluster observed by the Japanese satellite Tenma, in conjunction with imaging data from the Einstein Observatory, was used to explore the temperature distribution of the cluster gas. Pure polytropic models are found to be inadequate to describe this temperature distribution. Instead, a hybrid model is proposed, consisting of a central isothermal region surrounded by a polytropic distribution. It is shown that as much as 75 percent of the global emission may come from the isothermal component. 30 references.
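    The hybrid model can be written as a sketch under assumptions not stated in the abstract: a core radius $r_c$, a polytropic index $\gamma$, and a temperature that is continuous at $r_c$. For an ideal gas ($P \propto nT$) obeying the polytropic relation $P \propto n^{\gamma}$, the temperature outside the core scales as $T \propto n^{\gamma-1}$:

```latex
T(r) =
\begin{cases}
  T_0, & r \le r_c \quad \text{(isothermal core)} \\[4pt]
  T_0 \left[ \dfrac{n(r)}{n(r_c)} \right]^{\gamma - 1}, & r > r_c \quad \text{(polytropic envelope)}
\end{cases}
```

    Taking $r_c \to 0$ recovers a pure polytrope, and $\gamma = 1$ recovers a fully isothermal cluster, so both simpler models are limiting cases of this form.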

  1. Health-related quality of life in Indian children: A community-based cross-sectional survey

    PubMed Central

    Raj, Manu; Sudhakar, Abish; Roy, Rinku; Champaneri, Bhavik; Joy, Teena Mary; Kumar, Raman Krishna

    2017-01-01

    Background & objectives: There are limited data on health-related quality of life (HRQOL) related to Indian children. The objective of this study was to construct a generic HRQOL reference for children aged 2-18 yr from a community setting. Methods: The study was a community-based cross-sectional survey. A total of 719 children/adolescents in the age group of 2-18 yr were enrolled using stratified random cluster sampling. A total of 40 clusters (cluster size 18) were selected for the study. The data contained child self-report and parent proxy report from healthy children and their parents/caretakers. The Pediatric Quality of Life Inventory 4.0 (PedsQL4.0) Generic Core Scale was used to collect HRQOL data. Questionnaires were self-administered for parents and children aged 8-18 yr. In the age group of five to seven years, parents assisted the children in filling questionnaires. Results: The mean HRQOL total scores from child self-report and parent proxy report were 87.50±11.10 and 90.10±9.50 respectively, for children aged 2-18 yr. Social functioning had the highest scores and emotional functioning had the lowest scores for the entire sample and subgroups. The mean values for HRQOL in the current study were significantly different from the reference study for both child (87.39 vs. 83.91, P<0.001) and parent proxy reports (90.03 vs. 82.29, P<0.001) when compared between children aged 2-16 yr. Interpretation & conclusions: The study provided reference values for HRQOL in healthy children and adolescents from Kerala, India, that appeared to be different from existing international reference. Similar studies need to be done in different parts of India to generate a country-specific HRQOL reference for Indian children. PMID:28862185

  2. Health-related quality of life in Indian children: A community-based cross-sectional survey.

    PubMed

    Raj, Manu; Sudhakar, Abish; Roy, Rinku; Champaneri, Bhavik; Joy, Teena Mary; Kumar, Raman Krishna

    2017-04-01

    There are limited data on health-related quality of life (HRQOL) related to Indian children. The objective of this study was to construct a generic HRQOL reference for children aged 2-18 yr from a community setting. The study was a community-based cross-sectional survey. A total of 719 children/adolescents in the age group of 2-18 yr were enrolled using stratified random cluster sampling. A total of 40 clusters (cluster size 18) were selected for the study. The data contained child self-report and parent proxy report from healthy children and their parents/caretakers. The Pediatric Quality of Life Inventory 4.0 (PedsQL4.0) Generic Core Scale was used to collect HRQOL data. Questionnaires were self-administered for parents and children aged 8-18 yr. In the age group of five to seven years, parents assisted the children in filling questionnaires. The mean HRQOL total scores from child self-report and parent proxy report were 87.50±11.10 and 90.10±9.50 respectively, for children aged 2-18 yr. Social functioning had the highest scores and emotional functioning had the lowest scores for the entire sample and subgroups. The mean values for HRQOL in the current study were significantly different from the reference study for both child (87.39 vs. 83.91, P<0.001) and parent proxy reports (90.03 vs. 82.29, P<0.001) when compared between children aged 2-16 yr. The study provided reference values for HRQOL in healthy children and adolescents from Kerala, India, that appeared to be different from existing international reference. Similar studies need to be done in different parts of India to generate a country-specific HRQOL reference for Indian children.

  3. Progeny Clustering: A Method to Identify Biological Phenotypes

    PubMed Central

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, to find the ideal number of clusters; it is stability-based and computationally very efficient. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method proved successful and robust on three kinds of data: two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and the Rat CNS dataset), and two further biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods on the ten-dimensional synthetic dataset and on the cell phenotype dataset. It was the only method that discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
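    The co-occurrence probability matrix mentioned above can be illustrated generically: run a clustering several times and record how often each pair of points lands in the same cluster. Below is a minimal Python sketch. It is a generic consensus-style stability measure, not the authors' Progeny Sampling procedure, and the scoring rule and example labelings are assumptions:

```python
def cooccurrence_matrix(labelings):
    """Entry (i, j) is the fraction of clustering runs that put points i and j
    in the same cluster. `labelings` holds one label list per run."""
    n = len(labelings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    runs = len(labelings)
    return [[value / runs for value in row] for row in matrix]


def stability_score(matrix, labels):
    """Mean co-occurrence of same-cluster pairs minus that of different-cluster
    pairs under a candidate labeling (higher means more stable)."""
    same, different = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (same if labels[i] == labels[j] else different).append(matrix[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(same) - mean(different)


# Three hypothetical clustering runs over four points.
runs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
m = cooccurrence_matrix(runs)
print(round(stability_score(m, [0, 0, 1, 1]), 3))  # 0.667
```

    Comparing such scores across candidate numbers of clusters is one way to pick the most stable one. Progeny Clustering additionally corrects against reference datasets, which this sketch omits.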

  4. Five task clusters that enable efficient and effective digitization of biological collections

    PubMed Central

    Nelson, Gil; Paul, Deborah; Riccardi, Gregory; Mast, Austin R.

    2012-01-01

    This paper describes and illustrates five major clusters of related tasks (herein referred to as task clusters) that are common to efficient and effective practices in the digitization of biological specimen data and media. Examples of these clusters come from the observation of diverse digitization processes. The staff of iDigBio (The U.S. National Science Foundation’s National Resource for Advancing Digitization of Biological Collections) visited active biological and paleontological collections digitization programs to document and assess current digitization practices and tools. These observations identified five task clusters that comprise the digitization process leading up to data publication: (1) pre-digitization curation and staging, (2) specimen image capture, (3) specimen image processing, (4) electronic data capture, and (5) georeferencing locality descriptions. Not all institutions complete each of these task clusters for each specimen. Even so, together the clusters give a composite picture of the digitization of biological and paleontological specimens across the programs observed. We describe these clusters and the three workflow patterns that dominate their implementation, and we offer a set of workflow recommendations for digitization programs. PMID:22859876

  5. Prokaryotic Gene Clusters: A Rich Toolbox for Synthetic Biology

    PubMed Central

    Fischbach, Michael; Voigt, Christopher A.

    2014-01-01

    Bacteria construct elaborate nanostructures, obtain nutrients and energy from diverse sources, synthesize complex molecules, and implement signal processing to react to their environment. These complex phenotypes require the coordinated action of multiple genes, which are often encoded in a contiguous region of the genome referred to as a gene cluster. Gene clusters sometimes contain all of the genes necessary and sufficient for a particular function. As an evolutionary mechanism, gene clusters facilitate the horizontal transfer of a complete function between species. Here, we review recent work on a number of clusters whose functions are relevant to biotechnology. Engineering these clusters has been hindered by their regulatory complexity, the need to balance the expression of many genes, and a lack of tools to design and manipulate DNA at this scale. Advances in synthetic biology will enable large-scale, bottom-up engineering of the clusters to optimize their functions, awaken cryptic clusters, or transfer them between organisms. Understanding and manipulating gene clusters will move us toward an era of genome engineering, in which multiple functions can be “mixed-and-matched” to create a designer organism. PMID:21154668

  6. An AK-LDMeans algorithm based on image clustering

    NASA Astrophysics Data System (ADS)

    Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

    2018-03-01

    Clustering is an effective analytical technique for extracting value from unlabeled data. Its ultimate goal is to label unclassified data quickly and correctly. We use current road-map image processing as the experimental setting. In this paper, we propose an AK-LDMeans algorithm that works in two steps. First, it automatically locks the K value by means of a designed Kcost fold line. Second, it replaces the traditional method of selecting initial clustering centers with a long-distance, high-density method, further improving the efficiency and accuracy of the traditional K-Means algorithm. The experimental results are compared with those of current clustering algorithms. The algorithm may serve as a useful reference in the fields of image processing, machine vision, and data mining.
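    The K-Means baseline and the idea of seeding from high-density, mutually distant points can be sketched as follows. This is a hypothetical reading of the "long-distance high-density" selection, not the published AK-LDMeans algorithm. K is supplied by the caller rather than locked by the Kcost fold line, and the `radius` used to measure local density is an assumed parameter:

```python
import math


def density_distance_seeds(points, k, radius):
    """Hypothetical 'high-density, long-distance' seeding: start from the densest
    point, then repeatedly add the point maximizing density x distance to the
    nearest seed already chosen."""
    density = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]
    seeds = [points[max(range(len(points)), key=lambda i: density[i])]]
    while len(seeds) < k:
        def score(i):
            return density[i] * min(math.dist(points[i], s) for s in seeds)
        seeds.append(points[max(range(len(points)), key=score)])
    return seeds


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations: assign each point to its nearest center, then
    move each center to the mean of its members, until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        members = [[] for _ in centers]
        for p in points:
            members[nearest(p, centers)].append(p)
        updated = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else c
            for group, c in zip(members, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, density_distance_seeds(points, k=2, radius=2.0))
print(labels)  # [0, 0, 0, 1, 1, 1]
```

    Seeding from dense, well-separated points avoids starting two centers inside the same group, which is the failure mode that random initialization of K-Means is prone to.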

  7. Assistance Services for the Elderly. Reference Book and Student Activity Book.

    ERIC Educational Resources Information Center

    Texas Tech Univ., Lubbock. Home Economics Curriculum Center.

    These coordinated components focus on the career cluster of assistance services for the elderly. The reference book (1987) provides information needed by employees. Each chapter begins with competencies to develop and objectives to achieve. Within the text, bold-faced vocabulary terms are defined. Each chapter concludes with a content summary in the…

  8. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge. It depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes, and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain many inconsistencies, due both to natural variants and to sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors. Its sensitivity is significantly higher than that of previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that integrating quality control methods like SIBIS into automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved.
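    The Bayesian estimation step can be illustrated for a single alignment column. Below is a minimal Python sketch that deliberately simplifies the method: it uses one Dirichlet prior instead of the Dirichlet mixture described above. The pseudocounts, the probability threshold, and the example column are hypothetical:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(column, alpha):
    """Posterior-mean residue probabilities for one alignment column under a
    single Dirichlet prior with pseudocounts `alpha` (gaps and unknown symbols
    are ignored)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in column.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values()) + sum(alpha[aa] for aa in AMINO_ACIDS)
    return {aa: (counts[aa] + alpha[aa]) / total for aa in AMINO_ACIDS}


def flag_unlikely(column, alpha, threshold):
    """Indices of sequences whose residue in this column has posterior
    probability below `threshold` (candidate inconsistencies)."""
    probs = column_probabilities(column, alpha)
    return [i for i, residue in enumerate(column.upper())
            if residue in probs and probs[residue] < threshold]


# Hypothetical uniform prior (0.05 pseudocount per residue) and one column
# from eight aligned sequences.
alpha = {aa: 0.05 for aa in AMINO_ACIDS}
print(flag_unlikely("LLLLLLLW", alpha, threshold=0.2))  # [7]
```

    A mixture prior would instead weight several Dirichlet components by how well each explains the observed column, which lets conserved hydrophobic or charged positions borrow appropriate pseudocounts.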

  9. ExtraTrain: a database of Extragenic regions and Transcriptional information in prokaryotic organisms

    PubMed Central

    Pareja, Eduardo; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Bonal, Javier; Tobes, Raquel

    2006-01-01

    Background: Transcriptional regulation processes are the principal mechanisms of adaptation in prokaryotes. In these processes, the regulatory proteins and the regulatory DNA signals located in extragenic regions are the key elements involved. As all extragenic spaces are putative regulatory regions, ExtraTrain covers all extragenic regions of available genomes and the regulatory proteins from bacteria and archaea included in the UniProt database. Description: ExtraTrain provides integrated and easily manageable information for 679816 extragenic regions and for the genes delimiting each of them. In addition, ExtraTrain supplies Palinsight, a tool for exploring extragenic regions that is designed to detect and search for palindromic patterns. This interactive visual tool is fully integrated in the database, allowing the search for regulatory signals in user-defined sets of extragenic regions. The 26046 regulatory proteins included in ExtraTrain belong to the families AraC/XylS, ArsR, AsnC, Cold shock domain, CRP-FNR, DeoR, GntR, IclR, LacI, LuxR, LysR, MarR, MerR, NtrC/Fis, OmpR and TetR. The database follows the InterPro criteria to define these families. The information about regulators includes manually curated sets of references specifically associated with regulator entries. To keep the knowledge database sustainable and maintainable, ExtraTrain is a platform open to contributions from the scientific community and provides a system for incorporating textual knowledge. Conclusion: ExtraTrain is a new database for exploring Extragenic regions and Transcriptional information in bacteria and archaea. The ExtraTrain database is available at . PMID:16539733
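    Palindromic DNA patterns of the kind Palinsight searches for include sites that read the same on both strands, i.e., sequences equal to their own reverse complement (such as GAATTC). Below is a minimal Python sketch that finds exact reverse-complement palindromes only. It is not Palinsight's algorithm, and it does not handle the imperfect or spaced inverted repeats that many regulatory sites contain:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string (other symbols are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=6, max_len=12):
    """Return (start, site) pairs for windows that equal their own reverse
    complement. Only even lengths are tried: an odd-length window would need a
    middle base equal to its own complement, which is impossible for A/C/G/T."""
    seq = seq.upper()
    hits = []
    for length in range(min_len + min_len % 2, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            site = seq[start:start + length]
            if site == reverse_complement(site):
                hits.append((start, site))
    return hits


print(find_palindromes("TTGAATTCAA"))
# [(2, 'GAATTC'), (1, 'TGAATTCA'), (0, 'TTGAATTCAA')]
```

    Nested hits are expected: every palindrome longer than the minimum length contains a shorter palindrome centered at the same point.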

  10. Properties of coupled-cluster equations originating in excitation sub-algebras

    NASA Astrophysics Data System (ADS)

    Kowalski, Karol

    2018-03-01

    In this paper, we discuss properties of single-reference coupled cluster (CC) equations associated with the existence of sub-algebras of excitations. These sub-algebras allow CC equations to be represented in a hybrid fashion, in which the cluster amplitudes associated with the sub-algebras can be obtained by solving the corresponding eigenvalue problem. For the closed-shell formulations analyzed in this paper, the hybrid representation of CC equations provides a natural way of extending the active-space and seniority-number concepts to give an accurate description of electron correlation effects. Moreover, the new representation can be used to redefine the iterative algorithms for solving CC equations, especially in difficult cases characterized by strong static and dynamical correlation effects. We also explore invariance properties associated with excitation sub-algebras to define a new class of CC approximations, referred to in this paper as sub-algebra-flow-based CC methods. We illustrate the performance of these methods with ground- and excited-state calculations for commonly used small benchmark systems.

  11. A new scheme for perturbative triples correction to (0,1) sector of Fock space multi-reference coupled cluster method: theory, implementation, and examples.

    PubMed

    Dutta, Achintya Kumar; Vaval, Nayana; Pal, Sourav

    2015-01-28

    We propose a new, elegant strategy for the ionization problem in the Fock space multi-reference coupled cluster method: a third-order triples correction derived from many-body perturbation theory. Computational scaling and storage requirements are key concerns in any many-body calculation. Our proposed approach scales as N^6, does not require the storage of triples amplitudes, and gives better agreement than all previous attempts. It can calculate multiple roots in a single calculation. This contrasts with the inclusion of perturbative triples in the equation-of-motion variant of coupled cluster theory, where each root must be computed in a state-specific way and requires both the left and right state vectors. The performance of the newly implemented scheme is tested on methylene, the boron nitride (B2N) anion, nitrogen, water, carbon monoxide, acetylene, formaldehyde, and the thymine monomer, a DNA base.

  12. Dual chain perturbation theory: A new equation of state for polyatomic molecules

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marshall, Bennett D., E-mail: bennett.d.marshall@exxonmobil.com

    In the development of equations of state for polyatomic molecules, thermodynamic perturbation theory (TPT) is widely used to calculate the change in free energy due to chain formation. TPT is a simplification of a more general and exact multi-density cluster expansion for associating fluids. In TPT, all contributions to the cluster expansion that contain chain–chain interactions are neglected; that is, all inter-chain interactions are treated at the reference fluid level. This allows the cluster theory to be summed in terms of reference system correlation functions only. The resulting theory has been shown to be accurate and has been widely employed as the basis of many engineering equations of state. While highly successful, TPT has many handicaps that result from the neglect of chain–chain contributions. The aim of this work is to move beyond the limitations of TPT and include chain–chain contributions in the equation of state.
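    Consider a fluid of $N$ chains, each made of $m$ tangent hard spheres of diameter $\sigma$. For this case, the standard first-order TPT (TPT1) result shows what "treated at the reference fluid level" means in practice: the chain contribution depends only on the contact value of the reference hard-sphere radial distribution function. It is shown here for orientation only. It is the textbook TPT1 form, not the chain–chain extension this work develops. The Carnahan–Starling contact value, with packing fraction $\eta$, is one common choice and an assumption here:

```latex
\frac{A^{\text{chain}}}{N k_B T} = (1 - m)\,\ln g^{\text{HS}}(\sigma),
\qquad
g^{\text{HS}}(\sigma) = \frac{1 - \eta/2}{(1 - \eta)^{3}}
```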

  13. Fuzzy Document Clustering Approach using WordNet Lexical Categories

    NASA Astrophysics Data System (ADS)

    Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.

    Text mining generally refers to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly, mainly because of the strong need to analyse the large amount of textual data that resides on internal file systems and the Web. Text document clustering provides an effective navigation mechanism for organizing this data by grouping documents into a small number of meaningful classes. In this paper, we propose a fuzzy text document clustering approach using WordNet lexical categories and the fuzzy c-means algorithm. Experiments compare the efficiency of the proposed approach with that of recently reported approaches. The experimental results show that fuzzy clustering performs well. The fuzzy c-means algorithm outperforms classical clustering algorithms such as k-means and bisecting k-means in both clustering quality and running-time efficiency.
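    The fuzzy c-means step can be sketched in a few lines: each point receives a graded membership in every cluster, and centers are membership-weighted means. Below is a minimal pure-Python sketch of the standard algorithm with fuzzifier m = 2. It omits the WordNet-based document representation, so the input here is plain numeric vectors, and the example points are hypothetical:

```python
import math


def memberships(points, centers, m):
    """u[i][j] = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)), with distances floored
    to avoid division by zero when a point sits on a center."""
    exponent = 2.0 / (m - 1.0)
    result = []
    for p in points:
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        result.append([1.0 / sum((d[j] / dk) ** exponent for dk in d)
                       for j in range(len(centers))])
    return result


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-9):
    """Alternate membership and center updates until centers move less than `tol`."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        updated = []
        for j in range(len(centers)):
            # Each center is the mean of all points weighted by u_ij^m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            updated.append(tuple(
                sum(w * p[axis] for w, p in zip(weights, points)) / total
                for axis in range(len(points[0]))
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


points = [(0.0,), (0.5,), (10.0,), (10.5,)]
centers, u = fuzzy_c_means(points, [(0.0,), (10.0,)])
print([round(c[0], 2) for c in centers])
```

    Unlike k-means, every point keeps a nonzero membership in every cluster, which is what lets a document relate to several topics at once.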

  14. Structure and thermodynamics of a mixture of patchy and spherical colloids: A multi-body association theory with complete reference fluid information

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bansal, Artee; Asthagiri, D.; Cox, Kenneth R.

    Consider a mixture of solvent particles with short-range, directional interactions and solute particles with short-range, isotropic interactions that can bond multiple times. Such a mixture is of fundamental interest in understanding liquids and colloidal mixtures. Because of multi-body correlations, predicting the structure and thermodynamics of such systems remains a challenge. Earlier, Marshall and Chapman [J. Chem. Phys. 139, 104904 (2013)] developed a theory wherein association effects due to interactions multiply the partition function for clustering of particles in a reference hard-sphere system. The multi-body effects are incorporated in the clustering process, which in their work was obtained in the absence of the bulk medium. The bulk solvent effects were then modeled approximately within a second-order perturbation approach. However, their approach is inadequate at high densities and for large association strengths. The clustering of solvent in a defined coordination volume around the solute is related to occupancy statistics in that volume. Building on this idea, we develop an approach that incorporates the complete information about hard-sphere clustering in a bulk solvent at the density of interest. The occupancy probabilities are obtained from enhanced sampling simulations, but we also develop a concise parametric form to model these probabilities using the quasichemical theory of solutions. We show that incorporating the complete reference information results in an approach that can predict the bonding state and thermodynamics of the colloidal solute for a wide range of system conditions.

  15. FPGA based data processing in the ALICE High Level Trigger in LHC Run 2

    NASA Astrophysics Data System (ADS)

    Engel, Heiko; Alt, Torsten; Kebschull, Udo; ALICE Collaboration

    2017-10-01

    The ALICE High Level Trigger (HLT) is a computing cluster dedicated to the online compression, reconstruction and calibration of experimental data. The HLT receives detector data via serial optical links into FPGA-based readout boards. These boards already process the data at the per-link level inside the FPGA and provide it to the host machines, which are connected through a data transport framework. FPGA-based data pre-processing is enabled for the biggest detector of ALICE, the Time Projection Chamber (TPC), with a hardware cluster finding algorithm. This algorithm was ported to the Common Read-Out Receiver Card (C-RORC) used in the HLT for Run 2. It was improved to handle double the input bandwidth and adjusted to the upgraded TPC Readout Control Unit (RCU2). A flexible firmware implementation in the HLT handles both the old and the new TPC data formats and link rates transparently. The physics performance of the cluster finder is improved by extended protocol and data error detection, by error handling, and by the enhanced RCU2 data ordering scheme. The performance of the cluster finder was verified against large sets of reference data, both in terms of throughput and algorithmic correctness. Comparisons with a software reference implementation confirm significant savings in CPU processing power from the hardware implementation. The C-RORC hardware with the cluster finder for RCU1 data has been in use in the HLT since the start of Run 2. The extended hardware cluster finder implementation for the RCU2, with doubled throughput, has been active since the upgrade of the TPC readout electronics in early 2016.
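    Cluster finding on TPC data groups adjacent above-threshold ADC samples and reduces each group to a charge and a position. Below is a minimal one-dimensional Python sketch under stated assumptions: a single pad, a fixed threshold, and charge-weighted centroids. The actual FPGA cluster finder works over pads and time bins and is not reproduced here:

```python
def find_clusters(adc, threshold):
    """Group consecutive samples above `threshold` and return (charge, centroid)
    for each group, where the centroid is the charge-weighted mean time bin.
    Assumes threshold >= 0 so the appended 0 sentinel closes a trailing group."""
    clusters = []
    start = None
    for t, value in enumerate(list(adc) + [0]):
        if value > threshold and start is None:
            start = t
        elif value <= threshold and start is not None:
            bins = range(start, t)
            charge = sum(adc[b] for b in bins)
            centroid = sum(b * adc[b] for b in bins) / charge
            clusters.append((charge, centroid))
            start = None
    return clusters


# Hypothetical ADC samples from one pad.
print(find_clusters([0, 2, 5, 9, 5, 1, 0, 0, 3, 3, 0], threshold=1))
```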

  16. Cluster decomposition of full configuration interaction wave functions: A tool for chemical interpretation of systems with strong correlation

    NASA Astrophysics Data System (ADS)

    Lehtola, Susi; Tubman, Norm M.; Whaley, K. Birgitta; Head-Gordon, Martin

    2017-10-01

    Approximate full configuration interaction (FCI) calculations have recently become tractable for systems of unprecedented size, thanks to stochastic and adaptive approximations to the exponentially scaling FCI problem. The result of an FCI calculation is a weighted set of electronic configurations, which can also be expressed in terms of excitations from a reference configuration. The excitation amplitudes contain information on the complexity of the electronic wave function. However, this information is contaminated by contributions from disconnected excitations, i.e., excitations that are merely products of independent lower-level excitations. These unwanted contributions can be removed via a cluster decomposition procedure. This makes it possible to examine the importance of connected excitations in complicated multireference molecules that are outside the reach of conventional algorithms. We present an implementation of the cluster decomposition analysis and apply it both to true FCI wave functions and to wave functions generated by the adaptive sampling CI algorithm. The cluster decomposition is useful for interpreting calculations in chemical studies, as a diagnostic for the convergence of various excitation manifolds, and as a guidepost for polynomially scaling electronic structure models. Applications are presented for (i) the double dissociation of water, (ii) the carbon dimer, (iii) the π space of polyacenes, and (iv) the chromium dimer. For the first three systems, the cluster amplitudes decay rapidly with increasing rank. In Cr2, however, even connected octuple excitations still appear important, suggesting that spin-restricted single-reference coupled-cluster approaches may not be tractable for some problems in transition metal chemistry.
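    The cluster decomposition referred to above starts from the intermediately normalized wave function, $|\Psi\rangle = (1 + \hat C_1 + \hat C_2 + \cdots)|\Phi_0\rangle$. This is written in exponential form, $e^{\hat T}|\Phi_0\rangle$, and the logarithm is taken rank by rank. Because excitation operators from a common reference commute, this gives the standard relations below. They are shown for orientation; the abstract does not give the paper's implementation details:

```latex
\begin{aligned}
\hat T_1 &= \hat C_1 \\
\hat T_2 &= \hat C_2 - \tfrac{1}{2}\hat C_1^{2} \\
\hat T_3 &= \hat C_3 - \hat C_1 \hat C_2 + \tfrac{1}{3}\hat C_1^{3} \\
\hat T_4 &= \hat C_4 - \hat C_1 \hat C_3 - \tfrac{1}{2}\hat C_2^{2} + \hat C_1^{2}\hat C_2 - \tfrac{1}{4}\hat C_1^{4}
\end{aligned}
```

    The subtracted products are the disconnected contributions. What remains in $\hat T_n$ is the connected part, whose magnitude the analysis examines.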

  17. Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.

    PubMed

    Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço

    2017-11-01

    The variations found in the elemental composition of ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with a C4.5 decision tree to help us interpret the clustering results. We found that the data were best described by two clusters, which may correspond to the approximate number of sources supplying the drug to the cities where the seizures occurred. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values to classify the samples. © 2017 American Academy of Forensic Sciences.
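    The validation step pairs a tree classifier with leave-one-out cross-validation. The sketch below is a deliberately reduced stand-in, not the authors' model: a single-split decision stump chosen by C4.5's gain-ratio criterion (a full C4.5 tree would recurse and prune), scored by leave-one-out. The feature values and cluster labels are made-up toy data, not the study's ICP-MS measurements.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, left, right):
    """C4.5-style gain ratio for splitting `labels` into `left` and `right`."""
    n = len(labels)
    info_gain = entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return info_gain / split_info if split_info > 0 else 0.0

def fit_stump(X, y):
    """Best single threshold split over all features: (ratio, feature, threshold, left_label, right_label)."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold halfway between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            g = gain_ratio(y, left, right)
            if best is None or g > best[0]:
                best = (g, f, t, Counter(left).most_common(1)[0][0], Counter(right).most_common(1)[0][0])
    if best is None:  # every feature is constant: fall back to the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0.0, 0, float("inf"), majority, majority)
    return best

def predict(stump, row):
    _, f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def loo_accuracy(X, y):
    """Leave-one-out: train on all samples but one, test on the held-out sample."""
    hits = 0
    for i in range(len(X)):
        stump = fit_stump(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(stump, X[i]) == y[i]
    return hits / len(X)

# Hypothetical toy data: three element concentrations per tablet, labels from a prior clustering.
X = [[1.0, 5.0, 0.10], [2.0, 3.0, 0.20], [1.5, 4.0, 0.15], [2.5, 6.0, 0.12],
     [1.2, 4.5, 2.00], [2.2, 5.5, 2.10], [1.8, 3.5, 1.90], [2.8, 4.2, 2.20]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
```

    Here only the third feature separates the two toy clusters, so the stump picks it and leave-one-out accuracy is perfect; the point of the design is that each held-out sample never influences the split used to classify it.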

  18. Reference Values of Within-District Intraclass Correlations of Academic Achievement by District Characteristics: Results from a Meta-Analysis of District-Specific Values

    ERIC Educational Resources Information Center

    Hedberg, E. C.; Hedges, Larry V.

    2014-01-01

    Randomized experiments are often considered the strongest designs to study the impact of educational interventions. Perhaps the most prevalent class of designs used in large scale education experiments is the cluster randomized design in which entire schools are assigned to treatments. In cluster randomized trials (CRTs) that assign schools to…

  19. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study

    PubMed Central

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers, employing bioinformatics, using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended-spectrum beta-lactamase-producing E. coli isolates and followed the same standard operating procedure (SOP), including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed the technical and biological reproducibility between centers to be calculated using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed comparable clustering among identical isolates. Principal component analysis indicated a stronger tendency for spectra to cluster by center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify peaks specific to the outbreak clusters. Finally, we applied a classifier algorithm and a linear support vector machine to the selected peaks. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct, with a large contrast between the score for the correct cluster and that of the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained at different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637

  20. A priori evaluation of two-stage cluster sampling for accuracy assessment of large-area land-cover maps

    USGS Publications Warehouse

    Wickham, J.D.; Stehman, S.V.; Smith, J.H.; Wade, T.G.; Yang, L.

    2004-01-01

    Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, within-cluster correlation may reduce the precision of the accuracy estimates. The detailed population information needed to quantify a priori the effect of within-cluster correlation on precision is typically unavailable. Consequently, a convenient, practical approach to evaluate the likely performance of a two-stage cluster sample is needed. We describe such an a priori evaluation protocol focusing on the spatial distribution of the sample by land-cover class across different cluster sizes and costs of different sampling options, including options not imposing clustering. This protocol also assesses the two-stage design's adequacy for estimating the precision of accuracy estimates for rare land-cover classes. We illustrate the approach using two large-area, regional accuracy assessments from the National Land-Cover Data (NLCD), and describe how the a priori evaluation was used as a decision-making tool when implementing the NLCD design.
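    To make the precision concern concrete, here is a minimal sketch (not the NLCD protocol itself) of the textbook ratio estimator for overall accuracy when whole clusters are the primary sampling units. It assumes clusters sampled with replacement (no finite-population correction) and compares the resulting variance with what a simple random sample of the same size would give; the ratio is the design effect. The cluster counts are hypothetical.

```python
def cluster_accuracy_estimate(clusters):
    """Estimate overall accuracy from a cluster sample.

    clusters: list of (n_correct, n_sampled) pairs, one per sampled cluster.
    Returns (accuracy, variance, design_effect). Uses the standard ratio-estimator
    variance with clusters as primary units; the finite-population correction is
    ignored (an assumption).
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two sampled clusters")
    total_correct = sum(y for y, _ in clusters)
    total_n = sum(m for _, m in clusters)
    p_hat = total_correct / total_n
    m_bar = total_n / k
    # Between-cluster spread of the residuals y_i - p_hat * m_i drives the variance.
    ss = sum((y - p_hat * m) ** 2 for y, m in clusters)
    var_cluster = ss / (k * (k - 1) * m_bar ** 2)
    var_srs = p_hat * (1 - p_hat) / total_n  # variance under simple random sampling
    deff = var_cluster / var_srs if var_srs > 0 else float("nan")
    return p_hat, var_cluster, deff

# Hypothetical: four clusters of 10 reference sites each, with spatially clumped errors.
p, v, deff = cluster_accuracy_estimate([(10, 10), (9, 10), (5, 10), (4, 10)])
```

    In this toy case errors are concentrated in two clusters, so the variance is roughly four times that of an unclustered sample of 40 sites (deff ≈ 4.1), which is the kind of precision loss the a priori evaluation is meant to anticipate.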

  1. Urban hospital 'clusters' do shift high-risk procedures to key facilities, but more could be done.

    PubMed

    Luke, Roice D; Luke, Tyler; Muller, Nancy

    2011-09-01

    Since the 1990s, rapid consolidation in the hospital sector has resulted in the vast majority of hospitals joining systems that already had a considerable presence within their markets. We refer to these important local and regional systems as "clusters." To determine whether hospital clusters have taken measurable steps aimed at improving the quality of care (specifically, by concentrating low-volume, high-complexity services within selected "lead" facilities), this study examined within-cluster concentrations of high-risk cases for seven surgical procedures. We found that lead hospitals on average performed fairly high percentages of the procedures per cluster, ranging from 59 percent for esophagectomy to 87 percent for aortic valve replacement. The numbers indicate that hospitals might need to work with rival facilities outside their cluster to concentrate cases for the lowest-volume procedures, such as esophagectomies, whereas coordination among cluster members might be sufficient for higher-volume procedures. The results imply that policy makers should focus on clusters' potential for restructuring care and further coordinating services across hospitals in local areas.

  2. Mapping of terrain by computer clustering techniques using multispectral scanner data and using color aerial film

    NASA Technical Reports Server (NTRS)

    Smedes, H. W.; Linnerud, H. J.; Woolaver, L. B.; Su, M. Y.; Jayroe, R. R.

    1972-01-01

    Two clustering techniques were used for terrain mapping by computer of test sites in Yellowstone National Park. One test was made with multispectral scanner data using a composite technique which consists of (1) a strictly sequential statistical clustering, which is a sequential variance analysis, and (2) a generalized K-means clustering. In this composite technique, the output of (1) is a first approximation of the cluster centers. This is the input to (2), which consists of steps to improve the determination of cluster centers by iterative procedures. Another test was made using the three emulsion layers of color-infrared aerial film as a three-band spectrometer. Relative film densities were analyzed using a simple clustering technique in three-color space. Important advantages of the clustering technique over conventional supervised computer programs are that (1) human intervention, preparation time, and manipulation of data are reduced, (2) the computer map gives an unbiased indication of where best to select the reference ground control data, (3) inexpensive, easily obtained film can be used, and (4) the geometric distortions can be easily rectified by simple standard photogrammetric techniques.
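    The composite technique hands the output of a sequential pass to K-means as its starting centers. The sketch below illustrates that two-stage idea under a stated substitution: the sequential pass is plain leader clustering with a hypothetical distance `radius`, not the paper's sequential variance analysis, and the refinement is standard Lloyd's K-means. The three-band "pixels" are made-up values.

```python
import math

def sequential_seeds(points, radius):
    """One sequential pass (leader clustering): a point farther than `radius`
    from every existing center starts a new center. Stand-in for the paper's
    sequential statistical clustering."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(tuple(p))
    return centers

def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k])) for p in points]

def kmeans(points, centers, max_iter=50):
    """Lloyd's K-means refinement, started from the supplied centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Move the center to the mean of its members; an empty cluster keeps its old center.
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else c)
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical three-band values (e.g. relative film densities) for six pixels.
pixels = [(0.10, 0.12, 0.09), (0.12, 0.10, 0.11), (0.08, 0.11, 0.10),
          (0.90, 0.80, 0.70), (0.88, 0.82, 0.71), (0.92, 0.79, 0.69)]
seeds = sequential_seeds(pixels, radius=0.3)
centers, labels = kmeans(pixels, seeds)
```

    The design point is that the sequential pass fixes both the number of clusters and a sensible starting position, so the iterative stage only has to refine the centers rather than search from a random start.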

  3. Large-Angular-Scale Clustering as a Clue to the Source of UHECRs

    NASA Astrophysics Data System (ADS)

    Berlind, Andreas A.; Farrar, Glennys R.

    We explore what can be learned about the sources of UHECRs from their large-angular-scale clustering (referred to as their "bias" by the cosmology community). Exploiting the clustering on large scales has the advantage over small-scale correlations of being insensitive to uncertainties in source direction from magnetic smearing or measurement error. In a Cold Dark Matter cosmology, the amplitude of large-scale clustering depends on the mass of the system, with more massive systems such as galaxy clusters clustering more strongly than less massive systems such as ordinary galaxies or AGN. Therefore, studying the large-scale clustering of UHECRs can help determine a mass scale for their sources, given the assumption that their redshift depth is as expected from the GZK cutoff. We investigate the constraining power of a given UHECR sample as a function of its cutoff energy and number of events. We show that current and future samples should be able to distinguish between the cases of their sources being galaxy clusters, ordinary galaxies, or sources that are uncorrelated with the large-scale structure of the universe.

  4. Update of membership and mean proper motion of open clusters from UCAC5 catalog

    NASA Astrophysics Data System (ADS)

    Dias, W. S.; Monteiro, H.; Assafin, M.

    2018-06-01

    We present mean proper motions and membership probabilities of individual stars for optically visible open clusters, which have been determined using data from the UCAC5 catalog. This follows our previous studies with the UCAC2 and UCAC4 catalogs, but now uses improved proper motions in the GAIA reference frame. In the present study, results were obtained for a sample of 1108 open clusters. For five clusters, this is the first determination of mean proper motion, and for the whole sample we present results with a much larger number of identified astrometric member stars than in previous studies. It is the last update of our Open cluster Catalog based on proper motion data only. Future updates will use astrometric, photometric and spectroscopic GAIA data as input for analyses.

  5. Low-income women's reproductive weight patterns: empirically based clusters of prepregnant, gestational, and postpartum weights.

    PubMed

    Walker, Lorraine O

    2009-01-01

    Women have varying weight responses to pregnancy and the postpartum period. The purpose of this study was to derive sub-groups of women based on differing reproductive weight clusters; to validate clusters by reference to adequacy of gestational weight gain (GWG) and postpartum incremental weight shifts; and to examine associations between clusters and demographic, behavioral, and psychosocial variables. A cluster analysis was conducted of a multi-ethnic/racial sample of low-income women (n = 247). Clusters were derived from three weight variables: prepregnant body mass index, GWG, and postpartum retained weight. Five clusters were derived: Cluster 1, normal weight-high prenatal gain-average retain; cluster 2, normal weight-low prenatal gain-zero retain; cluster 3, high normal weight-high prenatal gain-high retain; cluster 4, obese-low prenatal gain-average retain; and cluster 5, overweight-very high prenatal gain-very high retain. Clusters differed with regard to postpartum weight shifts (p < .001), with clusters 3, 4, and 5, mostly gaining weight between 6 weeks and 12 months postpartum, whereas clusters 1 and 2 were losing weight. Clusters were also associated with race/ethnicity (p < .01), breastfeeding immediately postdelivery (p < .01), smoking at 12 months (p < .05), and reaching weight goals at 6 and 12 months (p < .001), but not depressive symptoms, fat intake habits, or physical activity. In a five-cluster solution, postpartum weight shifts, ethnicity, and initial breastfeeding were among factors associated with clusters. Monitoring of weight and appropriate intervention beyond the 6 weeks after birth is needed for low-income women in high normal weight, overweight, and obese clusters.

  6. Genotype imputation in the domestic dog

    PubMed Central

    Meurs, K. M.

    2016-01-01

    Application of imputation methods to accurately predict a dense array of SNP genotypes in the dog could provide an important supplement to current analyses of array-based genotyping data. Here, we developed a reference panel of 4,885,283 SNPs in 83 dogs across 15 breeds using whole genome sequencing. We used this panel to predict the genotypes of 268 dogs across three breeds with 84,193 SNP array-derived genotypes as inputs. We then (1) performed breed clustering of the actual and imputed data; (2) evaluated several reference panel breed combinations to determine an optimal reference panel composition; and (3) compared the accuracy of two commonly used software algorithms (Beagle and IMPUTE2). Breed clustering was well preserved in the imputation process across eigenvalues representing 75 % of the variation in the imputed data. Using Beagle with a target panel from a single breed, genotype concordance was highest using a multi-breed reference panel (92.4 %) compared to a breed-specific reference panel (87.0 %) or a reference panel containing no breeds overlapping with the target panel (74.9 %). This finding was confirmed using target panels derived from two other breeds. Additionally, using the multi-breed reference panel, genotype concordance was slightly higher with IMPUTE2 (94.1 %) compared to Beagle; Pearson correlation coefficients were slightly higher for both software packages (0.946 for Beagle, 0.961 for IMPUTE2). Our findings demonstrate that genotype imputation from SNP array-derived data to whole genome-level genotypes is both feasible and accurate in the dog with appropriate breed overlap between the target and reference panels. PMID:27129452
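    The two accuracy measures reported above can be sketched as follows, assuming genotypes coded as alternate-allele counts (0, 1, 2) and missing calls marked `None` and skipped; the study's exact handling of missing or low-confidence calls is not stated here, so that choice is an assumption. The genotype vectors in the usage line are invented.

```python
import math

def genotype_concordance(true_gt, imputed_gt):
    """Fraction of sites where the imputed genotype equals the true one.
    Genotypes are alternate-allele counts 0/1/2; None marks a missing call (skipped)."""
    pairs = [(t, i) for t, i in zip(true_gt, imputed_gt) if t is not None and i is not None]
    if not pairs:
        raise ValueError("no overlapping non-missing genotypes")
    return sum(t == i for t, i in pairs) / len(pairs)

def pearson_r(x, y):
    """Pearson correlation between true and imputed allele counts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant genotype vector")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example: one heterozygote miscalled at the fourth site.
truth = [0, 1, 2, 2, 1, 0]
imputed = [0, 1, 2, 1, 1, 0]
conc, r = genotype_concordance(truth, imputed), pearson_r(truth, imputed)
```

    Concordance treats every mismatch alike, while the correlation on allele counts penalizes a 0-versus-2 error more than a 1-versus-2 error, which is one reason the two metrics can rank methods slightly differently.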

  7. Multiconstrained gene clustering based on generalized projections

    PubMed Central

    2010-01-01

    Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints into an optimal clustering solution remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for soft clustering solutions. PMID:20356386
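    The core POCS iteration (project onto each convex set in turn until the point lies in their intersection) can be sketched in a few lines. The two sets here are illustrative assumptions, not the paper's expression, GO, or network constraints: a gene's soft cluster-membership vector must sum to one and must be nonnegative, whose intersection is the probability simplex.

```python
def project_sum_to_one(x):
    """Projection onto the affine (hence convex) set {x : sum(x) = 1}."""
    shift = (sum(x) - 1.0) / len(x)
    return [v - shift for v in x]

def project_nonnegative(x):
    """Projection onto the convex set {x : x_i >= 0 for all i}."""
    return [max(v, 0.0) for v in x]

def pocs(x, projections, iters=200):
    """Cyclically apply each projection; for closed convex sets with a nonempty
    intersection, the iterates converge to a point in that intersection."""
    for _ in range(iters):
        for project in projections:
            x = project(x)
    return x

# Hypothetical unconstrained membership scores for one gene over three clusters.
membership = pocs([0.9, 0.5, -0.6], [project_sum_to_one, project_nonnegative])
```

    Note that plain alternating projections find some point satisfying every constraint, not necessarily the point closest to the starting solution; finding the nearest such point would need a variant such as Dykstra's algorithm.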

  8. Approximate solution of coupled cluster equations: application to the coupled cluster doubles method and non-covalent interacting systems.

    PubMed

    Smiga, Szymon; Fabiano, Eduardo

    2017-11-15

    We have developed a simplified coupled cluster (SCC) methodology, using the basic idea of scaled MP2 methods. The scheme has been applied to the coupled cluster doubles equations and implemented in three different non-iterative variants. This new method (especially the SCCD[3] variant, which utilizes a spin-resolved formalism) has been found to be very efficient and to yield an accurate approximation of the reference CCD results for both total and interaction energies of different atoms and molecules. Furthermore, we demonstrate that the equations determining the scaling coefficients for the SCCD[3] approach can generate non-empirical SCS-MP2 scaling coefficients which are in good agreement with previous theoretical investigations.

  9. Large-scale seismic waveform quality metric calculation using Hadoop

    DOE PAGES

    Magana-Zook, Steven; Gaylord, Jessie M.; Knapp, Douglas R.; ...

    2016-05-27

    In this work we investigated the suitability of Hadoop MapReduce and Apache Spark for large-scale computation of seismic waveform quality metrics by comparing their performance with that of a traditional distributed implementation. The Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) provided 43 terabytes of broadband waveform data, of which 5.1 TB were processed with the traditional architecture, and the full 43 TB were processed using MapReduce and Spark. Maximum performance of ~0.56 terabytes per hour was achieved using all 5 nodes of the traditional implementation. We noted that I/O dominated processing, and that I/O performance was deteriorating with the addition of the 5th node. Data collected from this experiment provided the baseline against which the Hadoop results were compared. Next, we processed the full 43 TB dataset using both MapReduce and Apache Spark on our 18-node Hadoop cluster. We conducted these experiments multiple times with various subsets of the data so that we could build models to predict performance as a function of dataset size. We found that both MapReduce and Spark significantly outperformed the traditional reference implementation. At a dataset size of 5.1 terabytes, both Spark and MapReduce were about 15 times faster than the reference implementation. Furthermore, our performance models predict that for a dataset of 350 terabytes, Spark running on a 100-node cluster would be about 265 times faster than the reference implementation. We do not expect that the reference implementation deployed on a 100-node cluster would perform significantly better than on the 5-node cluster, because the I/O performance cannot be made to scale.
Finally, we note that although Big Data technologies clearly provide a way to process seismic waveform datasets in a high-performance and scalable manner, the technology is still rapidly changing, requires a high degree of investment in personnel, and will likely require significant changes in other parts of our infrastructure. Nevertheless, we anticipate that as the technology matures and third-party tool vendors make it easier to manage and operate clusters, Hadoop (or a successor) will play a large role in our seismic data processing.
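    As a rough check on what the reported figures imply, assume the traditional system sustained its stated maximum rate for the whole 5.1 TB run (so its time is a lower bound) and apply the reported 15-fold speedup:

```latex
t_{\text{trad}} \gtrsim \frac{5.1\ \text{TB}}{0.56\ \text{TB/h}} \approx 9.1\ \text{h},
\qquad
t_{\text{Hadoop}} \approx \frac{t_{\text{trad}}}{15} \gtrsim \frac{9.1\ \text{h}}{15} \approx 0.61\ \text{h} \approx 36\ \text{min}.
```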

  10. Large-scale seismic waveform quality metric calculation using Hadoop

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Magana-Zook, Steven; Gaylord, Jessie M.; Knapp, Douglas R.

    In this work we investigated the suitability of Hadoop MapReduce and Apache Spark for large-scale computation of seismic waveform quality metrics by comparing their performance with that of a traditional distributed implementation. The Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) provided 43 terabytes of broadband waveform data, of which 5.1 TB were processed with the traditional architecture, and the full 43 TB were processed using MapReduce and Spark. Maximum performance of ~0.56 terabytes per hour was achieved using all 5 nodes of the traditional implementation. We noted that I/O dominated processing, and that I/O performance was deteriorating with the addition of the 5th node. Data collected from this experiment provided the baseline against which the Hadoop results were compared. Next, we processed the full 43 TB dataset using both MapReduce and Apache Spark on our 18-node Hadoop cluster. We conducted these experiments multiple times with various subsets of the data so that we could build models to predict performance as a function of dataset size. We found that both MapReduce and Spark significantly outperformed the traditional reference implementation. At a dataset size of 5.1 terabytes, both Spark and MapReduce were about 15 times faster than the reference implementation. Furthermore, our performance models predict that for a dataset of 350 terabytes, Spark running on a 100-node cluster would be about 265 times faster than the reference implementation. We do not expect that the reference implementation deployed on a 100-node cluster would perform significantly better than on the 5-node cluster, because the I/O performance cannot be made to scale.
Finally, we note that although Big Data technologies clearly provide a way to process seismic waveform datasets in a high-performance and scalable manner, the technology is still rapidly changing, requires a high degree of investment in personnel, and will likely require significant changes in other parts of our infrastructure. Nevertheless, we anticipate that as the technology matures and third-party tool vendors make it easier to manage and operate clusters, Hadoop (or a successor) will play a large role in our seismic data processing.

  11. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

    DOE PAGES

    Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.; ...

    2017-06-12

    We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster with potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.

  12. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.

    We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster with potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.

  13. Recursive Hierarchical Image Segmentation by Region Growing and Constrained Spectral Clustering

    NASA Technical Reports Server (NTRS)

    Tilton, James C.

    2002-01-01

    This paper describes an algorithm for hierarchical image segmentation (referred to as HSEG) and its recursive formulation (referred to as RHSEG). The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing. In addition, between HSWO region growing iterations, HSEG optionally interjects merges of spatially non-adjacent regions (i.e., spectrally based merging or clustering), constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the segmentation results, especially for larger images, it also significantly increases HSEG's computational requirements. To counteract this, a computationally efficient, recursive, divide-and-conquer implementation of HSEG (RHSEG) has been devised and is described herein. Included in this description is special code that is required to avoid processing artifacts caused by RHSEG's recursive subdivision of the image data. Implementations for single processor and for multiple processor computer systems are described. Results with Landsat TM data are included comparing HSEG with classic region growing. Finally, an application to image information mining and knowledge discovery is discussed.

  14. The cluster model of a hot dense vapor

    NASA Astrophysics Data System (ADS)

    Zhukhovitskii, D. I.

    2015-04-01

    We explore thermodynamic properties of a vapor in the range of state parameters where the contribution to thermodynamic functions from bound states of atoms (clusters) dominates over the interaction between the components of the vapor in free states. The clusters are assumed to be light and sufficiently "hot" for the number of bonds to be minimized. We use the technique of calculation of the cluster partition function for the cluster with a minimum number of interatomic bonds to calculate the caloric properties (heat capacity and velocity of sound) for an ideal mixture of the lightest clusters. The problem proves to be exactly solvable and resulting formulas are functions solely of the equilibrium constant of the dimer formation. These formulas ensure a satisfactory correlation with the reference data for the vapors of cesium, mercury, and argon up to moderate densities in both the sub- and supercritical regions. For cesium, we extend the model to the densities close to the critical one by inclusion of the clusters of arbitrary size. Knowledge of the cluster composition of the cesium vapor makes it possible to treat nonequilibrium phenomena such as nucleation of the supersaturated vapor, for which the effect of the cluster structural transition is likely to be significant.

  15. Is patient-grouping on basis of condition on admission indicative for discharge destination in geriatric stroke patients after rehabilitation in skilled nursing facilities? The results of a cluster analysis.

    PubMed

    Buijck, Bianca I; Zuidema, Sytse U; Spruit-van Eijk, Monica; Bor, Hans; Gerritsen, Debby L; Koopmans, Raymond T C M

    2012-12-04

    Geriatric stroke patients are generally frail, of advanced age, and have co-morbidity. It is as yet unclear whether specific groups of patients might benefit differently from structured multidisciplinary rehabilitation programs. Therefore, the aims of our study are (1) to determine relevant patient characteristics to distinguish groups of patients based on their admission scores in skilled nursing facilities (SNFs), and (2) to study the course of these particular patient-groups in relation to their discharge destination. This is a longitudinal, multicenter, observational study. We collected data on patient characteristics, balance, walking ability, arm function, co-morbidity, activities of daily living (ADL), neuropsychiatric symptoms, and depressive complaints of 127 geriatric stroke patients admitted to skilled nursing facilities with specific units for geriatric rehabilitation after stroke. Cluster analyses revealed two groups: cluster 1 included patients in poor condition upon admission (n = 52), and cluster 2 included patients in fair/good condition upon admission (n = 75). Patients in both groups improved in balance, walking abilities, and arm function. Patients in cluster 1 also improved in ADL. Depressive complaints decreased significantly in patients in cluster 1 who were discharged to an independent- or assisted-living situation. A lower proportion of patients in cluster 1 (46%) than in cluster 2 (80%) were discharged to an independent- or assisted-living situation. Stroke patients referred for rehabilitation to SNFs could be clustered on the basis of their condition upon admission. Although patients in poor condition on admission were more likely to be referred to a facility for long-term care, this was by no means the case for all of them.
Almost half of them could be discharged to an independent or assisted living situation, which implies that such a discharge is an attainable goal even for patients in poor condition on admission. It is important to put substantial effort into the rehabilitation of patients in poor condition at admission.

  16. High β effects on cosmic ray streaming in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Wiener, Joshua; Zweibel, Ellen G.; Oh, S. Peng

    2018-01-01

    Diffuse, extended radio emission in galaxy clusters, commonly referred to as radio haloes, indicates the presence of high energy cosmic ray (CR) electrons and cluster-wide magnetic fields. We can predict from theory the expected surface brightness of a radio halo, given magnetic field and CR density profiles. Previous studies have shown that the nature of CR transport can radically affect the expected radio halo emission from clusters (Wiener, Oh & Guo 2013). Reasonable levels of magnetohydrodynamic (MHD) wave damping can lead to significant CR streaming speeds. But a careful treatment of MHD waves in a high β plasma, as expected in cluster environments, reveals that damping rates may be enhanced by a factor of β^(1/2). This leads to faster CR streaming and lower surface brightnesses than without this effect. In this work, we re-examine the simplified, 1D Coma cluster simulations (with radial magnetic fields) of Wiener et al. (2013) and discuss observable consequences of this high β damping. Future work is required to study this effect in more realistic simulations.

  17. Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.

    PubMed

    Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B

    2017-11-01

    Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  18. fluff: exploratory analysis and visualization of high-throughput sequencing data

    PubMed Central

    Georgiou, Georgios

    2016-01-01

    Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532

  19. Determining open cluster membership. A Bayesian framework for quantitative member classification

    NASA Astrophysics Data System (ADS)

    Stott, Jonathan J.

    2018-01-01

    Aims: My goal is to develop a quantitative algorithm for assessing open cluster membership probabilities. The algorithm is designed to work with single-epoch observations. In its simplest form, only one set of program images and one set of reference images are required. Methods: The algorithm is based on a two-stage joint astrometric and photometric assessment of cluster membership probabilities. The probabilities were computed within a Bayesian framework using any available prior information. Where possible, the algorithm emphasizes simplicity over mathematical sophistication. Results: The algorithm was implemented and tested against three observational fields using published survey data. M 67 and NGC 654 were selected as cluster examples, while a third, cluster-free field was used for the final test data set. The algorithm shows good quantitative agreement with the existing surveys and has a false-positive rate significantly lower than the astrometric or photometric methods used individually.
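    As a rough illustration of the Bayesian idea (not the author's two-stage algorithm), the sketch below computes a single-stage membership posterior from proper motions alone. It assumes isotropic Gaussian proper-motion distributions for cluster and field stars; the function names and all numbers are hypothetical.

```python
import math


def gaussian_2d(x, y, mx, my, sigma):
    """Isotropic bivariate normal density evaluated at (x, y)."""
    r2 = (x - mx) ** 2 + (y - my) ** 2
    return math.exp(-r2 / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def membership_probability(pm, prior, cluster, field):
    """Posterior P(member | proper motion) from Bayes' theorem.

    pm      -- (pm_x, pm_y) measured proper motion of one star
    prior   -- prior probability that a star in the field of view is a member
    cluster -- (mean_x, mean_y, sigma) of the cluster proper-motion distribution
    field   -- (mean_x, mean_y, sigma) of the field-star distribution
    """
    like_cluster = gaussian_2d(*pm, *cluster)
    like_field = gaussian_2d(*pm, *field)
    numerator = prior * like_cluster
    return numerator / (numerator + (1.0 - prior) * like_field)


if __name__ == "__main__":
    cluster = (-11.0, -3.0, 0.5)  # hypothetical tight cluster distribution
    field = (-5.0, -5.0, 6.0)     # hypothetical broad field distribution
    for star in [(-11.0, -3.0), (-10.0, -2.5), (0.0, 0.0)]:
        print(star, round(membership_probability(star, 0.2, cluster, field), 4))
```

    A photometric likelihood could be multiplied into both the cluster and field terms in the same way, under the additional assumption that it is independent of the astrometry.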

  20. Numerical taxonomy and ecology of petroleum-degrading bacteria.

    PubMed Central

    Austin, B; Calomiris, J J; Walker, J D; Colwell, R R

    1977-01-01

    A total of 99 strains of petroleum-degrading bacteria isolated from Chesapeake Bay water and sediment were identified by using numerical taxonomy procedures. The isolates, together with 33 reference cultures, were examined for 48 biochemical, cultural, morphological, and physiological characters. The data were analyzed by computer, using both the simple matching and the Jaccard coefficients. Clustering was achieved by the unweighted average linkage method. From the sorted similarity matrix and dendrogram, 14 phenetic groups, comprising 85 of the petroleum-degrading bacteria, were defined at the 80 to 85% similarity level. These groups were identified as actinomycetes (mycelial forms, four clusters), coryneforms, Enterobacteriaceae, Klebsiella aerogenes, Micrococcus spp. (two clusters), Nocardia species (two clusters), Pseudomonas spp. (two clusters), and Sphaerotilus natans. It is concluded that the degradation of petroleum is accomplished by a diverse range of bacterial taxa, some of which were isolated only at given sampling stations and, more specifically, from sediment collected at a given station. PMID:889329
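    The two coefficients named above differ only in how they treat characters that both strains lack. The sketch below assumes each strain is coded as a 0/1 vector over the same characters; the function names are illustrative.

```python
def match_counts(a, b):
    """Return (positive matches, negative matches, mismatches) for two 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("character vectors must have equal length")
    both_pos = sum(1 for x, y in zip(a, b) if x and y)
    both_neg = sum(1 for x, y in zip(a, b) if not x and not y)
    mismatch = len(a) - both_pos - both_neg
    return both_pos, both_neg, mismatch


def simple_matching(a, b):
    """Simple matching: shared presences and shared absences both count as agreement."""
    pos, neg, mis = match_counts(a, b)
    return (pos + neg) / (pos + neg + mis)


def jaccard(a, b):
    """Jaccard: shared absences are ignored entirely."""
    pos, _, mis = match_counts(a, b)
    if pos + mis == 0:
        return 0.0  # undefined when neither strain is positive for anything; 0.0 by choice here
    return pos / (pos + mis)


strain_1 = [1, 1, 0, 0, 1, 0]
strain_2 = [1, 0, 0, 0, 1, 1]
print(simple_matching(strain_1, strain_2))  # 4/6, about 0.667
print(jaccard(strain_1, strain_2))          # 2/4 = 0.5
```

    Converting the similarities to distances (1 - S) in condensed form and passing them to SciPy's `scipy.cluster.hierarchy.linkage` with `method='average'` would produce the unweighted average linkage (UPGMA) dendrogram described in the abstract.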

  1. SSMap: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase.

    PubMed

    David, Fabrice P A; Yip, Yum L

    2008-09-23

    Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt Knowledgebase (UniProtKB) strives to help users in this undertaking by providing complete cross-references to the Protein Data Bank (PDB) as well as coherent feature annotation using available structural information. In this study, SSMap, a new residue-residue level UniProt-PDB mapping, was generated. The primary objective of this mapping is not only to facilitate the two tasks mentioned above, but also to address a number of shortcomings of existing mappings. SSMap is the first isoform sequence-specific mapping resource and is kept up to date for UniProtKB annotation tasks. The method employed by SSMap differs from other mapping resources in that it emphasizes the correct reconstruction of the PDB sequence from structures and the correct attribution of a UniProtKB entry to each PDB chain, using a series of post-processing steps. SSMap was compared to other existing mapping resources in terms of the correctness of the attribution of PDB chains to UniProtKB entries and the quality of the pairwise alignments supporting the residue-residue mapping. SSMap shared about 80% of its mappings with other mapping sources. New and alternative mappings proposed by SSMap were mostly correct, as assessed by manual verification of data subsets. For local pairwise alignments, major discrepancies (in both alignment lengths and boundaries), when present, were often due to differences in the methodologies used for the mappings. SSMap provides an independent, good-quality UniProt-PDB mapping.
The systematic comparison conducted in this study allows the further identification of general problems in UniProt-PDB mappings so that both the coverage and the quality of the mappings can be systematically improved for the benefit of the scientific community. SSMap mapping is currently used to provide PDB cross-references in UniProtKB.
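    At its core, a residue-residue mapping is read off a pairwise alignment. The sketch below assumes a gapped alignment (with '-' for gaps) between a UniProtKB sequence and a PDB chain sequence is already available; it omits SSMap's sequence reconstruction and post-processing steps, and the function name is hypothetical.

```python
def residue_mapping(aln_a, aln_b, start_a=1, start_b=1):
    """Map residue numbers of sequence A onto sequence B from a gapped alignment.

    Columns where either sequence has a gap are skipped, so only aligned
    residue pairs appear in the result. start_a/start_b set the numbering
    of the first residue of each sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    pos_a, pos_b = start_a, start_b
    for ca, cb in zip(aln_a, aln_b):
        if ca != "-" and cb != "-":
            mapping[pos_a] = pos_b
        # Advance each counter only when that sequence has a residue in this column.
        if ca != "-":
            pos_a += 1
        if cb != "-":
            pos_b += 1
    return mapping


print(residue_mapping("AC-GT", "ACTG-"))
```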

  2. Ontological interpretation of biomedical database content.

    PubMed

    Santana da Silva, Filipe; Jansen, Ludger; Freitas, Fred; Schulz, Stefan

    2017-06-26

    Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. Using a typical extract from the UniProt and Ensembl databases, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). IND makes no ontological claim beyond asserting the existence of sample individuals and relations among them; modelling patterns have to be created for each type of annotation referent. SUBC interprets entries as maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows that individuals can be retrieved with IND, and classes with SUBC and HYBR. DISP does not retrieve anything because the axioms with dispositions are embedded in General Class Inclusion (GCI) statements.
Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models.

  3. Reconstruction of the experimentally supported human protein interactome: what can we learn?

    PubMed

    Klapa, Maria I; Tsafou, Kalliopi; Theodoridis, Evangelos; Tsakalidis, Athanasios; Moschonas, Nicholas K

    2013-10-02

    Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research; therefore, its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim to assemble the human interactome in a global, structured way and to explore it for insights of biological relevance. First, we defined the UniProtKB manually reviewed human "complete" proteome as the reference protein-node set, and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the next fifteen proteins correlates well with their essential biological roles. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly connected interactors. Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human "complete" proteome and to discuss the role of the proteins within the network topology with respect to their function.
As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, which would explain the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms.
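    The normalization step described above (map every source identifier to a UniProt accession, keep only pairs in which both partners belong to the reference set, and merge duplicates) can be sketched as follows. The identifiers and mapping table are hypothetical placeholders, and self-interactions are dropped here purely as a simplifying assumption.

```python
from collections import defaultdict


def build_interactome(raw_pairs, id_map, reference):
    """Normalize PPIs to UniProt accessions and restrict them to the reference node set.

    raw_pairs -- iterable of (source_id_a, source_id_b) from any source dataset
    id_map    -- dict mapping source identifiers to UniProt accessions
    reference -- set of UniProt accessions defining the protein-node set
    Returns (edges, degree): a set of unordered pairs and a dict of node degrees.
    """
    edges = set()
    for a, b in raw_pairs:
        ua, ub = id_map.get(a), id_map.get(b)
        # Unmapped identifiers come back as None and fail the membership test.
        if ua in reference and ub in reference and ua != ub:
            edges.add(frozenset((ua, ub)))  # unordered, so A-B and B-A merge
    degree = defaultdict(int)
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return edges, dict(degree)


# Hypothetical identifiers from two source datasets.
pairs = [("g1", "g2"), ("g2", "g1"), ("g1", "g3"), ("g3", "x9"), ("g2", "g2")]
mapping = {"g1": "ACC1", "g2": "ACC2", "g3": "ACC3", "x9": "ACC9"}
edges, degree = build_interactome(pairs, mapping, {"ACC1", "ACC2", "ACC3"})
print(len(edges), degree)
```

    The resulting degree dictionary is the starting point for checking a degree distribution against a power law.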

  4. A two-step initial mass function: Consequences of clustered star formation for binary properties

    NASA Astrophysics Data System (ADS)

    Durisen, R. H.; Sterzik, M. F.; Pickett, B. K.

    2001-06-01

    If stars originate in transient bound clusters of moderate size, these clusters will decay due to dynamic interactions in which a hard binary forms and ejects most or all of the other stars. When the cluster members are chosen at random from a reasonable initial mass function (IMF), the resulting binary characteristics do not match current observations. We find a significant improvement in the trends of binary properties from this scenario when an additional constraint is taken into account, namely that there is a distribution of total cluster masses set by the masses of the cloud cores from which the clusters form. Two distinct steps then determine final stellar masses: the choice of a cluster mass and the formation of the individual stars. We refer to this as a "two-step" IMF. Simple statistical arguments are used in this paper to show that a two-step IMF, combined with typical results from dynamic few-body system decay, tends to give better agreement between computed binary characteristics and observations than a one-step mass selection process.

  5. Active galactic nuclei. III - Accretion flow in an externally supplied cluster of black holes

    NASA Technical Reports Server (NTRS)

    Pacholczyk, A. G.; Stoeger, W. R.; Stepinski, T. F.

    1989-01-01

    This third paper in the series modeling QSOs and AGNs as clusters of accreting black holes studies the accretion flow within an externally supplied cluster. Significant radiation will be emitted by the cluster core, but the black holes in the outer halo, where the flow is considered spherically symmetric, will not contribute much to the overall luminosity of the source because of their large velocities relative to the infalling gas and therefore their small accretion radii. As a result, the scenario discussed in Paper I will refer to the cluster cores rather than to entire clusters. This will steepen the high-frequency region of the spectrum unless inverse Compton scattering is effective. In many cases, accretion flow in the central part of the cluster will be optically thick to electron scattering, resulting in a spectrum featuring an optically thick radiative component in addition to power-law regimes. The fitting of these spectra to QSO and AGN observations is discussed, and application to 3C 273 is worked out as an example.

  6. Cloning and expression of N-glycosylation-related glucosidase from Glaciozyma antarctica

    NASA Astrophysics Data System (ADS)

    Yajit, Noor Liana Mat; Kamaruddin, Shazilah; Hashim, Noor Haza Fazlin; Bakar, Farah Diba Abu; Murad, Abd. Munir Abd.; Mahadi, Nor Muhammad; Mackeen, Mukram Mohamed

    2016-11-01

    The need for functional oligosaccharides in various fields is ever growing. The enzymatic approach to oligosaccharide synthesis is advantageous over traditional chemical synthesis because of the regio- and stereoselectivity that can be achieved without the need for protection chemistry. In this study, the α-glucosidase I protein sequence from Saccharomyces cerevisiae (UniProt database) was compared against the Glaciozyma antarctica genome database using the Basic Local Alignment Search Tool (BLAST). Results showed 33% identity and an E-value of 1 × 10^-125 for α-glucosidase I. The gene was amplified, cloned into the pPICZα C vector and used to transform Pichia pastoris X-33 cells. Soluble expression of α-glucosidase I (~91 kDa) was achieved at 28 °C with 1.0% methanol.

  7. Unipept web services for metaproteomics analysis.

    PubMed

    Mesuere, Bart; Willems, Toon; Van der Jeugt, Felix; Devreese, Bart; Vandamme, Peter; Dawyndt, Peter

    2016-06-01

    Unipept is an open-source web application designed for metaproteomics analysis with a focus on interactive data visualization. It is underpinned by a fast index built from UniProtKB and the NCBI taxonomy that enables quick retrieval of all UniProt entries in which a given tryptic peptide occurs. Unipept version 2.4 introduced web services that provide programmatic access to the metaproteomics analysis features. This enables integration of Unipept functionality in custom applications and data processing pipelines. The web services are freely available at http://api.unipept.ugent.be and are open sourced under the MIT license. Contact: Unipept@ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
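    The idea behind the peptide index can be illustrated with a toy version: digest each protein in silico with the usual trypsin rule (cleave after K or R, but not before P) and record which entries contain each peptide. The accessions and sequences below are hypothetical, and Unipept's real index and web-service endpoints are not reproduced here.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence):
    """Split a protein sequence after every K or R that is not followed by P."""
    # Splitting on zero-width matches needs Python 3.7+; empty pieces are discarded.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", sequence) if piece]


def build_peptide_index(proteins):
    """Map each tryptic peptide to the set of accessions whose digest contains it."""
    index = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(accession)
    return index


proteins = {"HYP1": "MKWVTFISLLR", "HYP2": "GGKWVTFISLLRAKPR"}
index = build_peptide_index(proteins)
print(sorted(index["WVTFISLLR"]))
```

    A lookup is then a single dictionary access, which is what makes retrieval of all entries sharing a peptide fast once the index is built.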

  8. USSR and Eastern Europe Scientific Abstracts, Physics and Mathematics, Number 38

    DTIC Science & Technology

    1977-12-23

    used to optimize the parameters of ultrashort pulse lasers, particularly in the single-pulse mode. Figures 1; references 5: 3 Russian, 2 Western. USSR ... reflection of intense laser emission from dense clusters of relativistic electrons is severely restricted by fuzziness of the interface for real clusters ... The most widely used method of forming ultrashort pulses of electromagnetic radiation at the present time is self-mode locking by means of

  9. Reference set design for relational modeling of fuzzy systems

    NASA Astrophysics Data System (ADS)

    Lapohos, Tibor; Buchal, Ralph O.

    1994-10-01

    One of the keys to the successful relational modeling of fuzzy systems is the proper design of fuzzy reference sets. This has been discussed throughout the literature. In the context of modeling a stochastic system, we analyze the problem numerically. First, we briefly describe the relational model and present the performance of the modeling in the most trivial case: the reference sets are triangle shaped. Next, we present a known fuzzy reference set generator algorithm (FRSGA) which is based on the fuzzy c-means (Fc-M) clustering algorithm. In the second section of this chapter, we improve the previous FRSGA by adding a constraint to the Fc-M algorithm (modified Fc-M or MFc-M): two cluster centers are forced to coincide with the domain limits. This is needed to obtain properly shaped extreme linguistic reference values. We apply this algorithm to uniformly discretized domains of the variables involved. The fuzziness of the reference sets produced by both Fc-M and MFc-M is determined by a parameter, which in our experiments is modified iteratively. Each time, a new model is created and its performance analyzed. For certain algorithm parameter values, both algorithms have shortcomings. To eliminate the drawbacks of these two approaches, we develop a completely new generator algorithm for reference sets, which we call Polyline. This algorithm and its performance are described in the last section. In all three cases, the modeling is performed for a variety of operators used in the inference engine and two defuzzification methods. Therefore, our results depend neither on the system model order nor on the experimental setup.
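    For readers unfamiliar with fuzzy c-means, the sketch below runs the standard alternating updates (memberships, then fuzzily weighted centers) on a one-dimensional domain. The `pin_extremes` option is a hypothetical simplification of the MFc-M constraint: it overwrites the outermost centers with the domain limits after each update, which is not necessarily how the authors enforce the constraint. All names are illustrative.

```python
def fcm_memberships(xs, centers, m):
    """Fuzzy memberships u[k][i] of point xs[k] in cluster i (each row sums to 1)."""
    rows = []
    for x in xs:
        d = [abs(x - v) for v in centers]
        if any(di == 0.0 for di in d):
            # Point sits exactly on a center: give it crisp membership there.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            total = sum(hits)
            rows.append([h / total for h in hits])
        else:
            p = 2.0 / (m - 1.0)
            rows.append([1.0 / sum((d[i] / d[j]) ** p for j in range(len(d)))
                         for i in range(len(d))])
    return rows


def fuzzy_c_means_1d(xs, c, m=2.0, iters=100, pin_extremes=False):
    """Fuzzy c-means on a 1D domain; optionally pin the outer centers to the domain limits."""
    if c < 2:
        raise ValueError("need at least two clusters")
    lo, hi = min(xs), max(xs)
    # Evenly spaced initial centers across the domain, in ascending order.
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]
    for _ in range(iters):
        u = fcm_memberships(xs, centers, m)
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights
            centers.append(sum(wk * x for wk, x in zip(w, xs)) / sum(w))
        if pin_extremes:
            # Hypothetical stand-in for the MFc-M constraint.
            centers[0], centers[-1] = lo, hi
    return centers, fcm_memberships(xs, centers, m)


data = [0.0, 0.1, 0.2, 0.5, 0.8, 0.9, 1.0]
print(fuzzy_c_means_1d(data, 3)[0])
print(fuzzy_c_means_1d(data, 3, pin_extremes=True)[0])
```

    The exponent m plays the role of the fuzziness parameter mentioned in the abstract: larger values spread each point's membership more evenly across clusters.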

  10. Bipartite flocking for multi-agent systems

    NASA Astrophysics Data System (ADS)

    Fan, Ming-Can; Zhang, Hai-Tao; Wang, Miaomiao

    2014-09-01

    This paper addresses the bipartite flock control problem, in which a multi-agent system splits into two clusters upon internal or external excitations. Using structurally balanced signed graph theory, LaSalle's invariance principle and Barbalat's Lemma, we prove that the proposed algorithm guarantees a bipartite flocking behavior. In each of the two disjoint clusters, all individuals move in the same direction. Meanwhile, every pair of agents in different clusters moves in opposite directions. Moreover, all agents in the two separated clusters approach a common velocity magnitude, and collision avoidance among all agents is ensured as well. Finally, the proposed bipartite flock control method is examined by numerical simulations. The bipartite flocking motion addressed by this paper has analogues in both natural collective motions and human group behaviors, such as predator-prey and panic escaping scenarios.
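    Structural balance, the signed-graph property invoked above, means the agents can be split into two groups with cooperative (positive) edges inside groups and antagonistic (negative) edges between them. A standard breadth-first check is sketched below; it is not the flocking control law itself, and the edge-list format is an assumption.

```python
from collections import deque


def bipartition_if_balanced(n, signed_edges):
    """Return a 0/1 group label per node if the signed graph is structurally balanced, else None.

    signed_edges -- list of (i, j, sign), sign > 0 cooperative, sign < 0 antagonistic
    """
    adj = [[] for _ in range(n)]
    for i, j, s in signed_edges:
        adj[i].append((j, s))
        adj[j].append((i, s))
    group = [None] * n
    for root in range(n):  # handle every connected component
        if group[root] is not None:
            continue
        group[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # Positive edge: same group; negative edge: opposite group.
                want = group[u] if s > 0 else 1 - group[u]
                if group[v] is None:
                    group[v] = want
                    queue.append(v)
                elif group[v] != want:
                    return None  # contradiction: some cycle has an odd number of negative edges
    return group


print(bipartition_if_balanced(4, [(0, 1, 1), (2, 3, 1), (1, 2, -1), (0, 3, -1)]))
print(bipartition_if_balanced(3, [(0, 1, -1), (1, 2, -1), (0, 2, -1)]))
```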

  11. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy regarding regularity versus clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. Interpretation of these nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis is carried out of cumulus cloud field spatial distributions based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
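    The nearest-neighbor cumulative distribution can be compared against the expectation for a homogeneous two-dimensional Poisson process of density λ, G(r) = 1 - exp(-λπr²). A minimal sketch, ignoring edge corrections and using brute-force distances (names are illustrative): an empirical curve above G(r) at small r suggests clustering, and one below it suggests regularity.

```python
import math


def nearest_neighbor_distances(points):
    """Brute-force nearest-neighbor distance for each 2D point (O(n^2))."""
    out = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        out.append(best)
    return out


def empirical_cdf(values, r):
    """Fraction of nearest-neighbor distances that are <= r."""
    return sum(v <= r for v in values) / len(values)


def poisson_nn_cdf(r, density):
    """G(r) for a homogeneous 2D Poisson process (no edge correction)."""
    return 1.0 - math.exp(-density * math.pi * r * r)


# A perfectly regular 5 x 5 grid with unit spacing (density about 1 per unit area).
grid = [(x, y) for x in range(5) for y in range(5)]
nn = nearest_neighbor_distances(grid)
for r in (0.5, 1.0):
    print(r, empirical_cdf(nn, r), round(poisson_nn_cdf(r, 1.0), 3))
```

    For the grid, the empirical curve stays at zero until r = 1 while the Poisson curve rises immediately, the signature of regularity.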

  12. A graph-Laplacian-based feature extraction algorithm for neural spike sorting.

    PubMed

    Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos

    2009-01-01

    Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering, which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.

  13. Clustering algorithm for determining community structure in large networks

    NASA Astrophysics Data System (ADS)

    Pujol, Josep M.; Béjar, Javier; Delgado, Jordi

    2006-07-01

    We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as that of the best algorithms in the modularity optimization literature; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm in both efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice for analyzing the community structure of medium and large networks in the range of tens to hundreds of thousands of vertices.
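    Modularity is the objective that both the proposed algorithm and Newman's fast algorithm try to maximize. For an unweighted, undirected graph it can be computed as Q = Σ_c [L_c/m - (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The sketch below only evaluates Q for a given partition; the spectral search itself is not shown, and the function name is illustrative.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping node -> community label
    """
    m = len(edges)
    intra = {}       # L_c: edges with both ends in community c
    degree_sum = {}  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))  # 2 * (3/7 - 1/4) = 5/14, about 0.357
```

    Putting every node in a single community gives Q = 0, so a positive Q indicates more intra-community edges than expected at random.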

  14. Optimization of self-interstitial clusters in 3C-SiC with genetic algorithm

    NASA Astrophysics Data System (ADS)

    Ko, Hyunseok; Kaczmarowski, Amy; Szlufarska, Izabela; Morgan, Dane

    2017-08-01

    Under irradiation, SiC develops damage commonly referred to as black spot defects, which are speculated to be self-interstitial atom clusters. To understand the evolution of these defect clusters and their impacts (e.g., through radiation-induced swelling) on the performance of SiC in nuclear applications, it is important to identify the cluster composition, structure, and shape. In this work, the genetic algorithm code StructOpt was utilized to identify ground-state cluster structures in 3C-SiC. The genetic algorithm was used to explore clusters of up to ∼30 interstitials of C-only, Si-only, and Si-C mixtures embedded in the SiC lattice. We performed the structure search using Hamiltonians from both density functional theory and empirical potentials. The thermodynamic stability of clusters was investigated in terms of their composition (with a focus on Si-only, C-only, and stoichiometric) and shape (spherical vs. planar), as a function of the cluster size (n). Our results suggest that large Si-only clusters are likely unstable, and clusters are predominantly C-only for n ≤ 10 and stoichiometric for n > 10. The results imply that there is an evolution of the shape of the most stable clusters, where small clusters are stable in more spherical geometries while larger clusters are stable in more planar configurations. We also provide an estimated energy vs. size relationship, E(n), for use in future analysis.

  15. AMS 4.0: consensus prediction of post-translational modifications in protein sequences.

    PubMed

    Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit

    2012-08-01

    We present here the 2011 update of the AutoMotif Service (AMS 4.0), which predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTMs) in protein sequences. Experimentally confirmed modifications for training are acquired from the latest UniProt and Phospho.ELM databases. The sequence vicinity of each modified residue is represented using physico-chemical features of amino acids encoded with high-quality indices (HQI), obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising a different objective during training (for example the recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of a machine learning algorithm with data fusion of different representations of the training objects, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7%, as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most difficult sequence motif types, it improves the prediction performance by almost 32% compared with previously used single machine learning methods. In summary, the brainstorming consensus meta-learning methodology boosts the AUC score to around 89% on average over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with a comparison to previously published methods and state-of-the-art software tools.
The source code and precompiled binaries of brainstorming tool are available at http://code.google.com/p/automotifserver/ under Apache 2.0 licensing.

  16. Genetics Home Reference: fucosidosis

    MedlinePlus

    ... muscle stiffness (spasticity); clusters of enlarged blood vessels forming small, dark red spots on the skin (angiokeratomas); ...

  17. Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.

    PubMed

    Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut

    2015-03-01

    Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from a population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by FFQ1 and FFQ2 and by FFQ2 and the 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity, again especially in the "fruits & vegetables" pattern. The κ statistic revealed fair validity and reproducibility of clusters. Our findings indicate reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
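    The concordance and κ statistics above can both be computed from two labelings of the same participants. This sketch assumes the clusters from the two instruments have already been matched by pattern name (for example, both labeled "fv" for fruits & vegetables or "meat"); the data are made up and the function names are illustrative.

```python
from collections import Counter


def concordance(labels_a, labels_b):
    """Fraction of participants assigned to the same pattern by both instruments."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty labelings of equal length")
    n = len(labels_a)
    observed = concordance(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal frequency of each label.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both labelings use one identical label: kappa undefined
    return (observed - expected) / (1.0 - expected)


ffq1 = ["fv", "fv", "meat", "meat", "fv", "meat"]
ffq2 = ["fv", "meat", "meat", "meat", "fv", "fv"]
print(concordance(ffq1, ffq2), cohens_kappa(ffq1, ffq2))
```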

  18. Two generalizations of Kohonen clustering

    NASA Technical Reports Server (NTRS)

    Bezdek, James C.; Pal, Nikhil R.; Tsao, Eric C. K.

    1993-01-01

    The relationship between the sequential hard c-means (SHCM), learning vector quantization (LVQ), and fuzzy c-means (FCM) clustering algorithms is discussed. LVQ and SHCM suffer from several major problems. For example, they depend heavily on initialization. If the initial values of the cluster centers are outside the convex hull of the input data, such algorithms, even if they terminate, may not produce meaningful results in terms of prototypes for cluster representation. This is due in part to the fact that they update only the winning prototype for every input vector. The impact and interaction of these two families with Kohonen's self-organizing feature mapping (SOFM), which is not a clustering method but often lends ideas to clustering algorithms, is discussed. Then two generalizations of LVQ that are explicitly designed as clustering algorithms are presented; these algorithms are referred to as generalized LVQ (GLVQ) and fuzzy LVQ (FLVQ). Learning rules are derived to optimize an objective function whose goal is to produce 'good clusters'. GLVQ/FLVQ (may) update every node in the clustering net for each input vector. Neither GLVQ nor FLVQ depends upon a choice for the update neighborhood or learning rate distribution; these are taken care of automatically. Segmentation of a gray tone image is used as a typical application of these algorithms to illustrate the performance of GLVQ/FLVQ.

  19. Night-time neuronal activation of Cluster N in a day- and night-migrating songbird.

    PubMed

    Zapka, Manuela; Heyers, Dominik; Liedvogel, Miriam; Jarvis, Erich D; Mouritsen, Henrik

    2010-08-01

    Magnetic compass orientation in a night-migratory songbird requires that Cluster N, a cluster of forebrain regions, is functional. Cluster N, which receives input from the eyes via the thalamofugal pathway, shows high neuronal activity in night-migrants performing magnetic compass-guided behaviour at night, whereas no activation is observed during the day, and covering up the birds' eyes strongly reduces neuronal activation. These findings suggest that Cluster N processes light-dependent magnetic compass information in night-migrating songbirds. The aim of this study was to test if Cluster N is active during daytime migration. We used behavioural molecular mapping based on ZENK activation to investigate if Cluster N is active in the meadow pipit (Anthus pratensis), a day- and night-migratory species. We found that Cluster N of meadow pipits shows high neuronal activity under dim-light at night, but not under full room-light conditions during the day. These data suggest that, in day- and night-migratory meadow pipits, the light-dependent magnetic compass, which requires an active Cluster N, may only be used during night-time, whereas another magnetosensory mechanism and/or other reference system(s), like the sun or polarized light, may be used as primary orientation cues during the day.

  20. Simulating star clusters with the AMUSE software framework. I. Dependence of cluster lifetimes on model assumptions and cluster dissolution modes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitehead, Alfred J.; McMillan, Stephen L. W.; Vesperini, Enrico

    2013-12-01

    We perform a series of simulations of evolving star clusters using the Astrophysical Multipurpose Software Environment (AMUSE), a new community-based multi-physics simulation package, and compare our results to existing work. These simulations model a star cluster beginning with a King model distribution and a selection of power-law initial mass functions, and contain a tidal cutoff. They are evolved using collisional stellar dynamics and include mass loss due to stellar evolution. After establishing that the differences between AMUSE results and those of previous studies are understood, we explored the variation in cluster lifetimes due to the random realization noise introduced by transforming a King model to specific initial conditions. This random realization noise can affect the lifetime of a simulated star cluster by up to 30%. Two modes of star cluster dissolution were identified: a mass evolution curve that contains a runaway cluster dissolution with a sudden loss of mass, and a dissolution mode that does not contain this feature. We refer to these dissolution modes as 'dynamical' and 'relaxation' dominated, respectively. For Salpeter-like initial mass functions, we determined the boundary between these two modes in terms of the dynamical and relaxation timescales.

  1. The cluster model of a hot dense vapor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhukhovitskii, D. I., E-mail: dmr@ihed.ras.ru

    2015-04-28

    We explore thermodynamic properties of a vapor in the range of state parameters where the contribution to thermodynamic functions from bound states of atoms (clusters) dominates over the interaction between the components of the vapor in free states. The clusters are assumed to be light and sufficiently “hot” for the number of bonds to be minimized. We use a technique for calculating the partition function of a cluster with the minimum number of interatomic bonds to calculate the caloric properties (heat capacity and velocity of sound) of an ideal mixture of the lightest clusters. The problem proves to be exactly solvable, and the resulting formulas are functions solely of the equilibrium constant of dimer formation. These formulas give a satisfactory correlation with the reference data for the vapors of cesium, mercury, and argon up to moderate densities in both the sub- and supercritical regions. For cesium, we extend the model to densities close to the critical density by including clusters of arbitrary size. Knowledge of the cluster composition of the cesium vapor makes it possible to treat nonequilibrium phenomena such as nucleation of the supersaturated vapor, for which the effect of the cluster structural transition is likely to be significant.

  2. Escherichia coli O-Antigen Gene Clusters of Serogroups O62, O68, O131, O140, O142, and O163: DNA Sequences and Similarity between O62 and O68, and PCR-Based Serogrouping

    PubMed Central

    Liu, Yanhong; Yan, Xianghe; DebRoy, Chitrita; Fratamico, Pina M.; Needleman, David S.; Li, Robert W.; Wang, Wei; Losada, Liliana; Brinkac, Lauren; Radune, Diana; Toro, Magaly; Hegde, Narasimha; Meng, Jianghong

    2015-01-01

    The DNA sequence of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 was determined, and primers based on the wzx (O-antigen flippase) and/or wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Specificity was tested with E. coli reference strains, field isolates belonging to the target serogroups, and non-E. coli bacteria. The PCR assays were highly specific for the respective serogroups; however, the PCR assay targeting the O62 wzx gene reacted positively with strains belonging to E. coli O68, which was determined by serotyping. Analysis of the O-antigen gene cluster sequences of serogroups O62 and O68 reference strains showed that they were 94% identical at the nucleotide level, although O62 contained an insertion sequence (IS) element located between the rmlA and rmlC genes within the O-antigen gene cluster. A PCR assay targeting the rmlA and rmlC genes flanking the IS element was used to differentiate O62 and O68 serogroups. The PCR assays developed in this study can be used for the detection and identification of E. coli O62/O68, O131, O140, O142, and O163 strains isolated from different sources. PMID:25664526

  3. VizieR Online Data Catalog: OCCASO survey. HRV for 12 open clusters (Casamiquela+, 2016)

    NASA Astrophysics Data System (ADS)

    Casamiquela, L.; Carrera, R.; Jordi, C.; Balaguer-Nunez, L.; Pancino, E.; Hidalgo, S. L.; Martinez-Vazquez, C. E.; Murabito, S.; Del Pino, A.; Aparicio, A.; Blanco-Cuaresma, S.; Gallart, C.

    2016-05-01

    We present radial velocities for 77 stars in 12 completed clusters and for the reference stars Arcturus and μ Leo, a total of 79 stars. We include radial velocities from individual spectra, and final radial velocities from combined spectra, which reach a minimum signal-to-noise ratio of 70. A comparison with the literature is included in cases where the stars had previous measurements. (2 data files).

  4. NASA Scientific and Technical Publications: A Catalog of Special Publications, Reference Publications, Conference Publications, and Technical Papers 1987-1990

    DTIC Science & Technology

    1991-02-01

    Technical Papers present the results of significant research conducted by NASA scientists and engineers. Presented here are citations for reports from each... contains photographs of 322 galaxies, including the majority of all Shapley-Ames bright galaxies, plus cluster members in the Virgo A... Catalog of Open Clusters and Associated Interstellar Matter ... Research Council, London, United Kingdom ... Sponsored by NASA, summarizes observations of 128 open

  5. Construction of ground-state preserving sparse lattice models for predictive materials simulations

    NASA Astrophysics Data System (ADS)

    Huang, Wenxuan; Urban, Alexander; Rong, Ziqin; Ding, Zhiwei; Luo, Chuan; Ceder, Gerbrand

    2017-08-01

    First-principles based cluster expansion models are the dominant approach in ab initio thermodynamics of crystalline mixtures, enabling the prediction of phase diagrams and novel ground states. However, despite recent advances, the construction of accurate models still requires a careful and time-consuming manual parameter tuning process for ground-state preservation, since this property is not guaranteed by default. In this paper, we present a systematic and mathematically sound method to obtain cluster expansion models that are guaranteed to preserve the ground states of their reference data. The method builds on the recently introduced compressive sensing paradigm for cluster expansion and employs quadratic programming to impose constraints on the model parameters. The robustness of our methodology is illustrated for two lithium transition metal oxides with relevance for Li-ion battery cathodes, i.e., Li2xFe2(1-x)O2 and Li2xTi2(1-x)O2, for which the construction of cluster expansion models with compressive sensing alone has proven to be challenging. We demonstrate not only that our method guarantees ground-state preservation on the set of reference structures used for the model construction, but also that out-of-sample ground-state preservation up to relatively large supercell size is achievable through a rapidly converging iterative refinement. This method provides a general tool for building robust, compressed and constrained physical models with predictive power.

  6. The ToxBank Data Warehouse: Supporting the Replacement of In Vivo Repeated Dose Systemic Toxicity Testing.

    PubMed

    Kohonen, Pekka; Benfenati, Emilio; Bower, David; Ceder, Rebecca; Crump, Michael; Cross, Kevin; Grafström, Roland C; Healy, Lyn; Helma, Christoph; Jeliazkova, Nina; Jeliazkov, Vedrin; Maggioni, Silvia; Miller, Scott; Myatt, Glenn; Rautenberg, Michael; Stacey, Glyn; Willighagen, Egon; Wiseman, Jeff; Hardy, Barry

    2013-01-01

    The aim of the SEURAT-1 (Safety Evaluation Ultimately Replacing Animal Testing-1) research cluster, comprised of seven EU FP7 Health projects co-financed by Cosmetics Europe, is to generate a proof-of-concept to show how the latest technologies, systems toxicology and toxicogenomics can be combined to deliver a test replacement for repeated dose systemic toxicity testing on animals. The SEURAT-1 strategy is to adopt a mode-of-action framework to describe repeated dose toxicity, combining in vitro and in silico methods to derive predictions of in vivo toxicity responses. ToxBank is the cross-cluster infrastructure project whose activities include the development of a data warehouse to provide a web-accessible shared repository of research data and protocols, a physical compounds repository, reference or "gold compounds" for use across the cluster (available via wiki.toxbank.net), and a reference resource for biomaterials. Core technologies used in the data warehouse include the ISA-Tab universal data exchange format, REpresentational State Transfer (REST) web services, the W3C Resource Description Framework (RDF) and the OpenTox standards. We describe the design of the data warehouse based on cluster requirements, the implementation based on open standards, and finally the underlying concepts and initial results of a data analysis utilizing public data related to the gold compounds. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. The externally corrected coupled cluster approach with four- and five-body clusters from the CASSCF wave function.

    PubMed

    Xu, Enhua; Li, Shuhua

    2015-03-07

    An externally corrected CCSDt (coupled cluster with singles, doubles, and active triples) approach employing four- and five-body clusters from the complete active space self-consistent field (CASSCF) wave function (denoted as ecCCSDt-CASSCF) is presented. The quadruple and quintuple excitation amplitudes within the active space are extracted from the CASSCF wave function and then fed into the CCSDt-like equations, which can be solved iteratively in the same way as the standard CCSDt equations. With a size-extensive CASSCF reference function, the ecCCSDt-CASSCF method is size-extensive. When the CASSCF wave function is readily available, the computational cost of the ecCCSDt-CASSCF method scales like that of the popular CCSD method (if the number of active orbitals is small compared to the total number of orbitals). The ecCCSDt-CASSCF approach has been applied to investigate the potential energy surface for the simultaneous dissociation of two O-H bonds in H2O, the equilibrium distances and spectroscopic constants of 4 diatomic molecules (F2(+), O2(+), Be2, and NiC), and the reaction barriers for the automerization reaction of cyclobutadiene and the Cl + O3 → ClO + O2 reaction. In most cases, the ecCCSDt-CASSCF approach can provide better results than the CASPT2 (second order perturbation theory with a CASSCF reference function) and CCSDT methods.

  8. The implementation of hybrid clustering using fuzzy c-means and divisive algorithm for analyzing DNA human Papillomavirus cause of cervical cancer

    NASA Astrophysics Data System (ADS)

    Andryani, Diyah Septi; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    Clustering aims to classify different patterns into groups called clusters. In this clustering method, we use n-mer frequencies to calculate the distance matrix, which is considered more accurate than using DNA alignment. The clustering results could be used to discover biologically important sub-sections and groups of genes. Many clustering methods have been developed; hard clustering methods are considered less accurate than fuzzy clustering methods, especially for data with outliers. Among fuzzy clustering methods, fuzzy c-means is one of the best known for its accuracy and simplicity. Fuzzy c-means clustering uses a membership function variable, which indicates how likely a data point is to be a member of a cluster, and works by minimizing an objective function. The parameter of the membership function that acts as a weighting factor is called the fuzzifier. In this study we implement hybrid clustering using fuzzy c-means and a divisive algorithm, which could improve the accuracy of cluster membership compared with a traditional partitional approach alone. Fuzzy c-means is used in the first step to find a partition; the divisive algorithm is then run in the second step to find sub-clusters and the dendrogram of the phylogenetic tree. The best number of clusters is determined by the minimum value of the Davies-Bouldin Index (DBI) of the clustering results. The results show that the method introduced in this paper is better than other partitioning methods. We found 3 clusters with a DBI value of 1.126628 at the first step of clustering. Moreover, the DBI values after the second step of clustering are always smaller than those obtained with the first step alone. This indicates that the hybrid approach in this study performs better in terms of DBI.
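
    The abstract above selects the number of clusters by the minimum Davies-Bouldin Index. As an illustration only (not the authors' code), the following is a minimal sketch of the common Euclidean form of the DBI for a hard partition. It assumes that fuzzy c-means memberships have already been hardened (e.g. each point assigned to its highest-membership cluster) and that each point is a numeric feature vector, such as an n-mer frequency vector; the function name davies_bouldin_index is hypothetical.

```python
import math
from collections import defaultdict


def davies_bouldin_index(points, labels):
    """Davies-Bouldin index of a hard partition (lower is better).

    Assumes Euclidean distance; `points` is a sequence of equal-length
    numeric vectors and `labels` gives one cluster label per point.
    """
    # Group points by their cluster label.
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("DBI needs at least two clusters")

    centroids, scatter = {}, {}
    for lab, pts in clusters.items():
        dim = len(pts[0])
        c = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
        centroids[lab] = c
        # S_i: mean distance of members to their cluster centroid.
        scatter[lab] = sum(math.dist(p, c) for p in pts) / len(pts)

    labs = list(clusters)
    total = 0.0
    for i in labs:
        # R_i: worst-case ratio of combined scatter to centroid separation.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in labs if j != i
        )
    return total / len(labs)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin_index(pts, [0, 0, 1, 1]))  # compact, well separated
print(davies_bouldin_index(pts, [0, 1, 0, 1]))  # poor partition, much larger DBI
```

    To pick the number of clusters in the way the abstract describes, one would compute this index for each candidate partition and keep the one with the smallest value. Library implementations such as scikit-learn's davies_bouldin_score follow the same definition.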

  9. Radial alignment of elliptical galaxies by the tidal force of a cluster of galaxies

    NASA Astrophysics Data System (ADS)

    Rong, Yu; Yi, Shu-Xu; Zhang, Shuang-Nan; Tu, Hong

    2015-08-01

    Unlike the random radial orientation distribution of field elliptical galaxies, galaxies in a cluster are expected to point preferentially towards the centre of the cluster, as a result of the cluster's tidal force on its member galaxies. In this work, an analytic model is formulated to simulate this effect. The deformation time-scale of a galaxy in a cluster is usually much shorter than the time-scale of change of the tidal force; the dynamical process of tidal interaction within the galaxy can thus be ignored. The equilibrium shape of a galaxy is then assumed to be an equipotential surface of the total potential, i.e. the sum of the self-gravitational potential of the galaxy and the tidal potential of the cluster at its location. We use a Monte Carlo method to calculate the radial orientation distribution of cluster galaxies, assuming a Navarro-Frenk-White mass profile for the cluster and the initial ellipticity of field galaxies. The radial angles show a single-peak distribution centred at zero. The Monte Carlo simulations also show that a shift of the reference centre from the real cluster centre weakens the anisotropy of the radial angle distribution. Therefore, the expected radial alignment cannot be revealed if the distribution of spatial position angle is used instead of that of radial angle. The observed radial orientations of elliptical galaxies in cluster Abell 2744 are consistent with the simulated distribution.

  10. BioCluster: tool for identification and clustering of Enterobacteriaceae based on biochemical data.

    PubMed

    Abdullah, Ahmed; Sabbir Alam, S M; Sultana, Munawar; Hossain, M Anwar

    2015-06-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automating the sorting and similarity calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in the results of 1-47 biochemical tests within the Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  11. Genetics Home Reference: beta-mannosidosis

    MedlinePlus

    ... They may also exhibit distinctive facial features and clusters of enlarged blood vessels forming small, dark red ... JM, Zulaica A, Coll MJ, Chabás A. Molecular analysis in two beta-mannosidosis patients: description of a ...

  12. Cluster-Based Economy Enhancement Act of 2009

    THOMAS, 111th Congress

    Rep. McHugh, John M. [R-NY-23]

    2009-02-03

    House - 02/04/2009 Referred to the Subcommittee on Economic Development, Public Buildings and Emergency Management. This bill has the status Introduced.

  13. Cluster-Based Economy Enhancement Act of 2010

    THOMAS, 111th Congress

    Rep. Owens, William L. [D-NY-23]

    2010-02-23

    House - 02/24/2010 Referred to the Subcommittee on Economic Development, Public Buildings and Emergency Management. This bill has the status Introduced.

  14. Acidity in DMSO from the embedded cluster integral equation quantum solvation model.

    PubMed

    Heil, Jochen; Tomazic, Daniel; Egbers, Simon; Kast, Stefan M

    2014-04-01

    The embedded cluster reference interaction site model (EC-RISM) is applied to the prediction of acidity constants of organic molecules in dimethyl sulfoxide (DMSO) solution. EC-RISM is based on a self-consistent treatment of the solute's electronic structure and the solvent's structure by coupling quantum-chemical calculations with three-dimensional (3D) RISM integral equation theory. We compare available DMSO force fields with reference calculations obtained using the polarizable continuum model (PCM). The results are evaluated statistically using two different approaches to eliminating the proton contribution: a linear regression model and an analysis of pK(a) shifts for compound pairs. Suitable levels of theory for the integral equation methodology are benchmarked. The results are further analyzed and illustrated by visualizing solvent site distribution functions and comparing them with an aqueous environment.

  15. Evaluating physical habitat and water chemistry data from statewide stream monitoring programs to establish least-impacted conditions in Washington State

    USGS Publications Warehouse

    Wilmoth, Siri K.; Irvine, Kathryn M.; Larson, Chad

    2015-01-01

    Various GIS-generated land-use predictor variables, physical habitat metrics, and water chemistry variables from 75 reference streams and 351 randomly sampled sites throughout Washington State were evaluated for their effectiveness at discriminating reference from random sites within level III ecoregions. A combination of multivariate clustering and ordination techniques was used. We describe average observed conditions for a subset of predictor variables and propose statistical criteria for establishing reference conditions for stream habitat in Washington. Using these criteria, we determined whether any of the random sites met expectations for reference condition and whether any of the established reference sites failed to meet expectations for reference condition. Establishing these criteria will set a benchmark against which future data can be compared.

  16. Zodiacal Exoplanets in Time: Searching for Young Stars in K2

    NASA Astrophysics Data System (ADS)

    Morris, Nathan; Mann, Andrew W.

    2017-06-01

    Nearby young open clusters such as the Hyades, Pleiades, and Praesepe provide an important reference point for the properties of stellar systems in general. In each cluster, all stars are of the same known age. As such, observations of planetary systems around these stars can be used to gain insight into the early stages of planetary system formation. K2, the revived Kepler mission, has provided a vast number of light curves for young stars in these clusters and elsewhere in the K2 field. We aim to compute rotational periods from sunspot patterns for all K2 target stars and use gyrochronometric relationships derived from cluster stars to determine their ages. From there, we will search for planets around young stars outside the clusters, with the ultimate goal of shedding light on how planets and planetary systems evolve with time.

  17. Disintegration of the Aged Open Cluster Berkeley 17

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhattacharya, Souradeep; Vaidya, Kaushar; Mishra, Ishan

    We present an analysis of the morphological shape of Berkeley 17, the oldest known open cluster (∼10 Gyr), using probabilistic star counting of Pan-STARRS point sources, and confirm its core-tail shape, plus an antitail, previously detected with the 2MASS data. The stellar population, as diagnosed by the color–magnitude diagram and theoretical isochrones, shows many massive members in the cluster's core, whereas there is a paucity of such members in both of the tails. This manifests mass segregation in this aged star cluster, with the low-mass members being stripped away from the system. It has been claimed that Berkeley 17 is associated with an excessive number of blue straggler candidates. A comparison with nearby reference fields indicates that about half of these may be field contamination.

  18. Optimal colour quality of LED clusters based on memory colours.

    PubMed

    Smet, Kevin; Ryckaert, Wouter R; Pointer, Michael R; Deconinck, Geert; Hanselaer, Peter

    2011-03-28

    The spectral power distributions of tri- and tetrachromatic clusters of light-emitting diodes, composed of simulated and commercially available LEDs, were optimized with a genetic algorithm to maximize the luminous efficacy of radiation and the colour quality as assessed by the memory colour quality metric developed by the authors. The trade-off between the colour quality as assessed by the memory colour metric and the luminous efficacy of radiation was investigated by calculating the Pareto optimal front using the NSGA-II genetic algorithm. Optimal peak wavelengths and spectral widths of the LEDs were derived, and over half of them were found to be close to Thornton's prime colours. The Pareto optimal fronts of real LED clusters were always found to be smaller than those of the simulated clusters. The effect of binning on designing a real LED cluster was investigated and was found to be quite large. Finally, a real LED cluster of commercially available AlGaInP, InGaN and phosphor white LEDs was optimized to obtain a higher score on the memory colour quality scale than its corresponding CIE reference illuminant.

  19. Newspaper Reporting on a Cluster of Suicides in the UK.

    PubMed

    John, Ann; Hawton, Keith; Gunnell, David; Lloyd, Keith; Scourfield, Jonathan; Jones, Phillip A; Luce, Ann; Marchant, Amanda; Platt, Steve; Price, Sian; Dennis, Michael S

    2017-01-01

    Media reporting may influence suicide clusters through imitation or contagion. In 2008 there was extensive national and international newspaper coverage of a cluster of suicides in young people in the Bridgend area of South Wales, UK. We aimed to explore the quantity and quality of newspaper reporting during the identified cluster. Searches were conducted for articles on suicide in Bridgend for 6 months before and after the defined cluster (June 26, 2007, to September 16, 2008). Frequency, quality (using the PRINTQUAL instrument), and sensationalism were examined. In all, 577 newspaper articles were identified. One in seven articles included the suicide method in the headline, 47.3% referred to earlier suicides, and 44% used phrases that guidelines suggest should be avoided. Only 13% included sources of information or advice. A high level of poor-quality and sensationalist reporting was found during an ongoing suicide cluster at the very time when good-quality reporting could be considered important. A broad awareness of media guidelines among journalists, together with the expansion of and adherence to press codes of practice, is required to ensure ethical reporting.

  20. Characterization of micron-size hydrogen clusters using Mie scattering.

    PubMed

    Jinno, S; Tanaka, H; Matsui, R; Kanasaki, M; Sakaki, H; Kando, M; Kondo, K; Sugiyama, A; Uesaka, M; Kishimoto, Y; Fukuda, Y

    2017-08-07

    Hydrogen clusters with diameters in the range of a few micrometers, composed of 10^8 to 10^10 hydrogen molecules, have been produced for the first time in an expansion of supercooled, high-pressure hydrogen gas into a vacuum through a conical nozzle connected to a cryogenic pulsed solenoid valve. The size distribution of the clusters has been evaluated by measuring the angular distribution of laser light scattered from the clusters. The data were analyzed based on Mie scattering theory combined with the Tikhonov regularization method, including the instrumental functions; the validity of this analysis was assessed by a calibration study using a reference target consisting of standard micro-particles of two different sizes. The size distribution of the clusters was found to have discrete peaks at 0.33 ± 0.03, 0.65 ± 0.05, 0.81 ± 0.06, 1.40 ± 0.06 and 2.00 ± 0.13 µm in diameter. The highly reproducible and impurity-free nature of the micron-size hydrogen clusters makes them a promising target for laser-driven multi-MeV proton sources with the currently available high power lasers.

  1. Orbits of Selected Globular Clusters in the Galactic Bulge

    NASA Astrophysics Data System (ADS)

    Pérez-Villegas, A.; Rossi, L.; Ortolani, S.; Casotto, S.; Barbuy, B.; Bica, E.

    2018-05-01

    We present an orbit analysis for a sample of eight inner bulge globular clusters, together with one reference halo object. We used proper motion values derived from long time base CCD data. Orbits are integrated in both an axisymmetric model and a model including the Galactic bar potential. The inclusion of the bar proved to be essential for the description of the dynamical behaviour of the clusters. We use a Monte Carlo scheme to construct the initial conditions for each cluster, taking into account the uncertainties in the kinematical data and distances. The sample clusters typically show a maximum height from the Galactic plane below 1.5 kpc, and develop rather eccentric orbits. Seven of the bulge sample clusters share the orbital properties of the bar/bulge, having perigalactic and apogalactic distances, and maximum vertical excursion from the Galactic plane, inside the bar region. NGC 6540 instead shows a completely different orbital behaviour, having a dynamical signature of the thick disc. Both prograde and prograde-retrograde orbits with respect to the direction of the Galactic rotation were revealed, which might characterise a chaotic behaviour.

  2. Analyzing ZnO clusters through the density-functional theory.

    PubMed

    Zaragoza, Irineo-Pedro; Soriano-Agueda, Luis-Antonio; Hernández-Esparza, Raymundo; Vargas, Rubicelia; Garza, Jorge

    2018-06-16

    The potential energy surface of ZnₙOₙ clusters (n = 2, 4, 6, 8) has been explored by using a simulated annealing method. For n = 2, 4, and 6, the CCSD(T)/TZP method was used as the reference, and from this it is shown that the M06-2X/TZP method gives lower deviations than the PBE, PBE0, B3LYP, M06, and MP2 methods. Thus, with the M06-2X method we predict isomers of ZnₙOₙ clusters, which coincide with some isomers reported previously. By using the atoms in molecules analysis, possible contacts between Zn and O atoms were found for all structures studied in this article. The bond paths involved in several clusters suggest that ZnₙOₙ clusters can be obtained from zincite (ZnO crystal); this observation was confirmed for clusters with n = 2–9, 18, and 20. The structure with n = 23 was obtained by the procedure presented here, from crystal information, which could be important to confirm experimental data delivered for n = 18 and 23.

  3. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    PubMed

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  4. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods

    PubMed Central

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community. PMID:27124610

  5. Inherent size effects on XANES of nanometer metal clusters: Size-selected platinum clusters on silica

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Yang; Gorey, Timothy J.; Anderson, Scott L.

    2016-12-12

    X-ray absorption near-edge structure (XANES) is commonly used to probe the oxidation state of metal-containing nanomaterials; however, as the particle size in the material drops below a few nanometers, it becomes important to consider inherent size effects on the electronic structure of the materials. In this paper, we analyze a series of size-selected Ptₙ/SiO₂ samples, using X-ray photoelectron spectroscopy (XPS), low energy ion scattering, grazing-incidence small angle X-ray scattering, and XANES. The oxidation state and morphology are characterized both as-deposited in UHV, and after air/O₂ exposure and annealing in H₂. Here, the clusters are found to be stable during deposition and upon air exposure, but sinter if heated above ~150 °C. XANES shows shifts in the Pt L₃ edge, relative to bulk Pt, that increase with decreasing cluster size, and the cluster samples show high white line intensity. Reference to bulk standards would suggest that the clusters are oxidized; however, XPS shows that they are not. Instead, the XANES effects are attributable to the development of a band gap and the localization of empty-state wavefunctions in small clusters.

  6. The distribution of early- and late-type galaxies in the Coma cluster

    NASA Technical Reports Server (NTRS)

    Doi, M.; Fukugita, M.; Okamura, S.; Turner, E. L.

    1995-01-01

    The spatial distribution and the morphology-density relation of Coma cluster galaxies are studied using a new homogeneous photometric sample of 450 galaxies down to B = 16.0 mag with quantitative morphological classification. The sample covers a wide area (10° × 10°), extending well beyond the Coma cluster. Galaxies are classified into early-type (E+S0) and late-type (S) galaxies by an automated algorithm using simple photometric parameters, with an expected misclassification rate of approximately 10% relative to the early and late types given in the Third Reference Catalogue of Bright Galaxies. The flattened distribution of Coma cluster galaxies, noted in previous studies, is most conspicuous when early-type galaxies are selected. Early-type galaxies are distributed in a thick filament extending from the NE to the WSW that delineates part of the large-scale structure. Spiral galaxies show a distribution with a modest density gradient toward the cluster center; bright spiral galaxies, at least, are present close to the center of the Coma cluster. We also examine the morphology-density relation for the Coma cluster including its surrounding regions.

  7. Dispersed or clustered housing for adults with intellectual disability: a systematic review.

    PubMed

    Mansell, Jim; Beadle-Brown, Julie

    2009-12-01

    The purpose of this review was to evaluate the available research on the quality and costs of dispersed community-based housing when compared with clustered housing. Searches against specified criteria yielded 19 papers based on 10 studies presenting data comparing dispersed housing with some kind of clustered housing (village communities, residential campuses, or clusters of houses). The studies reported the experience of nearly 2,500 people from four different countries. In five of eight quality of life domains there were no studies reporting benefits of clustered settings. In respect of interpersonal relations, emotional, and physical well-being, clustered settings had some advantages. However, in many of these cases the better results refer only to village communities and not to campus housing or clustered housing. In terms of costs, clustered housing was usually less expensive because of lower staffing levels. In two of the three studies that examined costs controlling for user characteristics, there was no statistically significant difference. Dispersed housing appears to be superior to clustered housing on the majority of quality indicators studied. The only exception to this is that village communities for people with less severe disabilities have some benefits; this is not, however, a model which can be feasibly provided for everyone. Clustered housing is usually less expensive than dispersed housing but this is because it provides fewer staff hours per person. There is no evidence that clustered housing can deliver the same quality of life as dispersed housing at a lower cost.

  8. Nuclear structure studies performed using the (¹⁸O,¹⁶O) two-neutron transfer reactions

    NASA Astrophysics Data System (ADS)

    Carbone, D.; Agodi, C.; Cappuzzello, F.; Cavallaro, M.; Ferreira, J. L.; Foti, A.; Gargano, A.; Lenzi, S. M.; Linares, R.; Lubian, J.; Santagati, G.

    2018-02-01

    Excitation energy spectra and absolute cross-section angular distributions were measured for the ¹³C(¹⁸O,¹⁶O)¹⁵C two-neutron transfer reaction at 84 MeV incident energy. This reaction selectively populates two-neutron configurations in the states of the residual nucleus. Exact finite-range coupled reaction channel calculations are used to analyse the data. Two approaches are discussed: the extreme cluster model and the newly introduced microscopic cluster model. The latter makes use of spectroscopic amplitudes in the centre-of-mass reference frame, derived from shell-model calculations using Moshinsky transformation brackets. The results describe the experimental cross sections well and highlight cluster configurations in the involved wave functions.

  9. Dynamic Fuzzy Model Development for a Drum-type Boiler-turbine Plant Through GK Clustering

    NASA Astrophysics Data System (ADS)

    Habbi, Ahcène; Zelmat, Mimoun

    2008-10-01

    This paper discusses a Takagi-Sugeno (TS) fuzzy model identification method for an industrial drum-type boiler plant using the Gustafson-Kessel (GK) fuzzy clustering approach. The fuzzy model is constructed from a set of input-output data that covers a wide operating range of the physical plant. The reference data are generated using a complex first-principles mathematical model that describes the key dynamical properties of the boiler-turbine system. The proposed fuzzy model is derived by fuzzy clustering, with particular attention to structural flexibility and model interpretability. This may provide a basis for a new way to design model-based control and diagnosis mechanisms for this complex nonlinear plant.

  10. HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads

    PubMed Central

    Li, Pinghao; Jiang, Xiaoqian; Wang, Shuang; Kim, Jihoon; Xiong, Hongkai; Ohno-Machado, Lucila

    2014-01-01

    Background and objective Short-read sequencing is becoming the standard of practice for the study of structural variants associated with disease. However, with the growth of sequence data largely surpassing reasonable storage capability, the biomedical community is challenged with the management, transfer, archiving, and storage of sequence data. Methods We developed Hierarchical mUlti-reference Genome cOmpression (HUGO), a novel compression algorithm for aligned reads in the sorted Sequence Alignment/Map (SAM) format. We first aligned short reads against a reference genome and stored exactly mapped reads for compression. Inexactly mapped and unmapped reads were realigned against different reference genomes using an adaptive scheme that gradually shortens the read length. For base quality values, we offer both lossy and lossless compression. The lossy mechanism uses k-means clustering, with which a user can adjust the balance between decompression quality and compression rate; lossless compression is obtained by setting k (the number of clusters) to the number of distinct quality values. Results The proposed method produced a compression ratio in the range 0.5–0.65, which corresponds to 35–50% storage savings on the experimental datasets. The proposed approach achieved 15% more storage savings than CRAM and a compression ratio comparable to Samcomp (CRAM and Samcomp are two state-of-the-art genome compression algorithms). The software is freely available at https://sourceforge.net/projects/hierachicaldnac/ under the General Public License (GPL). Limitation Our method requires multiple reference genomes and prolongs execution time because of the additional alignments. Conclusions The proposed multi-reference-based compression algorithm for aligned reads outperforms existing single-reference-based algorithms. PMID:24368726
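    The quality-value mechanism described above can be sketched as one-dimensional k-means over Phred scores, where each score is replaced by its nearest centroid. This illustration rests on several assumptions: a deterministic, evenly spaced initialisation; rounding centroids back to integers; and hypothetical function names. It is not HUGO's code. When k equals the number of distinct quality values, every value becomes its own centroid and the mapping is lossless, which mirrors the abstract's description.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on 1-D data (k >= 1); returns sorted centroids."""
    distinct = sorted(set(values))
    n = len(distinct)
    if k >= n:
        # One centroid per distinct value: quantization becomes lossless.
        centroids = [float(v) for v in distinct]
    elif k == 1:
        centroids = [sum(values) / len(values)]
    else:
        # Deterministic init: k evenly spaced picks from the distinct values.
        centroids = [float(distinct[i * (n - 1) // (k - 1)]) for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def quantize(values, centroids):
    """Replace each quality value by its nearest centroid, rounded to an int."""
    return [round(min(centroids, key=lambda c: abs(v - c))) for v in values]


quals = [40, 40, 38, 30, 30, 12, 2, 35]          # toy Phred scores
lossless = quantize(quals, kmeans_1d(quals, len(set(quals))))
lossy = quantize(quals, kmeans_1d(quals, 2))     # only two symbols left to encode
```

    A smaller k leaves fewer distinct symbols for the entropy coder, so the file compresses better but each stored score is less accurate. This is the quality/rate trade-off the abstract lets the user tune.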

  11. Effect of defuzzification method of fuzzy modeling

    NASA Astrophysics Data System (ADS)

    Lapohos, Tibor; Buchal, Ralph O.

    1994-10-01

    Imprecision can arise in fuzzy relational modeling as a result of fuzzification, inference and defuzzification. These three sources of imprecision are difficult to separate. We have determined through numerical studies that an important source of imprecision is the defuzzification stage. This imprecision adversely affects the quality of the model output. The most widely used defuzzification algorithm is known as "center of area" (COA) or "center of gravity" (COG). In this paper, we show that this algorithm not only maps values near the limits of the variables improperly but also introduces errors for mid-domain values of the same variables. Furthermore, the behavior of this algorithm depends on the shape of the reference sets. We compare the COA method to the weighted average of cluster centers (WACC) procedure, in which the transformation is based on the values of the cluster centers of the reference membership functions rather than on the functions themselves. We show that this procedure is more effective and computationally much faster than COA. The method is tested for a family of reference sets satisfying two constraints: for any support value, the reference membership function values sum to one, and the peak values of the two marginal membership functions project to the boundaries of the universe of discourse. For every member of this family, the defuzzification errors do not grow as the linguistic variables approach their extreme values. In addition, the more reference sets are defined for a linguistic variable, the smaller the average defuzzification error. For triangle-shaped reference sets there is no defuzzification error at all. Finally, an alternative solution is provided that improves the performance of the COA method.
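    To make the COA/WACC contrast concrete, the sketch below uses one hypothetical family that meets the stated constraints. It has three triangular reference sets on [0, 1] whose memberships sum to one, with the marginal peaks on the boundaries. The clip-and-max aggregation used for COA and the numerical integration grid are assumptions of this sketch, not details from the paper.

```python
def triangle(x, center, half_width):
    """Triangular membership function peaking at `center`."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


# Hypothetical reference sets on [0, 1]: memberships sum to one everywhere,
# and the two marginal peaks sit on the boundaries of the universe.
CENTERS = [0.0, 0.5, 1.0]
HALF_WIDTH = 0.5


def fuzzify(x):
    return [triangle(x, c, HALF_WIDTH) for c in CENTERS]


def defuzz_wacc(weights):
    """Weighted average of cluster centers (the peaks of the reference sets)."""
    return sum(w * c for w, c in zip(weights, CENTERS)) / sum(weights)


def defuzz_coa(weights, steps=2001):
    """Center of area of the clipped, max-aggregated output set,
    approximated by a Riemann sum on an even grid over [0, 1]."""
    num = den = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        mu = max(min(w, triangle(y, c, HALF_WIDTH)) for w, c in zip(weights, CENTERS))
        num += y * mu
        den += mu
    return num / den


for x in (0.0, 0.1, 0.5, 0.9, 1.0):
    w = fuzzify(x)
    print(f"x={x:.1f}  WACC={defuzz_wacc(w):.3f}  COA={defuzz_coa(w):.3f}")
```

    With these triangular sets, WACC returns the input value exactly. COA pulls the boundary values toward the middle (roughly 0.17 for x = 0 and 0.83 for x = 1), which is the near-limit behaviour the abstract describes.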

  12. Weighted similarity-based clustering of chemical structures and bioactivity data in early drug discovery.

    PubMed

    Perualila-Tan, Nolen Joy; Shkedy, Ziv; Talloen, Willem; Göhlmann, Hinrich W H; Moerbeke, Marijke Van; Kasim, Adetayo

    2016-08-01

    The modern process of discovering candidate molecules in the early drug discovery phase includes a wide range of approaches to extract vital information from the intersection of biology and chemistry. A typical compound-selection strategy clusters compounds by chemical similarity to obtain a representative, chemically diverse set (without incorporating potency information). In this paper, we propose an integrative clustering approach that uses both biological (compound efficacy) and chemical (structural features) data sources to discover a subset of compounds with aligned structural and biological properties. The datasets are integrated at the similarity level by assigning complementary weights to produce a weighted similarity matrix, which serves as a generic input to any clustering algorithm. This new analysis workflow is a semi-supervised method: after the clusters are determined, a secondary analysis finds differentially expressed genes associated with the derived integrated cluster(s) to further explain the compound-induced biological effects inside the cell. Datasets from two oncology drug development projects are used to illustrate the usefulness of the weighted similarity-based clustering approach for integrating multi-source high-dimensional information to aid drug discovery. Compounds that are structurally and biologically similar to the reference compounds are discovered using this integrative approach.
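    The integration step amounts to a convex combination of two similarity matrices with complementary weights w and 1 - w. The sketch below assumes binary structural fingerprints compared with the Tanimoto coefficient. It also assumes a hypothetical efficacy similarity based on range-scaled absolute differences; the abstract specifies neither measure.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def efficacy_similarity(x, y, value_range):
    """Hypothetical bioactivity similarity: 1 minus the range-scaled difference."""
    return 1.0 - abs(x - y) / value_range


def weighted_similarity(fingerprints, efficacy, w):
    """S = w * S_chem + (1 - w) * S_bio for every pair of compounds."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    n = len(fingerprints)
    value_range = (max(efficacy) - min(efficacy)) or 1.0  # avoid dividing by zero
    return [[w * tanimoto(fingerprints[i], fingerprints[j])
             + (1 - w) * efficacy_similarity(efficacy[i], efficacy[j], value_range)
             for j in range(n)]
            for i in range(n)]
```

    The resulting matrix (or 1 - S as a distance) can be handed to any clustering routine. Moving w toward 1 emphasises structure and moving it toward 0 emphasises efficacy. The chosen w can therefore change which compounds count as each other's nearest neighbours.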

  13. Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

    NASA Astrophysics Data System (ADS)

    Alfarizy, A. D.; Indahwati; Sartono, B.

    2017-03-01

    In 2015, Indonesia was the largest target market for the Hollywood movie industry in Southeast Asia. Hollywood movies distributed in Indonesia target people of all ages, including children. Because parents are often unaware of the need to guide children while they watch movies, children may watch films of any rating, even ones unsuitable for their age. Even after translation into Bahasa Indonesia and censorship, words that are inappropriate for children still remain. The purpose of this research is to cluster box office Hollywood movies based on Indonesian subtitles, revenue, IMDb user rating and genre, as a reference to help adults choose suitable movies for their children. Text mining is used to extract words from the subtitles and count the frequency of three groups of words (bad words, sexual words and terror words), while the Partition Around Medoids (PAM) algorithm, with the Gower similarity coefficient as the proximity matrix, is used as the clustering method. We clustered 624 movies from IMDb released between 2006 and the first half of 2016. The solution with the highest silhouette coefficient (0.36) has 5 clusters. High-revenue Animation, Adventure and Comedy movies, as in cluster 5, are recommended for children, whereas high-revenue Comedy movies, as in cluster 4, should be avoided.
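    The Gower coefficient lets numeric and categorical movie attributes share one proximity matrix. The sketch below implements the unweighted coefficient without missing values. Numeric features use the range-scaled absolute difference and categorical features use exact match. The feature layout and the toy records are hypothetical. A k-medoids method such as PAM would then operate on the dissimilarities 1 - s.

```python
def feature_ranges(records, kinds):
    """Observed range of each numeric feature (None for categorical ones)."""
    ranges = []
    for k, kind in enumerate(kinds):
        if kind == "num":
            column = [r[k] for r in records]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_similarity(a, b, kinds, ranges):
    """Unweighted Gower similarity between two records (no missing values)."""
    total = 0.0
    for k, kind in enumerate(kinds):
        if kind == "num":
            # Range-normalised closeness; a constant feature counts as a match.
            total += (1.0 - abs(a[k] - b[k]) / ranges[k]) if ranges[k] else 1.0
        else:
            total += 1.0 if a[k] == b[k] else 0.0
    return total / len(kinds)


def gower_dissimilarity_matrix(records, kinds):
    """Matrix of 1 - s_ij, the kind of input a k-medoids method such as PAM takes."""
    ranges = feature_ranges(records, kinds)
    return [[1.0 - gower_similarity(a, b, kinds, ranges) for b in records]
            for a in records]


# Hypothetical records: (revenue, IMDb rating, genre, count of flagged words).
movies = [(250.0, 7.8, "Animation", 0),
          (230.0, 7.5, "Animation", 1),
          (180.0, 6.1, "Comedy", 25)]
D = gower_dissimilarity_matrix(movies, ["num", "num", "cat", "num"])
```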

  14. Numerical taxonomy and ecology of petroleum-degrading bacteria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Austin, B.; Calomiris, J.J.; Walker, J.D.

    1977-07-01

    A total of 99 strains of petroleum-degrading bacteria isolated from Chesapeake Bay water and sediment were identified by using numerical taxonomy procedures. The isolates, together with 33 reference cultures, were examined for 48 biochemical, cultural, morphological, and physiological characters. The data were analyzed by computer, using both the simple matching and the Jaccard coefficients. Clustering was achieved by the unweighted average linkage method. From the sorted similarity matrix and dendrogram, 14 phenetic groups, comprising 85 of the petroleum-degrading bacteria, were defined at the 80 to 85% similarity level. These groups were identified as actinomycetes (mycelial forms, four clusters), coryneforms, Enterobacteriaceae, Klebsiella aerogenes, Micrococcus spp. (two clusters), Nocardia species (two clusters), Pseudomonas spp. (two clusters), and Sphaerotilus natans. It is concluded that the degradation of petroleum is accomplished by a diverse range of bacterial taxa, some of which were isolated only at given sampling stations and, more specifically, from sediment collected at a given station.
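    The two coefficients differ only in whether shared negative results count as agreement. This difference can change the groups recovered at a fixed similarity level. The sketch below assumes characters coded 1/0 and uses a naive UPGMA loop that stops at a similarity threshold, standing in for cutting the dendrogram at 80 to 85%. The function names and toy profiles are hypothetical.

```python
from itertools import combinations


def simple_matching(a, b):
    """Share of characters on which two strains agree, shared negatives included."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    """Shared positives over characters positive in either strain."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def upgma_groups(profiles, similarity, threshold):
    """Unweighted average-linkage (UPGMA) clustering, stopped once the best
    average between-cluster similarity drops below `threshold`."""
    clusters = [[i] for i in range(len(profiles))]

    def avg_sim(c1, c2):
        # Mean similarity over all pairs of original members (unweighted).
        return (sum(similarity(profiles[i], profiles[j]) for i in c1 for j in c2)
                / (len(c1) * len(c2)))

    while len(clusters) > 1:
        (p, q), best = max(
            (((p, q), avg_sim(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best < threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # safe because p < q
    return clusters


strains = [[1, 1, 0, 0, 1, 0, 0, 0],
           [1, 1, 0, 0, 1, 0, 0, 1],
           [0, 0, 1, 1, 0, 1, 1, 0]]
```

    On these toy profiles, strains 0 and 1 merge at the 80% level under simple matching (0.875) but not under Jaccard (0.75). Shared negatives inflate the simple matching score.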

  15. Structure and stability of clusters of β-alanine in the gas phase: importance of the nature of intermolecular interactions.

    PubMed

    Piekarski, Dariusz Grzegorz; Díaz-Tendero, Sergio

    2017-02-15

    We present a theoretical study of neutral clusters of β-alanine molecules in the gas phase, (β-ala)ₙ with n ≤ 5. Classical molecular dynamics simulations carried out with different internal excitation energies provide information on cluster formation and thermal decomposition limits. We also present an assessment study performed with different families of density functionals, using the dimer (β-ala)₂ as a benchmark system. The M06-2X functional provides the best agreement in geometries and relative energies in comparison with the reference values computed with the MP2 and CCSD(T) methods. The structure, stability, dissociation energies and vertical ionization potentials of the studied clusters have been investigated using this functional in combination with the 6-311++G(d,p) basis set. An exhaustive analysis of intermolecular interactions is also presented. These results provide new insights into the stability, interaction nature and formation mechanisms of clusters of amino acids in the gas phase.

  16. Determining Distance, Age, and Activity in a New Benchmark Cluster: Ruprecht 147

    NASA Astrophysics Data System (ADS)

    Wright, Jason T.

    2009-08-01

    This proposal seeks 0.7 night of time on Hectochelle to observe the F, G, and K dwarfs of Ruprecht 147, recently identified as the closest old stellar cluster. At only ~ 200 pc and at an age of ~ 1-2 Gyr, this will be an important benchmark in stellar astrophysics, providing the only sample of spectroscopically accessible old, late-type stars of determinable age. Hectochelle is the ideal instrument to study this cluster, with a FOV, fiber count, and telescope aperture well matched to the cluster's diameter (~ 1°), richness (~ 100 identified members), and distance modulus (6.5-7 mag., putting the G and K dwarfs at B=11-15). Hectochelle will measure the Ca II line strengths of members to establish, for the first time, the chromospheric activity levels of a statistically significant sample of single, G and K dwarfs of this modest age. Hectochelle will also vet background stars for suitability as astrometric reference stars for a forthcoming HST FGS proposal to robustly measure the cluster's distance.
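    The distance modulus quoted above follows from the standard relation between apparent magnitude m, absolute magnitude M and distance d. This worked check ignores interstellar extinction, which is an assumption of the sketch. Under it, 6.5 to 7 mag corresponds to roughly 200 to 250 pc:

```latex
\mu \equiv m - M = 5\log_{10}\!\left(\frac{d}{10\ \mathrm{pc}}\right),
\qquad
\mu(d = 200\ \mathrm{pc}) = 5\log_{10} 20 \approx 6.5\ \mathrm{mag},
\qquad
d(\mu = 7\ \mathrm{mag}) = 10^{\mu/5 + 1}\ \mathrm{pc} = 10^{2.4}\ \mathrm{pc} \approx 250\ \mathrm{pc}.
```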

  17. Dual beam organic depth profiling using large argon cluster ion beams

    PubMed Central

    Holzweber, M; Shard, AG; Jungnickel, H; Luch, A; Unger, WES

    2014-01-01

    Argon cluster sputtering of an organic multilayer reference material consisting of two organic components, 4,4′-bis[N-(1-naphthyl-1-)-N-phenyl-amino]-biphenyl (NPB) and aluminium tris-(8-hydroxyquinolate) (Alq3), materials commonly used in the organic light-emitting diode industry, was carried out using time-of-flight SIMS in dual beam mode. The sample used in this study consists of a ~400-nm-thick NPB matrix with 3-nm marker layers of Alq3 at depths of ~50, 100, 200 and 300 nm. Argon cluster sputtering provides a constant sputter yield throughout the depth profiles, and the sputter yield volumes and depth resolution are presented for Ar-cluster sizes of 630, 820, 1000, 1250 and 1660 atoms at a kinetic energy of 2.5 keV. The effect of cluster size in this material and over this range is shown to be negligible. © 2014 The Authors. Surface and Interface Analysis published by John Wiley & Sons Ltd. PMID:25892830

  18. The organization of prospective thinking: evidence of event clusters in freely generated future thoughts.

    PubMed

    Demblon, Julie; D'Argembeau, Arnaud

    2014-02-01

    Recent research suggests that many imagined future events are not represented in isolation, but instead are embedded in broader event sequences, referred to as event clusters. It remains unclear, however, whether the production of event clusters reflects the underlying organizational structure of prospective thinking or whether it is an artifact of the event-cuing task in which participants are explicitly required to provide chains of associated future events. To address this issue, the present study examined whether the occurrence of event clusters in prospective thought is apparent when people are left to think freely about events that might happen in their personal future. The results showed that the succession of events participants spontaneously produced when envisioning their future frequently included event clusters. This finding provides more compelling evidence that prospective thinking involves higher-order autobiographical knowledge structures that organize imagined events in coherent themes and sequences. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebhaber, S.A.; Weiss, I.; Cash, F.E.

    Synthesis of normal human hemoglobin A, α₂β₂, is based upon balanced expression of genes in the α-globin gene cluster on chromosome 16 and the β-globin gene cluster on chromosome 11. Full levels of erythroid-specific activation of the β-globin cluster depend on sequences located at a considerable distance 5′ to the β-globin gene, referred to as the locus-activating or dominant control region. The existence of an analogous element(s) upstream of the α-globin cluster has been suggested from observations on naturally occurring deletions and experimental studies. The authors have identified an individual with α-thalassemia in whom structurally normal α-globin genes have been inactivated in cis by a discrete de novo 35-kilobase deletion located ~30 kilobases 5′ from the α-globin gene cluster. They conclude that this deletion inactivates expression of the α-globin genes by removing one or more of the previously identified upstream regulatory sequences that are critical to expression of the α-globin genes.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bae, Euiyoung; Bingman, Craig A.; Aceti, David J.

    LOC79017 (MW 21.0 kDa, residues 1-188) was annotated as a hypothetical protein encoded by Homo sapiens chromosome 7 open reading frame 24. It was selected as a target by the Center for Eukaryotic Structural Genomics (CESG) because it did not share more than 30% sequence identity with any protein for which the three-dimensional structure is known. The biological function of the protein has not been established yet. Parts of LOC79017 were identified as members of uncharacterized Pfam families (residues 1-95 as PB006073 and residues 104-180 as PB031696). BLAST searches revealed homologues of LOC79017 in many eukaryotes, but none of them have been functionally characterized. Here, we report the crystal structure of H. sapiens protein LOC79017 (UniGene code Hs.530024, UniProt code O75223, CESG target number go.35223).

  1. In silico Analysis of Toxins of Staphylococcus aureus for Validating Putative Drug Targets.

    PubMed

    Mohana, Ramadevi; Venugopal, Subhashree

    2017-01-01

    Toxins are among the numerous virulence factors produced by bacteria. These powerful poisonous substances enable bacteria to counter the defense mechanisms of the human body. Staphylococcus aureus has evolved various exotoxins that have detrimental effects on the human immune system. Four toxins, namely enterotoxin A, exfoliative toxin A, TSST-1 and γ-hemolysin, were downloaded from the UniProt database and analyzed to understand their nature and to validate them as drug targets. The results indicated that the toxins interact with many protein partners and have no homologous sequences in the human proteome; based on a similarity search in DrugBank, the targets were identified as novel drug targets. Copyright © Bentham Science Publishers.

  2. Towards a comprehensive barcode library for arctic life - Ephemeroptera, Plecoptera, and Trichoptera of Churchill, Manitoba, Canada

    PubMed Central

    2009-01-01

    Background This study reports progress in assembling a DNA barcode reference library for Ephemeroptera, Plecoptera, and Trichoptera ("EPTs") from a Canadian subarctic site, which is the focus of a comprehensive biodiversity inventory using DNA barcoding. These three groups of aquatic insects exhibit a moderate level of species diversity, making them ideal for testing the feasibility of DNA barcoding for routine biotic surveys. We explore the correlation between morphological species delineations, DNA barcode-based haplotype clusters delimited by a sequence threshold (2%), and a threshold-free approach to biodiversity quantification: phylogenetic diversity. Results A DNA barcode reference library consisting of 2277 COI sequences is built for 112 EPT species from the focal region. Close correspondence was found between EPT morphospecies and haplotype clusters designated using a standard threshold value. Similarly, taxon accumulation curves based on haplotype clusters were very similar in shape to phylogenetic diversity accumulation curves, but were much more computationally efficient to generate. Conclusion The results of this study will facilitate other lines of research on northern EPTs and also bode well for rapidly conducting initial biodiversity assessments in unknown EPT faunas. PMID:20003245
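    Delimiting haplotype clusters with a 2% threshold can be sketched as grouping aligned COI sequences whose uncorrected p-distance is at most 0.02. The sketch uses a single-linkage rule: a sequence joins a cluster if it lies within the threshold of any member. That rule and the function names are assumptions for illustration, since the abstract does not give the study's exact clustering procedure.

```python
def p_distance(a, b):
    """Uncorrected pairwise distance: share of aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def threshold_clusters(seqs, threshold=0.02):
    """Single-linkage clusters: sequences are joined whenever p-distance <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    The pairwise loop is quadratic in the number of sequences, which is manageable for a few thousand barcodes but would need indexing for much larger libraries.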

  3. Genetic Characterization of Turkish Snake Melon (Cucumis melo L. subsp. melo flexuosus Group) Accessions Revealed by SSR Markers.

    PubMed

    Solmaz, Ilknur; Kacar, Yildiz Aka; Simsek, Ozhan; Sari, Nebahat

    2016-08-01

    Snake melon is an important cucurbit crop, especially in the Southeastern and Mediterranean regions of Turkey. It is consumed fresh or pickled, and production relies mainly on local landraces. Turkey is one of the secondary diversification centers of melon and possesses valuable genetic resources, including snake melons with diverse morphological characteristics. The genetic diversity of snake melon genotypes collected from different regions of Turkey, together with reference genotypes obtained from the World Melon Gene Bank in Avignon, France, was examined using 13 simple sequence repeat (SSR) markers. A total of 69 alleles were detected, with an average of 5.31 alleles per locus. The polymorphism information content of the SSR markers ranged from 0.19 to 0.57 (average 0.38). Cluster analysis defined two major groups. The first major group included only one accession (61), while all remaining accessions fell into the second major group and separated into different sub-clusters. Cluster analysis based on SSR markers indicated considerable genetic variability among the examined accessions; however, Turkish snake melon accessions grouped together with the reference snake melon accessions.

  4. Identification of clusters of individuals relevant to temporomandibular disorders and other chronic pain conditions: the OPPERA study

    PubMed Central

    Bair, Eric; Gaynor, Sheila; Slade, Gary D.; Ohrbach, Richard; Fillingim, Roger B.; Greenspan, Joel D.; Dubner, Ronald; Smith, Shad B.; Diatchenko, Luda; Maixner, William

    2016-01-01

    The classification of most chronic pain disorders gives emphasis to anatomical location of the pain to distinguish one disorder from the other (eg, back pain vs temporomandibular disorder [TMD]) or to define subtypes (eg, TMD myalgia vs arthralgia). However, anatomical criteria overlook etiology, potentially hampering treatment decisions. This study identified clusters of individuals using a comprehensive array of biopsychosocial measures. Data were collected from a case–control study of 1031 chronic TMD cases and 3247 TMD-free controls. Three subgroups were identified using supervised cluster analysis (referred to as the adaptive, pain-sensitive, and global symptoms clusters). Compared with the adaptive cluster, participants in the pain-sensitive cluster showed heightened sensitivity to experimental pain, and participants in the global symptoms cluster showed both greater pain sensitivity and greater psychological distress. Cluster membership was strongly associated with chronic TMD: 91.5% of TMD cases belonged to the pain-sensitive and global symptoms clusters, whereas 41.2% of controls belonged to the adaptive cluster. Temporomandibular disorder cases in the pain-sensitive and global symptoms clusters also showed greater pain intensity, jaw functional limitation, and more comorbid pain conditions. Similar results were obtained when the same methodology was applied to a smaller case–control study consisting of 199 chronic TMD cases and 201 TMD-free controls. During a median 3-year follow-up period of TMD-free individuals, participants in the global symptoms cluster had greater risk of developing first-onset TMD (hazard ratio = 2.8) compared with participants in the other 2 clusters. Cross-cohort predictive modeling was used to demonstrate the reliability of the clusters. PMID:26928952

  5. Finding approximate gene clusters with Gecko 3.

    PubMed

    Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian

    2016-11-16

    Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, open-source software with an easy-to-use graphical user interface for finding gene clusters in hundreds of bacterial genomes. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations, and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters by reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Epidemic Achromobacter xylosoxidans strain among Belgian cystic fibrosis patients and review of literature.

    PubMed

    Cools, Piet; Ho, Erwin; Vranckx, Katleen; Schelstraete, Petra; Wurth, Bettina; Franckx, Hilde; Ieven, Greet; Van Simaey, Leen; Van Daele, Sabine; Verhulst, Stijn; De Baets, Frans; Vaneechoutte, Mario

    2016-06-24

    Achromobacter xylosoxidans is increasingly being recognized as an emerging pathogen in cystic fibrosis. Recent severe infections with A. xylosoxidans in some of our cystic fibrosis (CF) patients led to a re-evaluation of the epidemiology of CF-associated A. xylosoxidans infections in two Belgian reference centres (Antwerp and Ghent). Several of these patients also stayed at the Rehabilitation Centre De Haan (RHC). In total, 59 A. xylosoxidans isolates from 31 patients (including 26 CF patients), collected between 2001 and 2014, were studied. We evaluated Matrix-Assisted Laser Desorption/Ionisation Time-of-Flight mass spectrometry (MALDI-TOF) as an alternative to McRAPD typing. Both typing approaches established the presence of a major cluster whose isolates all came from 21 CF patients, including two patients sampled while staying at the RHC a decade ago. This major cluster was the same as the cluster identified a decade ago at the RHC. A minor cluster consisted of 13 isolates of miscellaneous origin. A further seven isolates, including one from a non-CF patient who had recently stayed at the RHC, were singletons. The typing results of both methods were similar, indicating transmission of a single clone of A. xylosoxidans among several CF patients from at least two reference centres. Isolates of the same clone were already observed at the RHC a decade ago. It is difficult to establish to what extent the RHC is the source of transmission, because the epidemic strain was already present when the first epidemiological study at the RHC was carried out. This study also documents the applicability of MALDI-TOF for typing strains within the species A. xylosoxidans and the need to use the dynamic cutoff algorithm of the BioNumerics® software for correct clustering of the fingerprints.

  7. Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets

    PubMed Central

    Aubry, Marc; Monnier, Annabelle; Chicault, Celine; de Tayrac, Marie; Galibert, Marie-Dominique; Burgun, Anita; Mosser, Jean

    2006-01-01

    Background Large-scale genomic studies based on transcriptome technologies provide clusters of genes that need to be functionally annotated. The Gene Ontology (GO) implements a controlled vocabulary organised into three hierarchies: cellular components, molecular functions and biological processes. This terminology allows a coherent and consistent description of the knowledge about gene functions. The GO terms related to genes come primarily from semi-automatic annotations made by trained biologists (annotation based on evidence) or text-mining of the published scientific literature (literature profiling). Results We report an original functional annotation method based on a combination of evidence and literature that overcomes the weaknesses and the limitations of each approach. It relies on the Gene Ontology Annotation database (GOA Human) and the PubGene biomedical literature index. We support these annotations with statistically associated GO terms and retrieve associative relations across the three GO hierarchies to emphasise the major pathways in which a gene cluster is involved. Both annotation methods and associative relations were quantitatively evaluated with a reference set of 7397 genes and a multi-cluster study of 14 clusters. We also validated the biological appropriateness of our hybrid method with the annotation of a single gene (cdc2) and that of a down-regulated cluster of 37 genes identified by a transcriptome study of an in vitro enterocyte differentiation model (CaCo-2 cells). Conclusion The combination of both approaches is more informative than either approach alone: literature mining can enrich an annotation based only on evidence. Text-mining of the literature can also find valuable associated MEDLINE references that confirm the relevance of the annotation. Finally, GO term networks can be built with associative relations in order to highlight cooperative and competitive pathways and their connected molecular functions. PMID:16674810

  8. Sex differences in presenting symptoms of acute coronary syndrome: the EPIHeart cohort study

    PubMed Central

    Laszczyńska, Olga; Viana, Marta; Melão, Filipa; Henriques, Ana; Borges, Andreia; Severo, Milton; Maciel, Maria Júlia; Moreira, Ilídio; Azevedo, Ana

    2018-01-01

    Objectives Prompt diagnosis of acute coronary syndrome (ACS) remains a challenge, with presenting symptoms affecting the diagnosis algorithm and, consequently, management and outcomes. This study aimed to identify sex differences in presenting symptoms of ACS. Design Data were collected within a prospective cohort study (EPIHeart). Setting Patients with confirmed diagnosis of type 1 (primary spontaneous) ACS who were consecutively admitted to the Cardiology Department of two tertiary hospitals in Portugal between August 2013 and December 2014. Participants Presenting symptoms of 873 patients (227 women) were obtained through a face-to-face interview. Outcome measures Typical pain was defined according to the definitions of the cardiology societies. Clusters of symptoms other than pain were identified by latent class analysis. Logistic regression was used to quantify differences in presentation of ACS symptoms by sex. Results Chest pain was reported by 82% of patients, with no differences in frequency or location between sexes. Women were more likely to feel pain with an intensity higher than 8/10 and this association was stronger for patients aged under 65 years (interaction P=0.028). Referred pain was also more likely in women, particularly pain referred to typical and atypical locations simultaneously. The multiple symptoms cluster, which was characterised by a high probability of presenting with all symptoms, was almost fourfold more prevalent in women (3.92, 95% CI 2.21 to 6.98). Presentation with this cluster was associated with a higher 30-day mortality rate adjusted for the GRACE V.2.0 risk score (4.9% vs 0.9% for the two other clusters, P<0.001). Conclusions While there are no significant differences in the frequency or location of pain between sexes, women are more likely to feel pain of higher intensity and to present with referred pain and symptoms other than pain. 
Knowledge of these ACS presentation profiles is important for health policy decisions and clinical practice. PMID:29476027
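The abstract reports sex differences as ratios with 95% confidence intervals obtained by logistic regression. For a single binary predictor (sex) with no covariates, the logistic-regression odds ratio equals the crude odds ratio of a 2x2 table, so the arithmetic can be sketched directly. This is an illustration only: the study's own estimates may involve adjustments not shown here, and the counts below are hypothetical, chosen only so that they sum to the 227 women and 646 men in the cohort.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio and Wald confidence interval for a 2x2 table.

    a: women with the symptom cluster   b: women without it
    c: men with the symptom cluster     d: men without it
    All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not taken from the study
or_, lo, hi = odds_ratio_wald_ci(30, 197, 25, 621)
print(f"OR {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The interval is symmetric on the log scale, which is why published intervals such as 2.21 to 6.98 look asymmetric around the point estimate.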

  9. Suicide Contagion: A Systematic Review of Definitions and Research Utility

    PubMed Central

    Cheng, Qijin; Li, Hong; Silenzio, Vincent; Caine, Eric D.

    2014-01-01

    Objectives Despite the common use of contagion to analogize the spread of suicide, there is a lack of rigorous assessment of the underlying concept or theory supporting the use of this term. The present study aims to examine the varied definitions and potential utility of the term contagion in suicide-related research. Methods Through systematic literature screening, 100 initial records and 240 reference records in English were identified as relevant to our research objectives. We then conducted narrative syntheses of various definitions and assessed their potential value for generating new research. Results Of the 340 records, 20.3% used contagion as equivalent to clustering (contagion-as-cluster); 68.5% used it to refer to various, often related mechanisms underlying the clustering phenomenon (contagion-as-mechanism); and 11.2% used it without a clear definition. Under the category of contagion-as-mechanism, four mechanisms have been proposed to explain how suicide clusters occur: transmission (contagion-as-transmission), imitation (contagion-as-imitation), contextual influence (contagion-as-context), and affiliation (contagion-as-affiliation). Contagion-as-cluster both confounds and constrains inquiry into suicide clustering by blending a proposed mechanism with the phenomenon to be studied. Contagion-as-transmission is, in essence, a double or internally redundant metaphor. Contagion-as-affiliation and contagion-as-context involve common mechanisms that often occur independently of apparent contagion, or that may serve as a facilitating background. When used indiscriminately, these terms may create research blind spots. Contagion-as-imitation combines perspectives from psychology, sociology, and public health research and provides the greatest heuristic utility for examining whether and how suicide and suicidal behaviors may spread among persons at both individual and population levels. 
Conclusion Clarifying the concept of “suicide contagion” is an essential step for more thoroughly investigating its mechanisms. Developing a clearer understanding of the apparent spread of suicide-promoting influences can, in turn, offer insights necessary to build the scientific foundation for prevention and intervention strategies that can be applied at both individual and community levels. PMID:25259604

  10. (GTG)5-PCR reference framework for acetic acid bacteria.

    PubMed

    Papalexandratou, Zoi; Cleenwerck, Ilse; De Vos, Paul; De Vuyst, Luc

    2009-11-01

    One hundred and fifty-eight strains of acetic acid bacteria (AAB) were subjected to (GTG)(5)-PCR fingerprinting to construct a reference framework for their rapid classification and identification. Most of them clustered according to their respective taxonomic designation; others had to be reclassified based on polyphasic data. This study shows the usefulness of the method to determine the taxonomic and phylogenetic relationships among AAB and to study the AAB diversity of complex ecosystems.

  11. Genetics Home Reference: familial lipoprotein lipase deficiency

    MedlinePlus

    ... 1 millimeter in diameter), but individual xanthomas can cluster together to form larger patches. They are generally ... JC, Méndez-González J, Blanco-Vaca F. Molecular analysis of chylomicronemia in a clinical laboratory setting: diagnosis ...

  12. Countries population determination to test rice crisis indicator at national level using k-means cluster analysis

    NASA Astrophysics Data System (ADS)

    Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.

    2017-01-01

    This study aimed to identify a population of countries similar to Indonesia on three characteristics: democratic atmosphere, rice consumption and purchasing power for rice. The result serves as reference material for research testing the strength and predictability of the rice crisis indicator Unprecedented Restlessness (UR). Similarity to Indonesia was assessed by multivariate analysis, namely non-hierarchical k-means cluster analysis, using 38 countries as the data population. The analysis was repeated with different numbers of clusters until it found a number that shows the discriminating power of the three characteristics and yields high similarity within clusters. The results show that 6 clusters describe the discriminating power of the characteristics of the formed clusters. To answer the purpose of the study, however, only one cluster is taken, selected by the criteria for a population of countries similar to Indonesia: the cluster contains Indonesia, it includes countries that did and did not experience a rice crisis in 2008, and it has the most members among them. These criteria are met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, Korea South, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.
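The procedure described here, k-means run repeatedly with different numbers of clusters, can be sketched as follows. This is a generic Lloyd's-algorithm implementation, not the authors' code: the z-score standardization, the deterministic farthest-point seeding and the within-cluster sum of squares (WCSS) used to compare values of k are all assumptions, since the abstract does not describe them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def standardize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each column so no characteristic dominates by its scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column gets sd 1.0 to avoid division by zero
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all current seeds
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: _dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def wcss(points, labels, centroids) -> float:
    """Within-cluster sum of squares, useful for comparing values of k."""
    return sum(_dist2(p, centroids[lab]) for p, lab in zip(points, labels))
```

To mirror the repeated analysis, one would standardize the 38 country rows, call `kmeans` for several values of k and inspect how WCSS falls as k grows, alongside the substantive selection criteria the abstract lists.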

  13. Construction of a map-based reference genome sequence for barley, Hordeum vulgare L.

    PubMed Central

    Beier, Sebastian; Himmelbach, Axel; Colmsee, Christian; Zhang, Xiao-Qi; Barrero, Roberto A.; Zhang, Qisen; Li, Lin; Bayer, Micha; Bolser, Daniel; Taudien, Stefan; Groth, Marco; Felder, Marius; Hastie, Alex; Šimková, Hana; Staňková, Helena; Vrána, Jan; Chan, Saki; Muñoz-Amatriaín, María; Ounit, Rachid; Wanamaker, Steve; Schmutzer, Thomas; Aliyeva-Schnorr, Lala; Grasso, Stefano; Tanskanen, Jaakko; Sampath, Dharanya; Heavens, Darren; Cao, Sujie; Chapman, Brett; Dai, Fei; Han, Yong; Li, Hua; Li, Xuan; Lin, Chongyun; McCooke, John K.; Tan, Cong; Wang, Songbo; Yin, Shuya; Zhou, Gaofeng; Poland, Jesse A.; Bellgard, Matthew I.; Houben, Andreas; Doležel, Jaroslav; Ayling, Sarah; Lonardi, Stefano; Langridge, Peter; Muehlbauer, Gary J.; Kersey, Paul; Clark, Matthew D.; Caccamo, Mario; Schulman, Alan H.; Platzer, Matthias; Close, Timothy J.; Hansson, Mats; Zhang, Guoping; Braumann, Ilka; Li, Chengdao; Waugh, Robbie; Scholz, Uwe; Stein, Nils; Mascher, Martin

    2017-01-01

    Barley (Hordeum vulgare L.) is a cereal grass mainly used as animal fodder and raw material for the malting industry. The map-based reference genome sequence of barley cv. ‘Morex’ was constructed by the International Barley Genome Sequencing Consortium (IBSC) using hierarchical shotgun sequencing. Here, we report the experimental and computational procedures to (i) sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map, (ii) find and validate overlaps between adjacent BACs, (iii) construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs, and (iv) order and orient these BAC clusters along the seven barley chromosomes using positional information provided by dense genetic maps, an optical map and chromosome conformation capture sequencing (Hi-C). Integrative access to these sequence and mapping resources is provided by the barley genome explorer (BARLEX). PMID:28448065

  14. Novel strategy to implement active-space coupled-cluster methods

    NASA Astrophysics Data System (ADS)

    Rolik, Zoltán; Kállay, Mihály

    2018-03-01

    A new approach is presented for the efficient implementation of coupled-cluster (CC) methods including higher excitations based on a molecular orbital space partitioned into active and inactive orbitals. In the new framework, the string representation of amplitudes and intermediates is used as long as it is beneficial, but the contractions are evaluated as matrix products. Using a new diagrammatic technique, the CC equations are represented in a compact form due to the string notations we introduced. As an application of these ideas, a new automated implementation of the single-reference-based multi-reference CC equations is presented for arbitrary excitation levels. The new program can be considered as an improvement over the previous implementations in many respects; e.g., diagram contributions are evaluated by efficient vectorized subroutines. Timings for test calculations for various complete active-space problems are presented. As an application of the new code, the weak interactions in the Be dimer were studied.

  15. Cluster flight control for fractionated spacecraft on an elliptic orbit

    NASA Astrophysics Data System (ADS)

    Xu, Ming; Liang, Yuying; Tan, Tian; Wei, Lixin

    2016-08-01

    This paper deals with the stabilization of cluster flight on an elliptic reference orbit by Hamiltonian structure-preserving control using relative position measurements only. The linearized Melton's relative equation is used to derive the controller, and the full nonlinear relative dynamics are then employed to evaluate the controller's performance numerically. In this paper, the hyperbolic and elliptic eigenvalues and their manifolds are treated in a unified notation, without distinguishing between them. This new treatment not only helps overcome the difficulty of feedback on manifolds of unfixed dimension, but also offers more opportunities to set the controlled frequencies of the fundamental motions or to optimize the control gains. Any initial condition can be stabilized on a Kolmogorov-Arnold-Moser torus near a controlled elliptic equilibrium. The motions are stabilized around the natural relative trajectories rather than tracking a reference relative configuration. In addition, the bounded quasi-periodic trajectories generated by the controller have advantages in rapid reconfiguration and unpredictable evolution.

  16. Optimization of a sensor cluster for determination of trajectories and velocities of supersonic objects

    NASA Astrophysics Data System (ADS)

    Cannella, Marco; Sciuto, Salvatore Andrea

    2001-04-01

    The errors of a method for determining the trajectories and velocities of supersonic objects are evaluated. An analytical study is developed of a cluster of three pressure transducers, generally used as an apparatus for the kinematic determination of the parameters of supersonic objects. Furthermore, the accuracy of this cluster in determining the slope of an incoming shock wave is investigated in detail in order to optimize the device. In particular, a specific non-dimensional parameter is proposed to evaluate accuracy for various parameter values, and reference graphs are provided to help design the sensor cluster properly. Finally, on the basis of the error analysis, the best estimate of the relative distance between the sensors as a function of the temporal resolution of the measuring system is discussed.
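The abstract does not give the measurement model. A common way to recover the direction and speed of a front from a three-sensor cluster is time-difference-of-arrival: assume a planar front moving at constant speed across three coplanar sensors. A minimal sketch under that assumption (it ignores any curvature of the front); the function name is hypothetical:

```python
import math
from typing import Sequence, Tuple


def plane_front_from_arrivals(
    positions: Sequence[Tuple[float, float]], times: Sequence[float]
) -> Tuple[float, float]:
    """Speed and propagation direction (radians) of a planar front.

    Assumed model: t_i = t0 + s . r_i, where s = n / v is the slowness
    vector, n the unit propagation direction and v the front speed.
    """
    (x1, y1), (x2, y2), (x3, y3) = positions
    t1, t2, t3 = times
    # Subtracting sensor 1 eliminates the unknown t0: (t_i - t1) = s . (r_i - r1)
    a11, a12, b1 = x2 - x1, y2 - y1, t2 - t1
    a21, a22, b2 = x3 - x1, y3 - y1, t3 - t1
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-15:
        raise ValueError("collinear sensors cannot resolve the front direction")
    # Cramer's rule for the 2x2 linear system
    sx = (b1 * a22 - a12 * b2) / det
    sy = (a11 * b2 - a21 * b1) / det
    slowness = math.hypot(sx, sy)
    if slowness == 0.0:
        raise ValueError("simultaneous arrivals: front speed is unbounded")
    return 1.0 / slowness, math.atan2(sy, sx)
```

Because the arrival-time differences are divided by the sensor spacing, a timing error of δt perturbs the slowness by roughly δt divided by the spacing; this is why sensor spacing has to be chosen against the temporal resolution of the acquisition system, the trade-off the abstract's last sentence addresses.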

  17. Transcriptome deep-sequencing and clustering of expressed isoforms from Favia corals

    PubMed Central

    2013-01-01

    Background Genomic and transcriptomic sequence data are essential tools for tackling ecological problems. Using an approach that combines next-generation sequencing, de novo transcriptome assembly, gene annotation and synthetic gene construction, we identify and cluster the protein families from Favia corals from the northern Red Sea. Results We obtained 80 million 75 bp paired-end cDNA reads from two Favia adult samples collected at 65 m (Fav1, Fav2) on the Illumina GA platform, and generated two de novo assemblies using ABySS and CAP3. After removing redundancy and filtering out low-quality reads, our transcriptome datasets contained 58,268 (Fav1) and 62,469 (Fav2) contigs longer than 100 bp, with N50 values of 1,665 bp and 1,439 bp, respectively. Using the proteome of the sea anemone Nematostella vectensis as a reference, we were able to annotate almost 20% of each dataset using reciprocal homology searches. Homologous clustering of these annotated transcripts allowed us to divide them into 7,186 (Fav1) and 6,862 (Fav2) homologous transcript clusters (E-value ≤ 2e-30). Functional annotation categories were assigned to homologous clusters using the functional annotation of Nematostella vectensis. General annotation of the assembled transcripts was improved by 1-3% using the Acropora digitifera proteome. In addition, we screened these transcript isoform clusters for homologs of fluorescent proteins (FPs) and identified seven potential FP homologs in Fav1 and four in Fav2. These transcripts were validated as bona fide FP transcripts via robust fluorescence upon heterologous expression. Annotation of the assembled contigs revealed that 1.34% and 1.61% (in Fav1 and Fav2, respectively) of the total assembled contigs likely originated from the corals’ algal symbiont, Symbiodinium spp. Conclusions Here we present a study to identify the homologous transcript isoform clusters from the transcriptome of Favia corals using a distantly related reference proteome. 
Furthermore, the symbiont-derived transcripts were isolated from the datasets and their contribution quantified. This is the first annotated transcriptome of the genus Favia, a major increase in genomics resources available in this important family of corals. PMID:23937070
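The abstract groups annotated transcripts into homologous transcript clusters at E-value ≤ 2e-30 but does not name the clustering algorithm. One simple reading, assumed here, is single-linkage clustering: any two transcripts joined by a hit at or below the cutoff end up in the same cluster, i.e. the connected components of the hit graph. A minimal union-find sketch, where the (query, subject, E-value) tuples are a hypothetical input format:

```python
from typing import Dict, Iterable, List, Tuple


def cluster_by_evalue(
    hits: Iterable[Tuple[str, str, float]], max_evalue: float = 2e-30
) -> List[List[str]]:
    """Single-linkage clusters: connected components of hits with E-value <= max_evalue."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Register both sequences so that unlinked ones still form singletons
        rq, rs = find(query), find(subject)
        if evalue <= max_evalue and rq != rs:
            parent[rs] = rq  # union the two components
    groups: Dict[str, List[str]] = {}
    for seq_id in list(parent):
        groups.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage can chain distantly related sequences together through intermediates; graph-partitioning methods such as MCL split the same hit graph more conservatively.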

  18. Microwave-assisted synthesis of water-soluble, fluorescent gold nanoclusters capped with small organic molecules and a revealing fluorescence and X-ray absorption study

    NASA Astrophysics Data System (ADS)

    Helmbrecht, C.; Lützenkirchen-Hecht, D.; Frank, W.

    2015-03-01

    Colourless solutions of blue light-emitting, water-soluble gold nanoclusters (AuNC) were synthesized from gold colloids under microwave irradiation using small organic molecules as ligands. Stabilized by 1,3,5-triaza-7-phosphaadamantane (TPA) or l-glutamine (GLU), fluorescence quantum yields up to 5% were obtained. AuNC are considered to be very promising for biological labelling, optoelectronic devices and light-emitting materials but the structure-property relationships have still not been fully clarified. To expand the knowledge about the AuNC apart from their fluorescent properties they were studied by X-ray absorption spectroscopy elucidating the oxidation state of the nanoclusters' gold atoms. Based on curve fitting of the XANES spectra in comparison to several gold references, optically transparent fluorescent AuNC are predicted to be ligand-stabilized Au5+ species. Additionally, their near edge structure compared with analogous results of polynuclear clusters known from the literature discloses an increasing intensity of the feature close to the absorption edge with decreasing cluster size. As a result, a linear relationship between the cluster size and the X-ray absorption coefficient can be established for the first time. Electronic supplementary information (ESI) available: The deconvoluted reference spectra are given in ESI Fig. 1-9. See DOI: 10.1039/c4nr07051h

  19. acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

    DOE PAGES

    Lux, Markus; Kruger, Jan; Rinke, Christian; ...

    2016-12-20

    A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. 
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
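The abstract says acdc's reference-free stage applies non-linear dimensionality reduction to oligonucleotide signatures before clustering. The sketch below computes only the first ingredient, a normalized k-mer frequency vector per sequence. The choice of k = 4, the skipping of ambiguous bases and the absence of reverse-complement merging are assumptions, not acdc's documented settings, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, List


def kmer_signature(seq: str, k: int = 4) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order.

    Windows containing anything other than A, C, G or T are skipped.
    """
    index: Dict[str, int] = {
        "".join(p): i for i, p in enumerate(product("ACGT", repeat=k))
    }
    counts = [0] * len(index)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

Each sequence (or contig fragment) becomes one point in a 256-dimensional space for k = 4; a dimensionality-reduction step and a clustering algorithm would then operate on these vectors.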

  20. acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lux, Markus; Kruger, Jan; Rinke, Christian

    A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. 
In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.

  1. Cluster analysis for determining distribution center location

    NASA Astrophysics Data System (ADS)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determining the location of distribution facilities is highly important for surviving the intense competition of today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk, which raises new problems: how many facilities should be provided, and where. This study examines a fast-food restaurant brand located in Greater Jakarta, which ranks among the top five fast-food restaurant chains by retail sales. The study had three stages: compiling spatial data, cluster analysis, and network analysis. The cluster analysis results are used to consider the location of an additional distribution center. The network analysis results show a more efficient process, reflected in shorter distances in the distribution process.

  2. Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI

    USGS Publications Warehouse

    Donato, David I.

    2017-01-01

    In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
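The abstract contrasts on-demand (worker-initiated) pulling from a bag of tasks with other allocation methods on a heterogeneous cluster. The following sequential simulation is not the author's C/MPI reference implementation; it compares on-demand pulling against a static round-robin pre-assignment, chosen here as an illustrative baseline. Worker speeds and task costs are hypothetical, and communication overhead is ignored.

```python
import heapq
from typing import Sequence


def makespan_round_robin(task_costs: Sequence[float], speeds: Sequence[float]) -> float:
    """Static baseline: task i is pre-assigned to worker i % len(speeds)."""
    busy = [0.0] * len(speeds)
    for i, cost in enumerate(task_costs):
        w = i % len(speeds)
        busy[w] += cost / speeds[w]
    return max(busy)


def makespan_on_demand(task_costs: Sequence[float], speeds: Sequence[float]) -> float:
    """Bag of tasks: whichever worker becomes idle first pulls the next task."""
    # Heap entries are (time the worker becomes idle, worker id)
    idle_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle_at)
    makespan = 0.0
    for cost in task_costs:
        t, w = heapq.heappop(idle_at)
        t += cost / speeds[w]
        makespan = max(makespan, t)
        heapq.heappush(idle_at, (t, w))
    return makespan


costs = [1.0] * 8
speeds = [1.0, 4.0]  # hypothetical heterogeneous nodes: one 4x faster
print(makespan_round_robin(costs, speeds), makespan_on_demand(costs, speeds))  # prints: 4.0 2.0
```

With eight unit-cost tasks, round-robin leaves the slow worker with half the tasks, while on-demand pulling lets the fast worker absorb six of them. In an MPI program the same pattern is typically realized by a coordinator rank that waits for requests from any worker (for example, `MPI_Recv` with `MPI_ANY_SOURCE`) and replies with the next task.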

  3. A theoretical study of water equilibria: The cluster distribution versus temperature and pressure for (H2O)n, n=1-60, and ice

    NASA Astrophysics Data System (ADS)

    Lenz, Annika; Ojamäe, Lars

    2009-10-01

    The size distribution of water clusters at equilibrium is studied using quantum-chemical calculations in combination with statistical thermodynamics. The necessary energetic data is obtained by quantum-chemical B3LYP computations and through extrapolations from the B3LYP results for the larger clusters. Clusters with up to 60 molecules are included in the equilibrium computations. Populations of different cluster sizes are calculated using both an ideal gas model with noninteracting clusters and a model where a correction for the interaction energy is included analogous to the van der Waals law. In standard vapor the majority of the water molecules are monomers. For the ideal gas model at 1 atm large clusters [56-mer (0-120 K) and 28-mer (100-260 K)] dominate at low temperatures and dissociate into smaller clusters [21-22-mer (170-280 K) and 4-6-mer (270-320 K)] and into monomers (300-350 K) when the temperature is increased. At lower pressure the transition from clusters to monomers lies at lower temperatures and fewer cluster sizes are formed. The computed size distribution exhibits enhanced peaks for the clusters consisting of 21 and 28 water molecules; these sizes are for protonated water clusters often referred to as magic numbers. If cluster-cluster interactions are included in the model the transition from clusters to monomers is sharper (i.e., occurs over a smaller temperature interval) than when the ideal-gas model is used. Clusters with 20-22 molecules dominate in the liquid region. When a large icelike cluster is included it will dominate for temperatures up to 325 K for the noninteracting clusters model. Thermodynamic properties (Cp, ΔH) were calculated and are in generally good agreement with experimental values for the solid and gas phase. A formula for the number of H-bond topologies in a given cluster structure is derived. 
For the 20-mer it is shown that the number of topologies contributes to making the population of dodecahedron-shaped cluster larger than that of a lower-energy fused prism cluster at high temperatures.

  4. A theoretical study of water equilibria: the cluster distribution versus temperature and pressure for (H2O)n, n = 1-60, and ice.

    PubMed

    Lenz, Annika; Ojamäe, Lars

    2009-10-07

    The size distribution of water clusters at equilibrium is studied using quantum-chemical calculations in combination with statistical thermodynamics. The necessary energetic data is obtained by quantum-chemical B3LYP computations and through extrapolations from the B3LYP results for the larger clusters. Clusters with up to 60 molecules are included in the equilibrium computations. Populations of different cluster sizes are calculated using both an ideal gas model with noninteracting clusters and a model where a correction for the interaction energy is included analogous to the van der Waals law. In standard vapor the majority of the water molecules are monomers. For the ideal gas model at 1 atm large clusters [56-mer (0-120 K) and 28-mer (100-260 K)] dominate at low temperatures and dissociate into smaller clusters [21-22-mer (170-280 K) and 4-6-mer (270-320 K)] and into monomers (300-350 K) when the temperature is increased. At lower pressure the transition from clusters to monomers lies at lower temperatures and fewer cluster sizes are formed. The computed size distribution exhibits enhanced peaks for the clusters consisting of 21 and 28 water molecules; these sizes are for protonated water clusters often referred to as magic numbers. If cluster-cluster interactions are included in the model the transition from clusters to monomers is sharper (i.e., occurs over a smaller temperature interval) than when the ideal-gas model is used. Clusters with 20-22 molecules dominate in the liquid region. When a large icelike cluster is included it will dominate for temperatures up to 325 K for the noninteracting clusters model. Thermodynamic properties (Cp, ΔH) were calculated and are in generally good agreement with experimental values for the solid and gas phase. A formula for the number of H-bond topologies in a given cluster structure is derived. 
For the 20-mer it is shown that the number of topologies contributes to making the population of dodecahedron-shaped cluster larger than that of a lower-energy fused prism cluster at high temperatures.

  5. Genetics Home Reference: congenital afibrinogenemia

    MedlinePlus

    ... Neerman-Arbez M. FGB mutations leading to congenital quantitative fibrinogen deficiencies: an update and report of four ... R, Staeger P, Antonarakis SE, Morris MA. Molecular analysis of the fibrinogen gene cluster in 16 patients with congenital afibrinogenemia: novel truncating ... Support USA. ...

  6. Ligand-protected gold clusters: the structure, synthesis and applications

    NASA Astrophysics Data System (ADS)

    Pichugina, D. A.; Kuz'menko, N. E.; Shestakov, A. F.

    2015-11-01

    Modern concepts of the structure and properties of atomic gold clusters protected by thiolate, selenolate, phosphine and phenylacetylene ligands are analyzed. Within the framework of the superatom theory, the 'divide and protect' approach and the structure rule, the stability and composition of a cluster are determined by the structure of the cluster core, the type of ligands and the total number of valence electrons. Methods of selective synthesis of gold clusters in solution and on the surface of inorganic composites based, in particular, on the reaction of Aun with RS, RSe, PhC≡C, Hal ligands or functional groups of proteins, on stabilization of clusters in cavities of the α-, β- and γ-cyclodextrin molecules (Au15 and Au25) and on anchorage to a support surface (Au25/SiO2, Au20/C, Au10/FeOx) are reviewed. Problems in this field are also discussed. Among the methods for cluster structure prediction, particular attention is given to the theoretical approaches based on the density functional theory (DFT). The structures of a number of synthesized clusters are described using the results obtained by X-ray diffraction analysis and DFT calculations. A possible mechanism of formation of the SR(AuSR)n 'staple' units in the cluster shell is proposed. The structure and properties of bimetallic clusters MxAunLm (M=Pd, Pt, Ag, Cu) are discussed. The Pd or Pt atom is located at the centre of the cluster, whereas Ag and Cu atoms form bimetallic compounds in which the heteroatom is located on the surface of the cluster core or in the 'staple' units. The optical properties, fluorescence and luminescence of ligand-protected gold clusters originate from the quantum effects of the Au atoms in the cluster core and in the oligomeric SR(AuSR)x units in the cluster shell. Homogeneous and heterogeneous reactions catalyzed by atomic gold clusters are discussed in the context of the reaction mechanism and the nature of the active sites. The bibliography includes 345 references.

  7. Elements concentration analysis in groundwater from the North Serra Geral aquifer in Santa Helena-Brazil using SR-TXRF spectrometer.

    PubMed

    Justen, Gisele C; Espinoza-Quiñones, Fernando R; Módenes, Aparecido Nivaldo; Bergamasco, Rosangela

    2012-01-01

    In this work, the concentrations of elements in groundwater were analyzed using the synchrotron radiation total-reflection X-ray fluorescence (SR-TXRF) technique. A set of nine tube-wells at serious risk of contamination was chosen to monitor the mean concentrations of elements in groundwater from the North Serra Geral aquifer in Santa Helena, Brazil, over one year. Element concentrations were determined using an SR-TXRF methodology, and the accuracy of the technique was validated by analysis of a certified reference material. Because the groundwater composition in the North Serra Geral aquifer showed heterogeneity in the spatial distribution of eight major elements, hierarchical clustering was applied to the data. Based on the similarity of their compositions, two of the nine wells were grouped into a first cluster, while the other seven were grouped into a second cluster. Calcium was the major element in all wells, with higher Ca concentrations in the second cluster than in the first. However, concentrations of Ti, V and Cr were slightly higher in the first cluster than in the second. Within a tube-well monitoring program, the findings of this study could provide a useful assessment of the controls on groundwater composition and support management at the regional level.
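    As a rough illustration of the hierarchical clustering step, the sketch below performs bottom-up (agglomerative) clustering on standardized concentration vectors and stops when a chosen number of clusters remains. The average-linkage rule, Euclidean distance, z-score standardization and the function names are assumptions for illustration only; the abstract does not state which variant the authors used, and no measured data are reproduced here.

```python
import math
from itertools import combinations


def standardize(rows):
    """Z-score each column (element) so that no single element dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A zero standard deviation (constant column) falls back to 1.0 to avoid division by zero
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    return sum(euclidean(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def agglomerate(points, k):
    """Merge clusters bottom-up until k remain; returns lists of point indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average-linkage distance
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points))
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves a's index valid
        del clusters[b]
    return clusters
```

    Calling `agglomerate(standardize(wells), 2)` on a hypothetical matrix `wells` (one row per well, one column per element) would return two lists of well indices, analogous to the two-cluster split reported above.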

  8. Hierarchical clustering of EMD based interest points for road sign detection

    NASA Astrophysics Data System (ADS)

    Khan, Jesmin; Bhuiyan, Sharif; Adhami, Reza

    2014-04-01

    This paper presents an automatic road traffic sign detection and recognition system based on hierarchical clustering of interest points and joint transform correlation. The proposed algorithm consists of three stages: interest point detection, clustering of those points, and similarity search. In the first stage, highly discriminative, rotation- and scale-invariant interest points are selected from the image edges based on 1-D empirical mode decomposition (EMD). We propose a two-step unsupervised clustering technique that is adaptive and based on two criteria. The detected points are first clustered based on stable local features related to brightness and color, which are extracted using a Gabor filter. The points belonging to each partition are then reclustered using a position feature, depending on the dispersion of the points in the initial cluster. This two-step hierarchical clustering yields the possible candidate road signs, or regions of interest (ROIs). Finally, a fringe-adjusted joint transform correlation (JTC) technique is used to match the unknown signs against the known reference road signs stored in the database. The presented framework provides a novel way to detect road signs in natural scenes, and the results demonstrate the efficacy of the proposed technique, which yields a very low false hit rate.

  9. the-wizz: clustering redshift estimation for everyone

    NASA Astrophysics Data System (ADS)

    Morrison, C. B.; Hildebrandt, H.; Schmidt, S. J.; Baldry, I. K.; Bilicki, M.; Choi, A.; Erben, T.; Schneider, P.

    2017-05-01

    We present the-wizz, an open source and user-friendly software for estimating the redshift distributions of photometric galaxies with unknown redshifts by spatially cross-correlating them against a reference sample with known redshifts. The main benefit of the-wizz is in separating the angular pair finding and correlation estimation from the computation of the output clustering redshifts allowing anyone to create a clustering redshift for their sample without the intervention of an 'expert'. It allows the end user of a given survey to select any subsample of photometric galaxies with unknown redshifts, match this sample's catalogue indices into a value-added data file and produce a clustering redshift estimation for this sample in a fraction of the time it would take to run all the angular correlations needed to produce a clustering redshift. We show results with this software using photometric data from the Kilo-Degree Survey (KiDS) and spectroscopic redshifts from the Galaxy and Mass Assembly survey and the Sloan Digital Sky Survey. The results we present for KiDS are consistent with the redshift distributions used in a recent cosmic shear analysis from the survey. We also present results using a hybrid machine learning-clustering redshift analysis that enables the estimation of clustering redshifts for individual galaxies. the-wizz can be downloaded at http://github.com/morriscb/The-wiZZ/.

  10. Reconstruction of a digital core containing clay minerals based on a clustering algorithm.

    PubMed

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain core samples and the information needed for digital core reconstruction of mature sandstone reservoirs around the world, especially unconsolidated sandstone reservoirs. Reconstruction and classification of clay minerals play a vital role in the reconstruction of digital cores; although two-dimensional data-based reconstruction methods are specifically applicable as microstructure simulation methods for sandstone reservoirs, reconstructing the various clay minerals in digital cores remains challenging. In the present work, the clay mineral content was taken into account on the basis of two-dimensional information about the reservoir. After the hybrid method was applied, and compared with the model reconstructed by the process-based method, the output was a digital core containing clay clusters without labels for the clusters' number, size, and texture. The statistics and geometry of the reconstructed model were similar to those of the reference model. In addition, the Hoshen-Kopelman algorithm was used to label the various connected, unclassified clay clusters in the initial model, and the number and size of the clay clusters were then recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connected clusters into smaller clusters on the basis of differences in the clusters' characteristics. According to the clay minerals' characteristics, such as type, texture, and distribution, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and a judgment of the clay clusters' structure. The distributions and textures of the clay minerals in the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for simulating different clay minerals in digital cores.
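    The Hoshen-Kopelman labeling step mentioned above can be sketched as follows. This is a minimal Python illustration on a 2-D binary grid (1 = clay pixel, 0 = other) with 4-connectivity, using a union-find table to merge provisional labels during a single raster scan; the grid encoding, connectivity choice and function name are assumptions rather than details taken from the paper, and the subsequent K-means splitting of large clusters is not shown.

```python
def hoshen_kopelman(grid):
    """Label 4-connected clusters of occupied (1) cells; returns (labels, sizes)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[l] = label that provisional label l points to; index 0 unused

    def find(x):
        # Follow parent links to the root label, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Keep the smaller root so labels stay stable in raster order
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: assign provisional labels, recording equivalences
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new provisional label pointing to itself
                labels[r][c] = len(parent) - 1
            elif up and left:
                union(up, left)  # two provisional clusters meet here
                labels[r][c] = find(up)
            else:
                labels[r][c] = up or left

    # Second pass: relabel with canonical roots numbered 1..n and count cluster sizes
    canon, sizes = {}, {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                new = canon.setdefault(find(labels[r][c]), len(canon) + 1)
                labels[r][c] = new
                sizes[new] = sizes.get(new, 0) + 1
    return labels, sizes
```

    The returned `sizes` dictionary corresponds to the recorded number and size of clay clusters; a U-shaped region that first receives two provisional labels is merged into one cluster by the union step.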

  11. Unique relations between counterfactual thinking and DSM-5 PTSD symptom clusters.

    PubMed

    Mitchell, Melissa A; Contractor, Ateka A; Dranger, Paula; Shea, M Tracie

    2016-05-01

    Cognitive models of posttraumatic stress disorder (PTSD) propose that rumination about a trauma may increase particular symptom clusters. One type of rumination, termed counterfactual thinking (CFT), refers to thinking of alternative outcomes for an event. CFT centered on a trauma is thought to increase intrusions, negative alterations in mood and cognitions (NAMC), and marked alterations in arousal and reactivity (AAR). The theorized relations between CFT and specific symptom clusters have not been thoroughly investigated. Also, past work has not evaluated whether the relation is confounded by depressive symptoms, age, gender, or number of traumatic events experienced. The current study examined the unique associations between CFT and PTSD symptom clusters according to the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2013) in 51 trauma-exposed treatment-seeking individuals. As predicted, CFT was associated with all PTSD symptom clusters. After controlling for common predictors of PTSD symptom severity (i.e., age, depressive symptoms, and number of traumatic life events endorsed), we found CFT to be significantly associated with the intrusion and avoidance symptom clusters but not the AAR or NAMC symptom clusters. Results from the present study provide further support for the role of rumination in specific PTSD symptom clusters above and beyond symptoms of depression, age, and number of traumatic life events endorsed. Future work may consider investigating interventions to reduce rumination in PTSD. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  12. Wettability behavior of water droplet on organic-polluted fused quartz surfaces of pillar-type nanostructures applying molecular dynamics simulation

    NASA Astrophysics Data System (ADS)

    Chen, Jiaxuan; Chen, Wenyang; Xie, Yajing; Wang, Zhiguo; Qin, Jianbo

    2017-02-01

    Molecular dynamics (MD) is applied to study the wettability of water clusters of different sizes adsorbed on organic-polluted fused quartz (FQ) surfaces with different surface structures. The wettability of water clusters is studied under the effect of an organic pollutant. Considering the combined influence of pillar height and interval, a stair-step Wenzel-Cassie transition critical line is obtained by analyzing the stable states of water clusters on different surface structures. The results also show that holding the pillar interval constant and holding the pillar height constant produce exactly opposite trends; these are termed the "waterfall" rules. Substrate models with water clusters in the Cassie-Baxter state near the critical line are chosen to analyze the relationship between the HI ratio (pillar height/interval) and the size of the water cluster. The study found a critical threshold in the wettability transition. When the HI ratio is held constant, the wettability first decreases and then increases as the cluster size increases; conversely, when the cluster size is held constant, the wettability first decreases and then increases as the HI ratio decreases, but when the size of the water cluster is close to the threshold, the HI ratio has little effect on the wettability.

  13. Testing the accuracy of clustering redshifts with simulations

    NASA Astrophysics Data System (ADS)

    Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.

    2018-03-01

    We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample. This study gives an estimate of the accuracy this method can reach. First, we discuss the requirements on the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimate that the density of quasi-stellar objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Second, we estimate individual redshifts for galaxies in the densest regions of colour space (~30 per cent of the galaxies) without using the photometric redshift procedure. The advantage of this procedure is threefold. It allows (i) the use of cluster-zs for any field in astronomy, (ii) the combination of photo-zs and cluster-zs to obtain an improved redshift estimate, and (iii) the use of cluster-zs to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z-selected tomographic bins from redshift 0.2 to 1. We find a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-zs could be used as a primary redshift estimator by the next generation of cosmological surveys.

  14. Equation-of-motion coupled-cluster method for doubly ionized states with spin-orbit coupling.

    PubMed

    Wang, Zhifan; Hu, Shu; Wang, Fan; Guo, Jingwei

    2015-04-14

    In this work, we report an implementation of the equation-of-motion coupled-cluster method for doubly ionized states (EOM-DIP-CC) with spin-orbit coupling (SOC) using a closed-shell reference. Double ionization potentials (DIPs) are calculated in the space spanned by 2h and 3h1p determinants with the EOM-DIP-CC approach at the CC singles and doubles level (CCSD). Time-reversal symmetry together with spatial symmetry is exploited to reduce computational effort. To circumvent the problem of unstable dianion references when diffuse basis functions are included, nuclear charges are scaled. The effect of this stabilization potential on DIPs is estimated from calculations using a small basis set without diffuse basis functions. DIPs and excitation energies of some low-lying states for a series of open-shell atoms and molecules containing heavy elements with two unpaired electrons have been calculated with the EOM-DIP-CCSD approach. The results show that this approach affords a reliable description of SOC splitting. Furthermore, the EOM-DIP-CCSD approach is shown to provide reasonable excitation energies for systems with a dianion reference when diffuse basis functions are not employed.

  15. Equation-of-motion coupled-cluster method for doubly ionized states with spin-orbit coupling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Zhifan; Hu, Shu; Guo, Jingwei

    2015-04-14

    In this work, we report an implementation of the equation-of-motion coupled-cluster method for doubly ionized states (EOM-DIP-CC) with spin-orbit coupling (SOC) using a closed-shell reference. Double ionization potentials (DIPs) are calculated in the space spanned by 2h and 3h1p determinants with the EOM-DIP-CC approach at the CC singles and doubles level (CCSD). Time-reversal symmetry together with spatial symmetry is exploited to reduce computational effort. To circumvent the problem of unstable dianion references when diffuse basis functions are included, nuclear charges are scaled. The effect of this stabilization potential on DIPs is estimated from calculations using a small basis set without diffuse basis functions. DIPs and excitation energies of some low-lying states for a series of open-shell atoms and molecules containing heavy elements with two unpaired electrons have been calculated with the EOM-DIP-CCSD approach. The results show that this approach affords a reliable description of SOC splitting. Furthermore, the EOM-DIP-CCSD approach is shown to provide reasonable excitation energies for systems with a dianion reference when diffuse basis functions are not employed.

  16. Improving the distinguishable cluster results: spin-component scaling

    NASA Astrophysics Data System (ADS)

    Kats, Daniel

    2018-06-01

    Spin-component scaling is employed in the energy evaluation to improve the distinguishable cluster approach. SCS-DCSD reaction energies reproduce reference values with a root-mean-squared deviation well below 1 kcal/mol, interaction energies are three to five times more accurate than those from DCSD, and molecular systems with a large amount of static electron correlation are still described reasonably well. SCS-DCSD represents a pragmatic approach to achieving chemical accuracy with a simple method without triples, which can also be applied to multi-configurational molecular systems.
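    The idea behind spin-component scaling can be written in the generic form below, offered as a hedged illustration rather than the exact SCS-DCSD working equation: the correlation energy is split into opposite-spin (os) and same-spin (ss) pair contributions, each multiplied by its own empirical factor, and the result is added to the reference (Hartree-Fock) energy. The factor values used for SCS-DCSD are not given in this abstract; for comparison, Grimme's original SCS-MP2 uses c_os = 6/5 and c_ss = 1/3.

```latex
E_{\mathrm{corr}}^{\mathrm{SCS}} = c_{\mathrm{os}}\,E_{\mathrm{corr}}^{\mathrm{os}} + c_{\mathrm{ss}}\,E_{\mathrm{corr}}^{\mathrm{ss}},
\qquad
E_{\mathrm{tot}} = E_{\mathrm{ref}} + E_{\mathrm{corr}}^{\mathrm{SCS}}
```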

  17. Density functional Theory Based Generalized Effective Fragment Potential Method (Postprint)

    DTIC Science & Technology

    2014-07-01

    is acceptable for other applications) leads to induced dipole moments within 10^−6 to 10^−7 au of the precise values. Thus, the applied field of 10^−4...noncovalent interactions. The water-benzene clusters^17 and WATER27^11 reference values were also obtained at the CCSD(T)/CBS level, except for the clusters...with n = 20,42 where MP2/CBS was used. The n-alkane dimers^18 benchmark values were CCSD(T)/CBS for ethane to butane and a linear extrapolation method

  18. Radial Alignment of Elliptical Galaxies by the Tidal Force of a Cluster of Galaxies

    NASA Astrophysics Data System (ADS)

    Zhang, Shuang-Nan; Rong, Yu; Tu, Hong

    2015-08-01

    Unlike the random radial orientation distribution of field elliptical galaxies, galaxies in a cluster of galaxies are expected to point preferentially toward the center of the cluster as a result of the cluster's tidal force on its member galaxies. In this work an analytic model is formulated to simulate this effect. The deformation time scale of a galaxy in a cluster is usually much shorter than the time scale over which the tidal force changes, so the dynamical process of the tidal interaction within the galaxy can be ignored. The equilibrium shape of a galaxy is then assumed to be an equipotential surface of the total potential, i.e. the sum of the self-gravitational potential of the galaxy and the tidal potential of the cluster at that location. We use a Monte-Carlo method to calculate the radial orientation distribution of these galaxies, assuming an NFW mass profile for the cluster and an initial ellipticity for the field galaxies. The radial angles show a single-peaked distribution centered at zero. The Monte-Carlo simulations also show that shifting the reference center away from the real cluster center weakens the anisotropy of the radial angle distribution. Therefore, the expected radial alignment cannot be revealed if the distribution of spatial position angles is used instead of that of radial angles. The observed radial orientations of elliptical galaxies in the cluster Abell 2744 are consistent with the simulated distribution.

  20. Reducing Noise in a College Library.

    ERIC Educational Resources Information Center

    Luyben, Paul D.; And Others

    1981-01-01

    Discusses an experiment on controlling library noise by rearrangement of furniture groupings and the separation of existing clusters of furniture. While electromechanical tests showed no significant differences, user measures indicated more acceptable noise levels. There are numerous illustrations and 30 references. (RAA)

  1. Eight Cs and a G.

    ERIC Educational Resources Information Center

    Brown, Dorothy F.

    1988-01-01

    A discussion of vocabulary development for intermediate and advanced students preparing for the Australian certification test for Teaching English as a Foreign Language focuses on nine areas: collocations, clines, clusters, cloze procedures, context, consultation or checking, cards, creativity, and guessing. (seven references) (LB)

  2. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources.

    PubMed

    Rebholz-Schuhmann, Dietrich; Grabmüller, Christoph; Kavaliauskas, Silvestras; Croset, Samuel; Woollard, Peter; Backofen, Rolf; Filsell, Wendy; Clark, Dominic

    2014-07-01

    In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker. Copyright © 2013. Published by Elsevier Ltd.

  3. C-Terminal residues in small potassium channel blockers OdK1 and OSK3 from scorpion venom fine-tune the selectivity.

    PubMed

    Kuzmenkov, Alexey I; Peigneur, Steve; Chugunov, Anton O; Tabakmakher, Valentin M; Efremov, Roman G; Tytgat, Jan; Grishin, Eugene V; Vassilevski, Alexander A

    2017-05-01

    We report the isolation, sequencing, and electrophysiological characterization of OSK3 (α-KTx 8.8 in the Kalium and UniProt databases), a potassium channel blocker from the venom of the scorpion Orthochirus scrobiculosus. Using the voltage clamp technique, OSK3 was tested on a wide panel of 11 voltage-gated potassium channels expressed in Xenopus oocytes and was found to potently inhibit Kv1.2 and Kv1.3, with IC50 values of ~331 nM and ~503 nM, respectively. OdK1, produced by the scorpion Odontobuthus doriae, differs from OSK3 by just two C-terminal residues but shows a marked preference for Kv1.2. Based on the crystal structure of the charybdotoxin-potassium channel complex, a model was built to explain the role of the variable residues in OdK1 and OSK3 selectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. From the Superatom Model to a Diverse Array of Super-Elements: A Systematic Study of Dopant Influence on the Electronic Structure of Thiolate-Protected Gold Clusters.

    PubMed

    Schacht, Julia; Gaston, Nicola

    2016-10-18

    The electronic properties of doped thiolate-protected gold clusters are often referred to as tunable, but their study to date, conducted at different levels of theory, does not allow a systematic evaluation of this claim. Here, using density functional theory, the applicability of the superatomic model to these clusters is critically evaluated and related to the degree of structural distortion and electronic inhomogeneity in the differently doped clusters, with dopant atoms Pd, Pt, Cu, and Ag. The effect of electron number is systematically evaluated by varying the charge on the overall cluster, and the nominal number of delocalized electrons employed in the superatomic model is compared with the numbers obtained from Bader analysis of individual atomic charges. We find that the superatomic model is highly applicable to all of these clusters and is able to predict and explain the changing electronic structure as a function of charge. However, doping causes significant perturbations of the model, owing to distortions of the core structure of the Au13[RS(AuSR)2]6− cluster. In addition, analysis of the electronic structure indicates that the superatomic character is distributed further across the ligand shell in the doped clusters, which may have implications for the self-assembly of these clusters into materials. The prediction of appropriate clusters for such superatomic solids relies critically on such quantitative analysis of the tunability of the electronic structure. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. The Integrated Cluster Finder for the ARCHES project

    NASA Astrophysics Data System (ADS)

    Mints, Alexey; Schwope, Axel; Rosen, Simon; Pineau, François-Xavier; Carrera, Francisco

    2017-01-01

    Context. Clusters of galaxies are important for cosmology and astrophysics. They may be discovered through either the summed optical/IR radiation originating from their member galaxies or via X-ray emission originating from the hot intracluster medium. X-ray samples are not affected by projection effects but a redshift determination typically needs optical and infrared follow-up to then infer X-ray temperatures and luminosities. Aims: We want to confirm serendipitously discovered X-ray emitting cluster candidates and measure their cosmological redshift through the analysis and exploration of multi-wavelength photometric catalogues. Methods: We developed a tool, the Integrated Cluster Finder (ICF), to search for clusters by determining overdensities of potential member galaxies in optical and infrared catalogues. Based on a spectroscopic meta-catalogue we calibrated colour-redshift relations that combine optical (SDSS) and IR data (UKIDSS, WISE). The tool is used to quantify the overdensity of galaxies against the background via a modified redMaPPer technique and to quantify the confidence of a cluster detection. Results: Cluster finding results are compared to reference catalogues found in the literature. The results agree to within 95-98%. The tool is used to confirm 488 out of 830 cluster candidates drawn from 3XMMe in the footprint of the SDSS and CFHT catalogues. Conclusions: The ICF is a flexible and highly efficient tool to search for galaxy clusters in multiple catalogues and is freely available to the community. It may be used to identify the cluster content in future X-ray catalogues from XMM-Newton and eventually from eROSITA.

  6. An adaptive clustering algorithm for image matching based on corner feature

    NASA Astrophysics Data System (ADS)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-04-01

    Traditional image matching algorithms often cannot balance real-time performance and accuracy. To solve this problem, an adaptive clustering algorithm for image matching based on corner features is proposed in this paper. The method is based on the similarity of the vectors formed by matching point pairs, and adaptive clustering is performed on these matching point pairs. Harris corner detection is carried out first to extract the feature points of the reference image and the perceived image, and the feature points of the two images are initially matched using the Normalized Cross Correlation (NCC) function. Then, the improved algorithm proposed in this paper clusters the matching results to reduce ineffective computation and improve matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is applied to the matching points after clustering. The experimental results show that the proposed algorithm can effectively eliminate most of the incorrect matching points while retaining the correct ones, improving the accuracy of RANSAC matching and at the same time reducing the computational load of the whole matching process.
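    The initial NCC matching step can be illustrated with a short sketch. This minimal Python version computes zero-mean normalized cross-correlation between two equal-size grayscale patches (a score in [-1, 1] that is insensitive to brightness offset and contrast scaling) and keeps, for each reference patch, the best-scoring candidate above a threshold. The patch representation (nested lists around each detected corner), the 0.8 threshold and the function names are assumptions; the paper's adaptive clustering and RANSAC stages are not reproduced.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-size 2-D patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b):
        raise ValueError("patches must have the same size")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # flat patches carry no correlation information


def match_corners(ref_patches, sensed_patches, threshold=0.8):
    """Pair each reference patch with its best-NCC sensed patch if the score passes threshold."""
    pairs = []
    for i, pa in enumerate(ref_patches):
        scores = [ncc(pa, pb) for pb in sensed_patches]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))  # (reference index, sensed index, score)
    return pairs
```

    The resulting `(i, j, score)` pairs would be the tentative correspondences that a later clustering and RANSAC stage filters; because NCC is invariant to a linear change in intensity, a patch and a brighter, higher-contrast copy of it score 1.0.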

  7. An inventory of publications on electronic medical records revisited.

    PubMed

    Moorman, P W; Schuemie, M J; van der Lei, J

    2009-01-01

    In this short review we provide an update of our earlier inventories of publications indexed in MedLine with the MeSH term 'Medical Records Systems, Computerized'. We retrieved and analyzed all references to English articles published before January 1, 2008, and indexed in PubMed with the MeSH term 'Medical Records Systems, Computerized'. We retrieved a total of 11,924 publications, of which 3937 (33%) appeared in a journal with an impact factor. Since 2002 the number of yearly publications, and the number of journals in which those publications appeared, increased. A cluster analysis revealed three clusters: an organizational issues cluster, a technically oriented cluster and a cluster about order-entry and research. Although our previous inventory in 2003 suggested a constant yearly production of publications on electronic medical records since 1998, the current inventory shows another rise in production since 2002. In addition, many new journals and countries have shown interest during the last five years. In the last 15 years, interest in organizational issues remained fairly constant, order entry and research with systems gained attention, while interest in technical issues relatively decreased.

  8. Interacting star clusters in the Large Magellanic Cloud. Overmerging problem solved by cluster group formation

    NASA Astrophysics Data System (ADS)

    Leon, Stéphane; Bergond, Gilles; Vallenari, Antonella

    1999-04-01

    We present the tidal tail distributions of a sample of candidate binary clusters located in the bar of the Large Magellanic Cloud (LMC). One isolated cluster, SL 268, is presented in order to study the effect of the LMC tidal field. All the candidate binary clusters show tidal tails, confirming that the pairs are formed by physically linked objects. The stellar mass in the tails covers a large range, from 1.8 × 10^3 to 3 × 10^4 M☉. We derive a total mass estimate for SL 268 and SL 356. At large radii, the projected density profiles of SL 268 and SL 356 fall off as r^(−γ), with γ = 2.27 and γ = 3.44, respectively. Of 4 pairs or multiple systems, 2 are older than the theoretical survival time of binary clusters (ranging from a few 10^6 years to 10^8 years). One pair shows too large an age difference between its components to be consistent with classical theoretical models of binary cluster formation (Fujimoto & Kumai 1997). We refer to this as the "overmerging" problem. A different scenario is proposed: formation proceeds in large molecular complexes that give birth to groups of clusters over a few 10^7 years. In these groups the expected cluster encounter rate is higher, and tidal capture is more probable. Cluster pairs are not born together through the splitting of the parent cloud, but form later by tidal capture. For 3 pairs, we tentatively identify the star cluster group (SCG) memberships. SCG formation, through the recent cluster starburst triggered by the LMC-SMC encounter, in contrast with the quiescent open cluster formation in the Milky Way, could explain the paucity of binary clusters observed in our Galaxy. Based on observations collected at the European Southern Observatory, La Silla, Chile.

  9. VizieR Online Data Catalog: Monoceros star-forming region radial velocities (Costado+ 2018)

    NASA Astrophysics Data System (ADS)

    Costado, M. T.; Alfaro, E. J.

    2018-03-01

    Compiled data for each star. The columns RV + catalogue number are taken from VizieR (references below) and the column WEBDA is the median value calculated using all measurements taken from the WEBDA database (references below for each cluster). We also show our identification number (ID), HD and HIP numbers, the equatorial coordinates, the RV median value, which we will use in the kinematic analysis, and the distance calculated by Astraatmadja & Bailer-Jones (2017) using Gaia parallax. (1 data file).

  10. Weak Lensing Peaks in Simulated Light-Cones: Investigating the Coupling between Dark Matter and Dark Energy

    NASA Astrophysics Data System (ADS)

    Giocoli, Carlo; Moscardini, Lauro; Baldi, Marco; Meneghetti, Massimo; Metcalf, Robert B.

    2018-05-01

    In this paper, we study the statistical properties of weak lensing peaks in light-cones generated from cosmological simulations. To assess the prospects of this observable as a cosmological probe, we consider simulations that include interacting Dark Energy (hereafter DE) models with a coupling term between DE and Dark Matter. Cosmological models that produce a larger population of massive clusters have more numerous high signal-to-noise peaks; among models with comparable numbers of clusters, those with more concentrated haloes produce more peaks. The most extreme model under investigation shows a difference in peak counts of about 20% with respect to the reference ΛCDM model. We find that peak statistics can be used to distinguish a coupled DE model from a reference one with the same power spectrum normalisation. The differences in the expansion history and the growth rate of structure formation are reflected in their halo counts and non-linear scale features and, through them, in the properties of the lensing peaks. For a source redshift distribution consistent with the expectations of future space-based wide-field surveys, we find that typically 70% of the cluster population contributes to weak-lensing peaks with signal-to-noise ratios larger than two, and that the fraction of clusters in peaks approaches 100% for haloes with redshift z ≤ 0.5. Our analysis demonstrates that peak statistics are an important tool for disentangling DE models by accurately tracing the structure formation processes as a function of cosmic time.

  11. RNA-Seq Analysis Using De Novo Transcriptome Assembly as a Reference for the Salmon Louse Caligus rogercresseyi

    PubMed Central

    Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo

    2014-01-01

    Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for the different developmental stages, sexes, and strains of sea lice remain limited or lacking. In this study, RNA-seq analysis was performed using a de novo transcriptome assembly as a reference to evaluate transcriptional changes across six developmental stages of the salmon louse Caligus rogercresseyi. EST datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using Illumina MiSeq sequencing. A total of 151,788,682 transcripts were obtained, which were assembled into 83,444 high-quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Four clusters were differentially expressed between the nauplius I–II and copepodid stages (604 transcripts), five clusters between the copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I–II, copepodid and chalimus stages are mainly annotated to amino acid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were strongly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provide the most comprehensive transcriptome resource available for C. rogercresseyi and should support future genomic studies of host-parasite interactions. PMID:24691066

  12. RNA-Seq analysis using de novo transcriptome assembly as a reference for the salmon louse Caligus rogercresseyi.

    PubMed

    Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo

    2014-01-01

    Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for the different developmental stages, sexes, and strains of sea lice remain limited or lacking. In this study, RNA-seq analysis was performed using a de novo transcriptome assembly as a reference to evaluate transcriptional changes across six developmental stages of the salmon louse Caligus rogercresseyi. EST datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using Illumina MiSeq sequencing. A total of 151,788,682 transcripts were obtained, which were assembled into 83,444 high-quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Four clusters were differentially expressed between the nauplius I-II and copepodid stages (604 transcripts), five clusters between the copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I-II, copepodid and chalimus stages are mainly annotated to amino acid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were strongly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provide the most comprehensive transcriptome resource available for C. rogercresseyi and should support future genomic studies of host-parasite interactions.

  13. VizieR Online Data Catalog: Spectroscopy of globular clusters (Larsen+, 2018)

    NASA Astrophysics Data System (ADS)

    Larsen, S. S.; Brodie, J. P.; Wasserman, A.; Strader, J.

    2018-01-01

    New observations of globular clusters in NGC 147 and NGC 6822 were obtained with the HIRES spectrograph on the Keck I telescope on 5 Oct 2015 and 25 Sep 2016. We also include older HIRES observations of four GCs in M33. The spectra are the same as those used by Larsen et al. (2002AJ....124.2615L). In addition to the HIRES observations, we include our previously published VLT/UVES spectra of GCs in the Fornax and WLM galaxies (Larsen et al. 2012A&A...546A..53L, 2014A&A...565A..98L) and we refer to our previous papers for details on the observational strategy and data reduction. These tables contain the individual abundance measurements for each cluster. (16 data files).

  14. Variations in Metallicity and Gas Content in Spiral Galaxies: Accidents of Infall

    NASA Astrophysics Data System (ADS)

    Shields, Gregory A.; Robertson, P.; Dave, R.; Blanc, G. A.; Wright, A.

    2013-01-01

    Oxygen abundances are elevated in hydrogen-deficient spirals in the Virgo and Pegasus clusters (Robertson et al. 2012, ApJ 748:48, and references therein). We confirm the relationship between O/H and H I deficiency ("DEF") for an additional set of cluster spirals. In addition, we find that field spirals show a similar increase in O/H with DEF. Thus, the relationship is not solely the result of environmental processes in clusters. Cosmological simulations of galaxy formation predict a qualitatively similar trend of O/H with DEF for field spirals. This reflects excursions of gas content and metallicity above and below the mean mass-metallicity relationship as galaxies evolve. These excursions result from the stochastic effects of mergers and merger-free periods during the evolution.

  15. Data Handling and Communication

    NASA Astrophysics Data System (ADS)

    Hemmer, Frédéric; Innocenti, Pier Giorgio

    The following sections are included: Introduction; Computing Clusters and Data Storage: The New Factory and Warehouse; Local Area Networks: Organizing Interconnection; High-Speed Worldwide Networking: Accelerating Protocols; Detector Simulation: Events Before the Event; Data Analysis and Programming Environment: Distilling Information; World Wide Web: Global Networking; References.

  16. Freshman Health Topics

    ERIC Educational Resources Information Center

    Hovde, Karen

    2011-01-01

    This article examines a cluster of health topics that are frequently selected by students in lower division classes. Topics address issues relating to addictive substances, including alcohol and tobacco, eating disorders, obesity, and dieting. Analysis of the topics examines their interrelationships and organization in the reference literature.…

  17. MicroRNA Expression in Formalin-fixed Paraffin-embedded Cancer Tissue: Identifying Reference MicroRNAs and Variability.

    PubMed

    Boisen, Mogens Karsbøl; Dehlendorff, Christian; Linnemann, Dorte; Schultz, Nicolai Aagaard; Jensen, Benny Vittrup; Høgdall, Estrid Vilma Solyom; Johansen, Julia Sidenius

    2015-12-29

    Archival formalin-fixed paraffin-embedded (FFPE) cancer tissue samples are a readily available resource for microRNA (miRNA) biomarker identification. No established standard for reference miRNAs in FFPE tissue exists. We sought to identify stable reference miRNAs for normalization of miRNA expression in FFPE tissue samples from patients with colorectal (CRC) and pancreatic (PC) cancer and to quantify the variability associated with sample age and fixation. High-throughput miRNA profiling results from 203 CRC and 256 PC FFPE samples, as well as from 37 paired frozen/FFPE samples from nine other CRC tumors (methodological samples), were used. Candidate reference miRNAs were identified by their correlation with global mean expression. The stability of reference genes was analyzed according to published methods. The association between sample age and global mean miRNA expression was tested using linear regression. Variability was described using correlation coefficients and linear mixed effects models. Normalization effects were determined by changes in standard deviation and by hierarchical clustering. We created lists of the 20 miRNAs with the best correlation to global mean expression in each cancer type. Nine of these miRNAs were present in both lists, and miR-103a-3p was the most stable reference miRNA for both CRC and PC FFPE tissue. The optimal number of reference miRNAs was 4 in CRC and 10 in PC. Sample age had a significant effect on global miRNA expression in PC (50% reduction over 20 years) but not in CRC. Formalin fixation for 2-6 days decreased miRNA expression by 30-65%. Normalization using global mean expression reduced variability for technical and biological replicates, while normalization using the expression of the identified reference miRNAs reduced variability only for biological replicates. Normalization had only a minor impact on clustering results. 
We identified suitable reference miRNAs for future miRNA expression experiments using CRC and PC FFPE tissue samples. Formalin fixation decreased miRNA expression considerably, while the effect of increasing sample age was estimated to be negligible in a clinical setting.

  18. Reconstruction of the experimentally supported human protein interactome: what can we learn?

    PubMed Central

    2013-01-01

    Background Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will contribute significantly to biomedical research; therefore, its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim to assemble the human interactome in a global, structured way and to explore it to gain biologically relevant insights. Results First, we defined the UniProtKB manually reviewed human “complete” proteome as the reference protein-node set, and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. Integrating the source datasets augments the network mainly in terms of PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number of connections (>300) of the next fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly connected interactors. Conclusions Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human “complete” proteome and to discuss the role of the proteins within the network topology with respect to their function. 
As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, which would explain the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms. PMID:24088582

  19. DNA Barcode Reference Library for the African Citrus Triozid, Trioza erytreae (Hemiptera: Triozidae): Vector of African Citrus Greening.

    PubMed

    Khamis, F M; Rwomushana, I; Ombura, L O; Cook, G; Mohamed, S A; Tanga, C M; Nderitu, P W; Borgemeister, C; Sétamou, M; Grout, T G; Ekesi, S

    2017-12-05

    Citrus (Citrus spp.) production continues to decline in East Africa, particularly in Kenya and Tanzania, the two major producers in the region. This decline is attributed to pests and diseases, including infestation by the African citrus triozid, Trioza erytreae (Del Guercio) (Hemiptera: Triozidae). Besides direct feeding damage by adults and immature stages, T. erytreae is the main vector of 'Candidatus Liberibacter africanus', the causative agent of Greening disease in Africa, which is closely related to Huanglongbing. This study aimed to generate a novel barcode reference library for T. erytreae in order to use DNA barcoding as a rapid tool for accurate identification of the pest to aid phytosanitary measures. Triozid samples were collected from citrus orchards in Kenya, Tanzania, and South Africa and from alternative host plants. Sequences generated from the populations in the study showed very low variability, within the acceptable range for a single species. All samples analyzed were linked to T. erytreae of GenBank accession number KU517195. The phylogeny of the samples in this study and other Trioza reference species was inferred using the Maximum Likelihood method. The phylogenetic tree was paraphyletic with two distinct branches. The first branch had two clusters: 1) a cluster of all populations analyzed together with the GenBank accession of T. erytreae, and 2) a cluster of all the other GenBank accessions of Trioza species analyzed, except T. incrustata Percy, 2016 (KT588307.1), T. eugeniae Froggatt (KY294637.1), and T. grallata Percy, 2016 (KT588308.1), which occupied the second branch as outgroups forming sister clade relationships. These results were further substantiated with genetic distance values and principal component analyses. © The Author(s) 2017. Published by Oxford University Press on behalf of Entomological Society of America.

  20. 8 Allergenic Composition of Polymerized Allergen Extracts of Betula verrucosa, Dermatophagoides Pteronyssinus and Phleum Pratense

    PubMed Central

    Fernandez-Caldas, Enrique; Cases, Barbara; Tudela, Jose Ignacio; Fernandez, Eva Abel; Casanovas, Miguel; Subiza, Jose Luis

    2012-01-01

    Background Allergoids have been successfully used in the treatment of respiratory allergic diseases. They are modified allergen extracts that allow the administration of high allergen doses due to their reduced IgE binding capacity. They maintain allergen-specific T-cell recognition. Since they are native allergen extracts that have been polymerized with glutaraldehyde, identification of the allergenic molecules requires more complicated methods. The aim of the study was to determine the qualitative composition of different polymerized extracts and investigate the presence of defined allergenic molecules using mass spectrometry. Methods Proteomic analysis was carried out at the Proteomics Facility of the Hospital Nacional de Parapléjicos (Toledo, Spain). After reduction and alkylation, proteins were digested with trypsin and the resulting peptides were cleaned using the C18 SpinTips Sample Prep Kit; peptides were separated on an Ultimate nano-LC system using a Monolithic C18 column in combination with a precolumn for salt removal. Fractionation of the peptides was performed with a Probot microfraction collector, and MS and MS/MS analysis of offline spotted peptide samples were performed using the Applied Biosystems 4800 plus MALDI TOF/TOF Analyzer mass spectrometer. ProteinPilot Software V 2.0.1 and the Paragon algorithm were used for the identification of the proteins. Each MS/MS spectrum was searched against the SwissProt 2010_10 database, the Uniprot-Viridiplantae database and the Uniprot_Betula database. Results Analysis of the peptides revealed the presence of native allergens in the polymerized extracts: Der p 1, Der p 2, Der p 3, Der p 8 and Der p 11 in D. pteronyssinus; Bet v 2, Bet v 6, Bet v 7 and several Bet v 1 isoforms in B. verrucosa; and Phl p 1, Phl p 3, Phl p 5, Phl p 11 and Phl p 12 in P. pratense allergoids. 
In all cases, potential allergenic proteins were also identified, including ubiquitin, actin, enolase, fructose-bisphosphate aldolase, luminal-binding protein (Heat shock protein 70), and calmodulin, among others. Conclusions The characterization of the allergenic composition of allergoids is possible using MS/MS analysis. The analysis confirms the presence of native allergens in the allergoids. Major allergens are preserved during polymerization.

  1. Bacillus cereus-type polyhydroxyalkanoate biosynthetic gene cluster contains R-specific enoyl-CoA hydratase gene.

    PubMed

    Kihara, Takahiro; Hiroe, Ayaka; Ishii-Hyakutake, Manami; Mizuno, Kouhei; Tsuge, Takeharu

    2017-08-01

    Bacillus cereus and Bacillus megaterium both accumulate polyhydroxyalkanoate (PHA), but their PHA biosynthetic gene (pha) clusters that code for proteins involved in PHA biosynthesis are different. Namely, a gene encoding MaoC-like protein exists in the B. cereus-type pha cluster but not in the B. megaterium-type pha cluster. MaoC-like protein has an R-specific enoyl-CoA hydratase (R-hydratase) activity and is referred to as PhaJ when involved in PHA metabolism. In this study, the pha cluster of B. cereus YB-4 was characterized in terms of PhaJ's function. In an in vitro assay, PhaJ from B. cereus YB-4 (PhaJYB4) exhibited hydration activity toward crotonyl-CoA. In an in vivo assay using Escherichia coli as a host for PHA accumulation, the recombinant strain expressing PhaJYB4 and PHA synthase showed increased PHA accumulation, suggesting that PhaJYB4 functioned as a monomer supplier. The monomer composition of the accumulated PHA reflected the substrate specificity of PhaJYB4, which appeared to prefer short chain-length substrates. The pha cluster from B. cereus YB-4 functioned to accumulate PHA in E. coli; however, it did not function when the phaJYB4 gene was deleted. The B. cereus-type pha cluster represents a new example of a pha cluster that contains the gene encoding PhaJ.

  2. Automated flow cytometric analysis across large numbers of samples and cell types.

    PubMed

    Chen, Xiaoyi; Hasan, Milena; Libri, Valentina; Urrutia, Alejandra; Beitz, Benoît; Rouilly, Vincent; Duffy, Darragh; Patin, Étienne; Chalmond, Bernard; Rogge, Lars; Quintana-Murci, Lluis; Albert, Matthew L; Schwikowski, Benno

    2015-04-01

    Multi-parametric flow cytometry is a key technology for characterization of immune cell phenotypes. However, robust high-dimensional post-analytic strategies for automated data analysis in large numbers of donors are still lacking. Here, we report a computational pipeline, called FlowGM, which minimizes operator input, is insensitive to compensation settings, and can be adapted to different analytic panels. A Gaussian Mixture Model (GMM)-based approach was utilized for initial clustering, with the number of clusters determined using Bayesian Information Criterion. Meta-clustering in a reference donor permitted automated identification of 24 cell types across four panels. Cluster labels were integrated into FCS files, thus permitting comparisons to manual gating. Cell numbers and coefficient of variation (CV) were similar between FlowGM and conventional gating for lymphocyte populations, but notably FlowGM provided improved discrimination of "hard-to-gate" monocyte and dendritic cell (DC) subsets. FlowGM thus provides rapid high-dimensional analysis of cell phenotypes and is amenable to cohort studies. Copyright © 2015. Published by Elsevier Inc.

  3. Non-suicidal self-injury in high school students: Associations with identity processes and statuses.

    PubMed

    Luyckx, Koen; Gandhi, Amarendra; Bijttebier, Patricia; Claes, Laurence

    2015-06-01

    Non-suicidal self-injury (NSSI) refers to the direct, deliberate destruction of one's body tissue without suicidal intent. Research has highlighted the importance of identity synthesis versus confusion for NSSI. However, the association with identity processes and statuses remains unknown. A total of 568 adolescents reported on NSSI, identity, anxiety, and depression. Although identity processes of identification with commitment (negatively) and ruminative exploration (positively) were related to NSSI variables, these relationships were no longer significant when controlling for anxiety and depression. When examining identity statuses (using cluster analysis), individuals who had engaged in NSSI in the past (but not currently) were more likely to be in the moratorium cluster and less likely to be in the achievement cluster. Individuals who were currently engaging in NSSI were more likely to be in the troubled diffusion cluster. Clinicians should be attentive to the complex interplay between identity and NSSI when treating adolescents. Copyright © 2015 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.

  4. Preliminary analysis of one year long space climate simulation

    NASA Astrophysics Data System (ADS)

    Facsko, G.; Honkonen, I. J.; Juusola, L.; Viljanen, A.; Vanhamäki, H.; Janhunen, P.; Palmroth, M.; Milan, S. E.

    2013-12-01

    One full year (155 Cluster orbits, from January 29, 2002 to February 2, 2003) is simulated using the Grand Unified Magnetosphere Ionosphere Coupling simulation (GUMICS) in the European Cluster Assimilation Technology project (ECLAT). This enables us to study the performance of a global magnetospheric model on an unprecedented scale, both in terms of the amount of available observations and the length of the time series that can be compared. The solar wind for the simulated period, obtained from OMNIWeb, is used as input to GUMICS. We present an overview of various comparisons of GUMICS results to observations for the simulated year. Results along the orbit of the Cluster reference spacecraft are compared with Cluster measurements. The Cross Polar Cap Potential (CPCP) results are compared with SuperDARN measurements. The IMAGE electrojet indicators (IU, IL) calculated from the ionospheric currents of GUMICS are compared with observations. Finally, Geomagnetically Induced Currents (GIC) calculated from GUMICS results along the Finnish mineral gas pipeline at Mätsälä are also compared with measurements.

  5. Interaction force in a vertical dust chain inside a glass box.

    PubMed

    Kong, Jie; Qiao, Ke; Matthews, Lorin S; Hyde, Truell W

    2014-07-01

    Dust particle clusters containing a small number of particles can be used as probes for plasma diagnostics. The number of dust particles, as well as cluster size and shape, can be easily controlled by employing a glass box placed within a Gaseous Electronics Conference (GEC) rf reference chamber to provide confinement of the dust. The plasma parameters inside this box and within the larger plasma chamber have not yet been adequately defined. Adjusting the rf power alters the plasma conditions, causing structural changes in the cluster. This effect can be used to probe the relationship between the rf power and other plasma parameters. This experiment employs the sloshing and breathing modes of small cluster oscillations to examine the relationship between system rf power and the particle charge and plasma screening length inside the glass box. The experimental results indicate that both the screening length and dust charge decrease as rf power inside the box increases. The decrease in dust charge as power increases may indicate that ion trapping plays a significant role in the sheath.

  6. Two Suspected Worksite or Occupational Cancer Clusters Investigated Using the Cancer Data Registry and Multiple Primary Standardized Incidence Ratios in SEER*Stat-Idaho, 2013-2014.

    PubMed

    Rosenthal, Mariana; Johnson, Christopher J; Scoppa, Steve; Carter, Kris

    2016-01-01

    Investigations of suspected cancer clusters are resource intensive and rarely identify true clusters: among 428 publicly reported US investigations during 1990-2011, only 1 etiologic cluster was identified. In 2013, the Cancer Data Registry of Idaho (CDRI) was contacted regarding a suspected cancer cluster at a worksite (Cluster A) and among an occupational cohort (Cluster B). We investigated to determine whether these were true clusters. We derived investigation cohorts for Cluster A from facility-provided employee records and for Cluster B from professional licensing records. We used Registry Plus™ Link Plus to conduct probabilistic linkage of cohort members to the CDRI registry and completed matching through manual review using LexisNexis®, Accurint®, and the Social Security Death Index. We calculated standardized incidence ratios (SIR) using the MP-SIR session type in SEER*Stat with Idaho and US referent populations. For Cluster A, we identified 34 cancer cases during 9,689 person-years; compared with Idaho and US rates, 95 percent CIs for SIRs included 1.0 for 24 of 24 primary site categories. For Cluster B, we identified 78 cancer cases during 15,154 person-years; compared with Idaho rates, 95 percent CIs for SIRs included 1.0 for 23 of 24 primary site categories and were less than 1.0 for lung and bronchus cancers, and compared with US rates, 95 percent CIs for SIRs included 1.0 for 22 of 24 primary site categories and were less than 1.0 for lung and bronchus and colorectal cancers. We identified no statistically significant excess in cancer incidence in either cohort. SEER*Stat's MP-SIR is an efficient tool for performing SIR assessments, a Centers for Disease Control and Prevention/Council of State and Territorial Epidemiologists-recommended step when investigating suspected cancer clusters.

  7. The Hubble Space Telescope UV Legacy Survey of Galactic Globular Clusters. XV. The Dynamical Clock: Reading Cluster Dynamical Evolution from the Segregation Level of Blue Straggler Stars

    NASA Astrophysics Data System (ADS)

    Ferraro, F. R.; Lanzoni, B.; Raso, S.; Nardiello, D.; Dalessandro, E.; Vesperini, E.; Piotto, G.; Pallanca, C.; Beccari, G.; Bellini, A.; Libralato, M.; Anderson, J.; Aparicio, A.; Bedin, L. R.; Cassisi, S.; Milone, A. P.; Ortolani, S.; Renzini, A.; Salaris, M.; van der Marel, R. P.

    2018-06-01

    The parameter A+, defined as the area enclosed between the cumulative radial distribution of blue straggler stars (BSSs) and that of a reference population, is a powerful indicator of the level of BSS central segregation. As part of the Hubble Space Telescope UV Legacy Survey of Galactic globular clusters (GCs), here we present the BSS population and the determination of A+ in 27 GCs observed out to about one half-mass radius. In combination with 21 additional clusters discussed in a previous paper, this provides us with a global sample of 48 systems (corresponding to ∼32% of the Milky Way GC population), for which we find a strong correlation between A+ and the ratio of cluster age to the current central relaxation time. Tight relations have also been found with the core radius and the central luminosity density, which are expected to change with the long-term cluster dynamical evolution. An interesting relation is emerging between A+ and the ratio of the BSS velocity dispersion relative to that of main sequence turn-off stars, which measures the degree of energy equipartition experienced by BSSs in the cluster. These results provide further confirmation that BSSs are invaluable probes of GC internal dynamics and that A+ is a powerful dynamical clock.

  8. Classification of frailty using the Kihon checklist: A cluster analysis of older adults in urban areas.

    PubMed

    Kera, Takeshi; Kawai, Hisashi; Yoshida, Hideyo; Hirano, Hirohiko; Kojima, Motonaga; Fujiwara, Yoshinori; Ihara, Kazushige; Obuchi, Shuichi

    2017-01-01

    Frailty is an important predictor of the need for long-term care and hospitalization. Our aim was to categorize frailty in community-dwelling older adults. The present study was carried out in 2011-2013, and consisted of 1380 individuals over 65 years of age. Participants completed the Kihon checklist, which is widely used to assess frailty in Japan, and their physical, cognitive and social function was evaluated. Non-hierarchical cluster analysis was used to statistically categorize frailty. The optimum number of clusters was determined as the point at which the external reference values (instrumental activity of daily living score, grip power, 10-m walk time, body mass index, portable fall risk index, occlusal force and Mini-Mental State Examination score) differed. According to the Kihon checklist, 369 (26.7%) of the 1380 study participants were considered frail. When the cluster number was increased from two to six, the scores in each subdomain of the Kihon checklist significantly differed. The estimated minimum number of clusters was five, and each of the five cluster groups had distinct characteristics. The numbers of participants in cluster groups 1-5 were 105, 78, 62, 71 and 53, respectively. We identified five types of frailty in community-dwelling older adults in Japan: "experience of falling," "pre-frailty," "oral frailty," "housebound" and "severe frailty." Geriatr Gerontol Int 2017; 17: 69-77. © 2016 Japan Geriatrics Society.

  9. Biosynthetic Genes for the Tetrodecamycin Antibiotics

    PubMed Central

    Gverzdys, Tomas

    2016-01-01

    ABSTRACT We recently described 13-deoxytetrodecamycin, a new member of the tetrodecamycin family of antibiotics. A defining feature of these molecules is the presence of a five-membered lactone called a tetronate ring. By sequencing the genome of a producer strain, Streptomyces sp. strain WAC04657, and searching for a gene previously implicated in tetronate ring formation, we identified the biosynthetic genes responsible for producing 13-deoxytetrodecamycin (the ted genes). Using the ted cluster in WAC04657 as a reference, we found related clusters in three other organisms: Streptomyces atroolivaceus ATCC 19725, Streptomyces globisporus NRRL B-2293, and Streptomyces sp. strain LaPpAH-202. Comparing the four clusters allowed us to identify the cluster boundaries. Genetic manipulation of the cluster confirmed the involvement of the ted genes in 13-deoxytetrodecamycin biosynthesis and revealed several additional molecules produced through the ted biosynthetic pathway, including tetrodecamycin, dihydrotetrodecamycin, and another, W5.9, a novel molecule. Comparison of the bioactivities of these four molecules suggests that they may act through the covalent modification of their target(s). IMPORTANCE The tetrodecamycins are a distinct subgroup of the tetronate family of secondary metabolites. Little is known about their biosynthesis or mechanisms of action, making them an attractive subject for investigation. In this paper we present the biosynthetic gene cluster for 13-deoxytetrodecamycin in Streptomyces sp. strain WAC04657. We identify related clusters in several other organisms and show that they produce related molecules. PMID:27137499

  10. The Gemini/HST Galaxy Cluster Project: Redshift 0.2–1.0 Cluster Sample, X-Ray Data, and Optical Photometry Catalog

    NASA Astrophysics Data System (ADS)

    Jørgensen, Inger; Chiboucas, Kristin; Hibon, Pascale; Nielsen, Louise D.; Takamiya, Marianne

    2018-04-01

    The Gemini/HST Galaxy Cluster Project (GCP) covers 14 clusters at z = 0.2–1.0 with X-ray luminosities of $L_{500} \geq 10^{44}$ erg s$^{-1}$ in the 0.1–2.4 keV band. In this paper, we provide homogeneously calibrated X-ray luminosities, masses, and radii, and we present the complete catalog of the ground-based photometry for the GCP clusters. The clusters were observed with either Gemini North or Gemini South in three or four of the optical passbands g′, r′, i′, and z′. The photometric catalog includes consistently calibrated total magnitudes, colors, and geometrical parameters. The photometry reaches ≈25 mag in the passband closest to the rest-frame B band. We summarize comparisons of our photometry with data from the Sloan Digital Sky Survey. We describe the sample selection for our spectroscopic observations and establish the calibrations needed to obtain rest-frame magnitudes and colors. Finally, we derive the color–magnitude relations for the clusters and briefly discuss them in the context of evolution with redshift. Consistent with our results based on spectroscopic data, the color–magnitude relations support passive evolution of the red-sequence galaxies. The absence of change in the slope with redshift constrains the allowable age variation along the red sequence to <0.05 dex between the brightest cluster galaxies and those four magnitudes fainter. This paper serves as the main reference for the GCP cluster and galaxy selection, X-ray data, and ground-based photometry.

  11. Comparative Genomic Hybridization Analysis of Two Predominant Nordic Group I (Proteolytic) Clostridium botulinum Type B Clusters

    PubMed Central

    Lindström, Miia; Hinderink, Katja; Somervuo, Panu; Kiviniemi, Katri; Nevas, Mari; Chen, Ying; Auvinen, Petri; Carter, Andrew T.; Mason, David R.; Peck, Michael W.; Korkeala, Hannu

    2009-01-01

    Comparative genomic hybridization analysis of 32 Nordic group I Clostridium botulinum type B strains isolated from various sources revealed two homogeneous clusters, clusters BI and BII. The type B strains differed from reference strain ATCC 3502 by 413 coding sequence (CDS) probes, sharing 88% of all the ATCC 3502 genes represented on the microarray. The two Nordic type B clusters differed from each other by their response to 145 CDS probes related mainly to transport and binding, adaptive mechanisms, fatty acid biosynthesis, the cell membranes, bacteriophages, and transposon-related elements. The most prominent differences between the two clusters were related to resistance to toxic compounds frequently found in the environment, such as arsenic and cadmium, reflecting different adaptive responses in the evolution of the two clusters. Other relatively variable CDS groups were related to surface structures and the gram-positive cell wall, suggesting that the two clusters possess different antigenic properties. All the type B strains carried CDSs putatively related to capsule formation, which may play a role in adaptation to different environmental and clinical niches. Sequencing showed that representative strains of the two type B clusters both carried subtype B2 neurotoxin genes. As many of the type B strains studied have been isolated from foods or associated with botulism, it is expected that the two group I C. botulinum type B clusters present a public health hazard in Nordic countries. Knowing the genetic and physiological markers of these clusters will assist in targeting control measures against these pathogens. PMID:19270141

  12. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance for improving resource allocation and supporting fire management actions at local and global levels. This study aims to characterize the temporal fluctuations of forest fire sequences observed in Portugal, the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: the Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, the Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and determining whether the observed events occur randomly, in clusters, or in a regular pattern. The methods considered are of general importance and can be used for other spatio-temporal events (e.g., crime, epidemiology, biodiversity, geomarketing). An important contribution of this research is the analysis and estimation of local measures of clustering, which help in understanding the temporal structure of the events. Each measure is described and applied to the raw data (forest fires geo-database), and the results are compared with reference patterns generated under the null hypothesis of randomness (Poisson processes) over the same time period as the raw data. This comparison makes it possible to estimate the degree to which the real data deviate from a Poisson process. Generalizations of these clustering methods to functional measures, taking the phenomena into account, were also applied and adapted to detect time dependences in a measured variable (e.g., burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. The clustering measure value therefore depends on the threshold, which helps in understanding the time pattern of the studied events. Our findings detected an overdensity of events in particular time periods and showed that the forest fire sequences in Portugal can be considered a multifractal process with a degree of time-clustering of the events. Key words: time sequences, Morisita index, fractals, multifractals, box-counting, Ripley's K-function, Allan Factor, variography, forest fires, point process. Acknowledgements: This work was partly supported by the SNFS Project No. 200021-140658, "Analysis and Modelling of Space-Time Patterns in Complex Regions".
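    Two of the measures listed above can be sketched for a one-dimensional sequence of event times. The sketch assumes the standard textbook definitions. The Allan Factor compares counts in adjacent windows of width T, AF(T) = mean((N[i+1] - N[i])^2) / (2 * mean(N[i])). The Morisita index compares the pooled within-window pair count with its expectation under uniform randomness, I = Q * sum(n_i * (n_i - 1)) / (N * (N - 1)) for Q windows and N events. Both are close to 1 for a Poisson process, above 1 for clustered events and below 1 for regular ones. The function names and inputs are hypothetical, and this is not the authors' implementation.

```python
def window_counts(times, t_start, t_end, width):
    """Count events in consecutive, non-overlapping windows of the given width.

    Events beyond the last complete window are ignored.
    """
    n_windows = int((t_end - t_start) // width)
    counts = [0] * n_windows
    for t in times:
        i = int((t - t_start) // width)
        if 0 <= i < n_windows:
            counts[i] += 1
    return counts


def allan_factor(times, t_start, t_end, width):
    """AF(T): mean squared difference of adjacent counts over twice the mean count.

    Needs at least two windows and at least one event.
    """
    counts = window_counts(times, t_start, t_end, width)
    diffs = [(b - a) ** 2 for a, b in zip(counts, counts[1:])]
    mean_count = sum(counts) / len(counts)
    return (sum(diffs) / len(diffs)) / (2 * mean_count)


def morisita_index(times, t_start, t_end, width):
    """Morisita index over Q equal windows; needs at least two events."""
    counts = window_counts(times, t_start, t_end, width)
    q, n = len(counts), sum(counts)
    return q * sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical event times in arbitrary units (e.g. days).
events = [0.2, 0.4, 0.5, 3.1, 7.8, 7.9, 8.0, 8.1]
print(allan_factor(events, 0.0, 10.0, 1.0), morisita_index(events, 0.0, 10.0, 1.0))
```

    To mimic the thresholded analysis described above, the same functions could be called on the subset of events whose measured value (e.g., burned area) exceeds each threshold, and on simulated Poisson sequences for comparison.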

  13. From Luminous Hot Stars to Starburst Galaxies

    NASA Astrophysics Data System (ADS)

    Conti, Peter S.; Crowther, Paul A.; Leitherer, Claus

    2012-10-01

    1. Introduction; 2. Observed properties; 3. Stellar atmospheres; 4. Stellar winds; 5. Evolution of single stars; 6. Binaries; 7. Birth of massive stars and star clusters; 8. The interstellar environment; 9. From giant HII regions to HII galaxies; 10. Starburst phenomena; 11. Cosmological implications; References; Index.

  14. Vicious Circles in Organizations.

    ERIC Educational Resources Information Center

    Masuch, Michael

    1985-01-01

    After examining some elementary notions of action theory and cybernetics, this article analyzes the dynamics, clustering, and survival chances of vicious circles. It argues that the action perspective implies that many structural suboptimalities of organizations are caused by vicious circles. Eleven figures and 105 references are provided. (DCS)

  15. Workplace Learning: A Concept in Off-Campus Teaching.

    ERIC Educational Resources Information Center

    Rose, Emma; McKee, Willie; Temple, Bryan K.; Harrison, David K.; Kirkwood, D.

    2001-01-01

    Discusses types of university-provided workplace learning; identifies problems posed by employee turnover and lack of equipment. Suggests that the problem of too few students to have a cost-effective program can be solved by clustering program offerings for small businesses. (Contains 25 references.) (SK)

  16. A diagnostic for determining the quality of single-reference electron correlation methods

    NASA Technical Reports Server (NTRS)

    Lee, Timothy J.; Taylor, Peter R.

    1989-01-01

    It was recently proposed that the Euclidean norm of the t1 vector of the coupled cluster wave function (normalized by the number of electrons included in the correlation procedure) could be used to determine whether a single-reference-based electron correlation procedure is appropriate. This diagnostic, T1, is defined for use with self-consistent-field molecular orbitals and is invariant to the same orbital rotations as the coupled cluster energy. T1 is investigated for several different chemical systems that exhibit a range of multireference behavior, and is shown to be an excellent measure of the importance of nondynamical electron correlation and far superior to C0 from a singles and doubles configuration interaction wave function. It is further suggested that when the aim is to recover a large fraction of the dynamical electron correlation energy, a large T1 (i.e., greater than 0.02) probably indicates the need for a multireference electron correlation procedure.
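    The following sketch of the diagnostic assumes that the t1 amplitudes are available as an occupied-by-virtual array. In the usual form of the definition, the Euclidean norm is divided by the square root of the number of correlated electrons, T1 = ||t1|| / sqrt(N). The function names and amplitudes below are hypothetical. The 0.02 cutoff is the rule of thumb quoted in the abstract, not a hard criterion.

```python
import math


def t1_diagnostic(t1_amplitudes, n_correlated_electrons):
    """T1 = ||t1||_2 / sqrt(N) for an occupied x virtual array of t1 amplitudes."""
    # Euclidean (Frobenius) norm over all single-excitation amplitudes.
    norm = math.sqrt(sum(t * t for row in t1_amplitudes for t in row))
    return norm / math.sqrt(n_correlated_electrons)


def suggests_multireference(t1_value, threshold=0.02):
    """Rule of thumb: T1 above ~0.02 hints at significant nondynamical correlation."""
    return t1_value > threshold


# Hypothetical amplitudes for 2 occupied and 3 virtual orbitals, 4 correlated electrons.
amps = [[0.010, -0.030, 0.000],
        [0.020, 0.000, 0.015]]
t1 = t1_diagnostic(amps, 4)
print(f"T1 = {t1:.4f}, multireference suspected: {suggests_multireference(t1)}")
```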

  17. A diagnostic for determining the quality of single-reference electron correlation methods

    NASA Technical Reports Server (NTRS)

    Lee, Timothy J.; Taylor, Peter R.

    1989-01-01

    It was recently proposed that the Euclidean norm of the t1 vector of the coupled cluster wave function (normalized by the number of electrons included in the correlation procedure) could be used to determine whether a single-reference-based electron correlation procedure is appropriate. This diagnostic, T1, is defined for use with self-consistent-field molecular orbitals and is invariant to the same orbital rotations as the coupled cluster energy. T1 is investigated for several different chemical systems that exhibit a range of multireference behavior, and is shown to be an excellent measure of the importance of nondynamical electron correlation and far superior to C0 from a singles and doubles configuration interaction wave function. It is further suggested that when the aim is to recover a large fraction of the dynamical electron correlation energy, a large T1 (i.e., greater than 0.02) probably indicates the need for a multireference electron correlation procedure.

  18. Identification of individual coherent sets associated with flow trajectories using coherent structure coloring

    NASA Astrophysics Data System (ADS)

    Schlueter-Kuck, Kristy L.; Dabiri, John O.

    2017-09-01

    We present a method for identifying the coherent structures associated with individual Lagrangian flow trajectories even where only sparse particle trajectory data are available. The method, based on techniques in spectral graph theory, uses the Coherent Structure Coloring vector and associated eigenvectors to analyze the distance in higher-dimensional eigenspace between a selected reference trajectory and other tracer trajectories in the flow. By analyzing this distance metric in a hierarchical clustering, the coherent structure of which the reference particle is a member can be identified. The algorithm is shown to successfully identify coherent structures of varying complexity in canonical unsteady flows. Additionally, the method is able to assess the relative coherence of the associated structure in comparison to the surrounding flow. Although the method is demonstrated here in the context of fluid flow kinematics, the generality of the approach allows for its potential application to other unsupervised clustering problems in dynamical systems, such as neuronal activity, gene expression, or social networks.

  19. Integration of Genomic and Other Epidemiologic Data to Investigate and Control a Cross-Institutional Outbreak of Streptococcus pyogenes.

    PubMed

    Chalker, Victoria J; Smith, Alyson; Al-Shahib, Ali; Botchway, Stella; Macdonald, Emily; Daniel, Roger; Phillips, Sarah; Platt, Steven; Doumith, Michel; Tewolde, Rediat; Coelho, Juliana; Jolley, Keith A; Underwood, Anthony; McCarthy, Noel D

    2016-06-01

    Single-strain outbreaks of Streptococcus pyogenes infections are common and often go undetected. In 2013, two clusters of invasive group A Streptococcus (iGAS) infection were identified in independent but closely located care homes in Oxfordshire, United Kingdom. Investigation included visits to each home, chart review, staff survey, microbiologic sampling, and genome sequencing. S. pyogenes emm type 1.0, the most common circulating type nationally, was identified from all cases yielding GAS isolates. A tailored whole-genome reference population comprising epidemiologically relevant contemporaneous isolates and published isolates was assembled. Data were analyzed independently using whole-genome multilocus sequencing and single-nucleotide polymorphism analyses. Six isolates from staff and residents of the homes formed a single cluster that was separated from the reference population by both analytical approaches. No further cases occurred after mass chemoprophylaxis and enhanced infection control. Our findings demonstrate the ability of two independent analytical approaches to enable robust conclusions from nonstandardized whole-genome analysis to support public health practice.

  20. Multipoint observations of plasma phenomena made in space by Cluster

    NASA Astrophysics Data System (ADS)

    Goldstein, M. L.; Escoubet, P.; Hwang, K.-Joo; Wendel, D. E.; Viñas, A.-F.; Fung, S. F.; Perri, S.; Servidio, S.; Pickett, J. S.; Parks, G. K.; Sahraoui, F.; Gurgiolo, C.; Matthaeus, W.; Weygand, J. M.

    2015-06-01

    Plasmas are ubiquitous in nature, surround our local geospace environment, and permeate the universe. Plasma phenomena in space give rise to energetic particles, the aurora, solar flares and coronal mass ejections, as well as many energetic phenomena in interstellar space. Although plasmas can be studied in laboratory settings, it is often difficult, if not impossible, to replicate the conditions (density, temperature, magnetic and electric fields, etc.) of space. Single-point space missions too numerous to list have described many properties of near-Earth and heliospheric plasmas as measured both in situ and remotely (see http://www.nasa.gov/missions/#.U1mcVmeweRY for a list of NASA-related missions). However, a full description of our plasma environment requires three-dimensional spatial measurements. Cluster is the first, and until data begin flowing from the Magnetospheric Multiscale Mission (MMS), the only mission designed to describe the three-dimensional spatial structure of plasma phenomena in geospace. In this paper, we concentrate on some of the many plasma phenomena that have been studied using data from Cluster. To date, more than 2000 refereed papers have been published using Cluster data, but in this paper we will, of necessity, refer to only a small fraction of the published work. We have focused on a few basic plasma phenomena and have not, for example, dealt with most of the vast body of work describing dynamical phenomena in Earth's magnetosphere, including the dynamics of current sheets in Earth's magnetotail and the morphology of the dayside high-latitude cusp. Several review articles and special publications describe aspects of that research in detail, and interested readers are referred to them (see, for example, Escoubet et al. 2005, Multiscale Coupling of Sun-Earth Processes, p. 459; Keith et al. 2005, Surv. Geophys. 26, 307-339; Paschmann et al. 2005, Outer Magnetospheric Boundaries: Cluster Results, Space Sciences Series of ISSI, Berlin: Springer; Goldstein et al. 2006, Adv. Space Res. 38, 21-36; Taylor et al. 2010, The Cluster Mission: Space Plasma in Three Dimensions, Springer, pp. 309-330; and Escoubet et al. 2013, Ann. Geophys. 31, 1045-1059).

  1. On the cooperativity of association and reference energy scales in thermodynamic perturbation theory

    NASA Astrophysics Data System (ADS)

    Marshall, Bennett D.

    2016-11-01

    Equations of state for hydrogen-bonding fluids are typically described by two energy scales: a short-range, highly directional hydrogen-bonding energy scale and a reference energy scale that accounts for dispersion and orientationally averaged multipole attractions. These energy scales are always treated independently. In recent years, extensive first-principles quantum mechanics calculations on small water clusters have shown that both the hydrogen-bond and reference energy scales depend on the number of incident hydrogen bonds of the water molecule. In this work, we propose a new methodology to couple the reference energy scale to the degree of hydrogen bonding in the fluid. We demonstrate the utility of the new approach by showing that it gives improved predictions of water-hydrocarbon mutual solubilities.

  2. Globular Clusters: Absolute Proper Motions and Galactic Orbits

    NASA Astrophysics Data System (ADS)

    Chemel, A. A.; Glushkova, E. V.; Dambis, A. K.; Rastorguev, A. S.; Yalyalieva, L. N.; Klinichev, A. D.

    2018-04-01

    We cross-match objects from several different astronomical catalogs to determine the absolute proper motions of stars within the 30-arcmin-radius fields of 115 Milky Way globular clusters with an accuracy of 1–2 mas yr$^{-1}$. The proper motions are based on positional data recovered from the USNO-B1, 2MASS, URAT1, ALLWISE, UCAC5, and Gaia DR1 surveys, with up to ten positions spanning an epoch difference of up to about 65 years, and reduced to the Gaia DR1 TGAS frame using UCAC5 as the reference catalog. Cluster members are photometrically identified by selecting horizontal-branch and red-giant-branch stars on color-magnitude diagrams, and the mean absolute proper motions of the clusters, with a typical formal error of about 0.4 mas yr$^{-1}$, are computed by averaging the proper motions of the selected members. The inferred absolute proper motions of the clusters are combined with available radial-velocity data and heliocentric distance estimates to compute the cluster orbits in Galactic potential models based on a Miyamoto–Nagai disk, a Hernquist spheroid, and a modified isothermal dark-matter halo (an axisymmetric model without a bar), and in the same model plus a rotating Ferrers bar (non-axisymmetric). Five distant clusters have higher-than-escape velocities, most likely due to large errors in the computed transverse velocities, whereas the computed orbits of all other clusters remain bound to the Galaxy. Unlike previously published results, we find that the bar substantially affects the orbits of most of the clusters, even those at large Galactocentric distances, introducing appreciable chaoticity, especially in the portions of the orbits close to the Galactic center, and stretching out the orbits of some of the thick-disk clusters.

  3. [Bibliometrics and visualization analysis of land use regression models in ambient air pollution research].

    PubMed

    Zhang, Y J; Zhou, D H; Bai, Z P; Xue, F X

    2018-02-10

    Objective: To quantitatively analyze the current status and development trends of land use regression (LUR) models in ambient air pollution studies. Methods: Relevant literature in the PubMed database published before June 30, 2017 was analyzed using the Bibliographic Items Co-occurrence Matrix Builder (BICOMB 2.0). Keyword co-occurrence networks, cluster maps and timeline maps were generated using the CiteSpace 5.1.R5 software. Relevant literature identified in three Chinese databases was also reviewed. Results: Four hundred sixty-four relevant papers were retrieved from the PubMed database. The number of papers published increased annually, in line with the growing trend of the index. Most papers were published in the journal Environmental Health Perspectives. The co-word cluster analysis identified five clusters: cluster #0 consisted of birth cohort studies related to the health effects of prenatal exposure to air pollution; cluster #1 referred to land use regression modeling and exposure assessment; cluster #2 was related to the epidemiology of traffic exposure; cluster #3 dealt with exposure to ultrafine particles and related health effects; cluster #4 described exposure to black carbon and related health effects. Timeline mapping indicated that clusters #0 and #1 were the main research areas, while clusters #3 and #4 were emerging research hotspots. Ninety-four relevant papers were retrieved from the Chinese databases, most of them related to modeling studies. Conclusion: To better assess the health-related risks of ambient air pollution and to best inform preventive public health intervention policies, the application of LUR models in environmental epidemiology studies in China should be encouraged.
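    The internals of BICOMB and CiteSpace are not described here. The following is therefore only a generic sketch of the keyword co-occurrence counting that such networks are built from: each unordered pair of keywords that appears in the same paper adds one to that pair's count. The function name and the case-folding of keywords are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count how often each unordered pair of keywords appears in the same paper.

    `papers` is a list of keyword lists. Keywords are stripped and lower-cased
    (an assumption) so that "LUR" and "lur" are treated as the same term.
    """
    pairs = Counter()
    for keywords in papers:
        unique = sorted({k.strip().lower() for k in keywords})
        for a, b in combinations(unique, 2):  # a < b, so each pair has one key
            pairs[(a, b)] += 1
    return pairs


# Hypothetical keyword lists for three papers.
papers = [["LUR", "PM2.5"], ["lur", "NO2", "PM2.5"], ["NO2"]]
for pair, count in keyword_cooccurrence(papers).most_common():
    print(pair, count)
```

    The resulting pair counts are the edge weights of a co-occurrence network, which clustering tools can then partition into themes such as those listed above.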

  4. Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features.

    PubMed

    Haakensen, Vilde D; Lingjaerde, Ole Christian; Lüders, Torben; Riis, Margit; Prat, Aleix; Troester, Melissa A; Holmen, Marit M; Frantzen, Jan Ole; Romundstad, Linda; Navjord, Dina; Bukholm, Ida K; Johannesen, Tom B; Perou, Charles M; Ursin, Giske; Kristensen, Vessela N; Børresen-Dale, Anne-Lise; Helland, Aslaug

    2011-11-01

    Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and by supervised analyses, such as gene set enrichment analysis, gene ontology analysis, and comparison with previously published gene lists and independent datasets. Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene expression profiling, regardless of the clustering algorithm and gene filtering used. Comparison of the expression profiles of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer, and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts and to identify distinct subtypes of normal breast tissue. Further studies are needed to determine the specific cell contributions to the variation in the biology of normal breasts, how the identified clusters relate to breast cancer risk, and their possible link to the origin of the different molecular subtypes of breast cancer.
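    Unsupervised hierarchical clustering of expression profiles can be sketched as agglomerative clustering with average linkage on a correlation distance (1 - Pearson r), one common choice for gene expression data. The abstract states that the two clusters emerged regardless of the algorithm used, so this particular linkage and distance are assumptions. The functions are hypothetical and deliberately naive: they recompute linkages at every merge step, and they assume that no profile is constant, since r is undefined in that case.

```python
import math
from itertools import combinations


def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(samples, n_clusters):
    """Merge the two clusters with the smallest mean pairwise distance
    until n_clusters remain; returns lists of sample indices."""
    dist = {
        (i, j): pearson_distance(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    }

    def linkage(a, b):
        # Mean distance over all cross-cluster sample pairs.
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Hypothetical profiles (rows = biopsies, columns = genes).
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 1]]
print(average_linkage(profiles, 2))
```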

  5. Use of Conserved Randomly Amplified Polymorphic DNA (RAPD) Fragments and RAPD Pattern for Characterization of Lactobacillus fermentum in Ghanaian Fermented Maize Dough

    PubMed Central

    Hayford, Alice E.; Petersen, Anne; Vogensen, Finn K.; Jakobsen, Mogens

    1999-01-01

    The present work describes the use of randomly amplified polymorphic DNA (RAPD) for the characterization of 172 dominant Lactobacillus isolates from present and previous studies of Ghanaian maize fermentation. Heterofermentative lactobacilli dominate the fermentation flora, since approximately 85% of the isolates belong to this group. Cluster analysis of the RAPD profiles obtained showed the presence of two main clusters. Cluster 1 included Lactobacillus fermentum, whereas cluster 2 comprised the remaining Lactobacillus spp. The two distinct clusters emerged at the similarity level of <50%. All isolates in cluster 1 showed similarity in their RAPD profile to the reference strains of L. fermentum included in the study. These isolates, yielding two distinct bands of approximately 695 and 773 bp with the primers used, were divided into four subclusters, indicating that several strains are involved in the fermentation and remain dominant throughout the process. The two distinct RAPD fragments were cloned, sequenced, and used as probes in Southern hybridization experiments. With one exception, Lactobacillus reuteri LMG 13045, the probes hybridized only to fragments of different sizes in EcoRI-digested chromosomal DNA of L. fermentum strains, thus indicating the specificity of the probes and variation within the L. fermentum isolates. PMID:10388723

  6. Typology of schizotypy in non-clinical young adults: Psychopathological and personality disorder traits correlates.

    PubMed

    Raynal, Patrick; Goutaudier, Nelly; Nidetch, Victoria; Chabrol, Henri

    2016-12-30

    Few typological studies address schizotypy in young adults. Schizotypal traits were assessed in 466 college students using the Schizotypal Personality Questionnaire-Brief (SPQ-B). Other measures evaluated personality traits previously associated with schizotypy (borderline, obsessional, and autistic traits), psychopathological symptoms (suicidal ideation, depressive and obsessive-compulsive symptoms) and psychosocial functioning. A factor analysis was first performed on the SPQ-B results, yielding four factors: negative schizotypy, positive schizotypy, social anxiety, and ideas of reference. Based on these factors, a cluster analysis was conducted, which yielded four clearly distinct groups: "Low" (non-schizotypy), "High schizotypy" (mixed positive and negative), "Positive schizotypy", and "Social impairment". Regarding personality disorder traits and psychopathological symptoms, the "High schizotypy" cluster scored higher than the "Positive" and "Social impairment" groups, which in turn scored higher than the "Low" cluster. The "Positive" group had higher levels of interpersonal relationships than the "High" and "Social impairment" clusters, suggesting that positive schizotypy was associated with benefits such as perceived social relationships. Nevertheless, the "Positive" cluster was also linked to high levels of personality disorder traits and psychopathological symptoms, and to low academic achievement, at levels similar to those observed in the "Social impairment" cluster, confirming an unhealthy side to positive schizotypy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  7. Comparative Investigation of Shared Filesystems for the LHCb Online Cluster

    NASA Astrophysics Data System (ADS)

    Vijay Kartik, S.; Neufeld, Niko

    2012-12-01

    This paper describes the investigative study undertaken to evaluate the performance and suitability of shared filesystems in the LHCb Online environment. Particular focus is given to the measurements and field tests designed and performed on an in-house OpenAFS setup; related comparisons with NFSv4 and GPFS (a clustered filesystem from IBM) are presented. The motivation for the investigation and the test setup arises from the need to serve common user space such as home directories, experiment software and control areas, and clustered log areas. Since the operational requirements on such user space are stringent in terms of read-write operations (in frequency and access speed) and unobtrusive data relocation, test results are presented with emphasis on file-level performance, stability and “high availability” of the shared filesystems. Use cases specific to experiment operation in LHCb, including the specific handling of shared filesystems served to a cluster of 1500 diskless nodes, are described. Issues of prematurely expiring authenticated sessions are explicitly addressed, keeping in mind long-running analysis jobs on the Online cluster. Quantitative test results are also presented for alternatives including NFSv4. The comparative filesystem performance benchmarks presented here are intended to serve as a reference for decisions on a potential migration of the storage solution currently deployed in the LHCb Online cluster.

  8. Feedback in the Antennae Galaxies (NGC 4038/9): I. High-Resolution Infrared Spectroscopy of Winds from Super Star Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, A; Graham, J

    2007-06-05

    We present high-resolution ($R \approx 24{,}600$) near-IR spectroscopy of the youngest super star clusters (SSCs) in the prototypical starburst merger, the Antennae Galaxies. These SSCs are young (3–7 Myr old) and massive ($10^{5}$–$10^{7}\,M_{\odot}$ for a Kroupa IMF), and their spectra are characterized by broad, extended Brackett $\gamma$ emission, so we refer to them as emission-line clusters (ELCs) to distinguish them from older SSCs. The Br$\gamma$ lines of most ELCs have supersonic widths (60–110 km s$^{-1}$ FWHM) and non-Gaussian wings whose velocities exceed the clusters' escape velocities. This high-velocity unbound gas is flowing out in winds that are powered by the clusters' massive O and W-R stars over the course of at least several crossing times. The large sizes of some ELCs relative to those of older SSCs may be due to expansion caused by these outflows; many of the ELCs may not survive as bound stellar systems, but rather dissipate rapidly into the field population. The observed tendency of older ELCs to be more compact than young ones is consistent with the preferential survival of the most concentrated clusters at a given age.

  9. VizieR Online Data Catalog: HST Frontier Fields Herschel sources (Rawle+, 2016)

    NASA Astrophysics Data System (ADS)

    Rawle, T. D.; Altieri, B.; Egami, E.; Perez-Gonzalez, P. G.; Boone, F.; Clement, B.; Ivison, R. J.; Richard, J.; Rujopakarn, W.; Valtchanov, I.; Walth, G.; Weiner, B. J.; Blain, A. W.; Dessauges-Zavadsky, M.; Kneib, J.-P.; Lutz, D.; Rodighiero, G.; Schaerer, D.; Smail, I.

    2017-07-01

    We present a complete census of the 263 Herschel-detected sources within the HST Frontier Fields, including 163 lensed sources located behind the clusters. Our primary aim is to provide a robust legacy catalogue of the Herschel fluxes, which we combine with archival data from Spitzer and WISE to produce IR SEDs. We optimally combine the IR photometry with data from HST, VLA and ground-based observatories in order to identify optical counterparts and obtain source redshifts. Each cluster is observed in two distinct regions, referred to as the central and parallel footprints. (2 data files).

  10. A magnetic survey of AP stars in young clusters - Preliminary results

    NASA Astrophysics Data System (ADS)

    Brown, D. N.; Landstreet, J. D.; Thompson, I.

    Photoelectric polarimetry of Ap stars was undertaken in order to investigate the role of magnetic fields in the evolution of atmospheric chemical peculiarities and the braking of stellar rotation. The stars are grouped by cluster or association and listed by HD number, and each star's spectral type, reference for classification, number of magnetic observations, and root mean square of the equivalent magnetic field measurements obtained from an expression are shown. The data obtained to date include several new magnetic identifications and display the character of the survey, but are not yet sufficient to support any firm evolutionary conclusions.

  11. Density-Functional Theory with Dispersion-Correcting Potentials for Methane: Bridging the Efficiency and Accuracy Gap between High-Level Wave Function and Classical Molecular Mechanics Methods.

    PubMed

    Torres, Edmanuel; DiLabio, Gino A

    2013-08-13

    Large clusters of noncovalently bonded molecules can only be efficiently modeled by classical mechanics simulations. One prominent challenge associated with this approach is obtaining force-field parameters that accurately describe noncovalent interactions. High-level correlated wave function methods, such as CCSD(T), are capable of correctly predicting noncovalent interactions and are widely used to produce reference data. However, high-level correlated methods are generally too computationally costly to generate the critical reference data required for good force-field parameter development. In this work we present an approach to generate Lennard-Jones force-field parameters that accurately account for noncovalent interactions. We propose the use of a computational step intermediate between CCSD(T) and classical molecular mechanics that can bridge the accuracy and computational-efficiency gap between them, and we demonstrate the efficacy of our approach with methane clusters. On the basis of CCSD(T)-level binding energy data for a small set of methane clusters, we develop methane-specific, atom-centered, dispersion-correcting potentials (DCPs) for use with the PBE0 density functional and the 6-31+G(d,p) basis set. We then use the PBE0-DCP approach to compute a detailed map of the interaction forces associated with the removal of a single methane molecule from a cluster of eight methane molecules, and use this map to optimize the Lennard-Jones parameters for methane. The quality of the binding energies obtained with these Lennard-Jones parameters is assessed on a set of methane clusters containing from 2 to 40 molecules. Our Lennard-Jones parameters, used in combination with the intramolecular parameters of the CHARMM force field, are found to closely reproduce the results of our dispersion-corrected density-functional calculations. The approach outlined can be used to develop Lennard-Jones parameters for any kind of molecular system.
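    As a rough illustration of the kind of model being parameterized, the sketch below evaluates the standard 12-6 Lennard-Jones pair potential, V(r) = 4ε[(σ/r)^12 - (σ/r)^6], and sums it over all pairs in a cluster. It is a hypothetical simplification, not the authors' procedure: each methane molecule is treated as a single site, whereas the abstract describes atom-centered parameters used together with CHARMM intramolecular terms, and ε and σ are left as inputs because no fitted values are given here. The function names lj_pair and cluster_energy are illustrative.

```python
import math
from itertools import combinations


def lj_pair(r, epsilon, sigma):
    """12-6 Lennard-Jones energy of one pair of sites separated by distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def cluster_energy(coords, epsilon, sigma):
    """Total pairwise LJ energy of a cluster.

    Hypothetical united-site model: coords is a list of (x, y, z) positions,
    one per molecule, all sharing the same epsilon and sigma.
    """
    return sum(
        lj_pair(math.dist(a, b), epsilon, sigma)
        for a, b in combinations(coords, 2)
    )


# Example in reduced units: a dimer at the potential minimum r = 2**(1/6) * sigma
r_min = 2 ** (1 / 6)
print(cluster_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)], epsilon=1.0, sigma=1.0))
```

    In a fitting workflow of the kind described, ε and σ would be adjusted until energies like these reproduce the reference binding energies; that optimization step is not shown.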

  12. Decrease in musculoskeletal pain after 4 and 12 months of an aerobic exercise intervention: a worksite RCT among cleaners.

    PubMed

    Korshøj, Mette; Birk Jørgensen, Marie; Lidegaard, Mark; Mortensen, Ole Steen; Krustrup, Peter; Holtermann, Andreas; Søgaard, Karen

    2017-07-01

    The prevalence of musculoskeletal pain is high in jobs with high physical work demands. An aerobic exercise intervention targeting cardiovascular health was evaluated for its long-term side effects on musculoskeletal pain. The objective was to investigate whether aerobic exercise affects the level of musculoskeletal pain from baseline to 4- and 12-month follow-up. One hundred and sixteen cleaners aged 18-65 years were cluster-randomized. The aerobic exercise group (n = 57) received worksite aerobic exercise (30 min twice a week), and the reference group (n = 59) received lectures in health promotion. Strata were formed according to closest manager (11 strata in total); clusters were set within strata (40 clusters in total, 20 in each group). Musculoskeletal pain data from eight body regions were collected at baseline and at 4- and 12-month follow-up. Participants rated their highest pain in the last month on a scale from 0 (no pain) to 10 (worst possible pain). A repeated-measures 2 × 2 multi-adjusted mixed-models design was applied to compare between-group differences in an intention-to-treat analysis. Participants were entered as a random effect nested in clusters to account for the cluster-based randomization. At 12-month follow-up, clinically significant reductions (>30%, f² > 0.25) in pain intensity in the neck, shoulders, and arms/wrists were found in the aerobic exercise group compared with the reference group, together with a tendency toward increased pain in the knees (p = 0.07, f² = 0.18). At 4-month follow-up, the only significant between-group change was an increase in hip pain. This study indicates that aerobic exercise reduces musculoskeletal pain in the upper extremities but, as an unintended side effect, may increase pain in the lower extremities. Aerobic exercise interventions among workers who stand or walk for the majority of their working hours should tailor the exercise so that only the positive effect on musculoskeletal pain is maintained.

  13. Methylobacterium genome sequences: a reference blueprint to investigate microbial metabolism of C1 compounds from natural and industrial sources.

    PubMed

    Vuilleumier, Stéphane; Chistoserdova, Ludmila; Lee, Ming-Chun; Bringel, Françoise; Lajus, Aurélie; Zhou, Yang; Gourion, Benjamin; Barbe, Valérie; Chang, Jean; Cruveiller, Stéphane; Dossat, Carole; Gillett, Will; Gruffaz, Christelle; Haugen, Eric; Hourcade, Edith; Levy, Ruth; Mangenot, Sophie; Muller, Emilie; Nadalig, Thierry; Pagni, Marco; Penny, Christian; Peyraud, Rémi; Robinson, David G; Roche, David; Rouy, Zoé; Saenampechek, Channakhone; Salvignol, Grégory; Vallenet, David; Wu, Zaining; Marx, Christopher J; Vorholt, Julia A; Olson, Maynard V; Kaul, Rajinder; Weissenbach, Jean; Médigue, Claudine; Lidstrom, Mary E

    2009-01-01

    Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared. The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm) gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau) gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17) that were defined. The mau cluster is flanked by mobile elements, but the dcm cluster disrupts a gene annotated as chelatase and for which we propose the name "island integration determinant" (iid). 
These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire methylotrophic lifestyles.

  14. Cross-reference identification within a PDF document

    NASA Astrophysics Data System (ADS)

    Li, Sida; Gao, Liangcai; Tang, Zhi; Yu, Yinyan

    2015-01-01

    Cross-references, such as footnotes, endnotes, figure/table captions, and references, are a common and useful type of page element that further explains the corresponding entities in the target document. In this paper, we focus on cross-reference identification in a PDF document and present a robust method, using the identification of footnotes and figure references as a case study. The proposed method first extracts footnotes and figure captions and then matches them with their corresponding references within the document. A number of novel features within a PDF document, namely page layout, font information, and lexical and linguistic features of cross-references, are utilized for the task. Clustering is adopted to handle features that are stable within one document but vary across different kinds of documents, so that the identification process adapts to the document type. In addition, the method feeds the results of the matching process back into the identification process to further improve accuracy. Preliminary experiments on real document sets show that the proposed method is promising for identifying cross-references in PDF documents.

  15. Documentation for the machine-readable version of the revised Catalogue of Stellar Rotational Velocities of Uesugi and Fukuda (1982)

    NASA Technical Reports Server (NTRS)

    Warren, W. H., Jr.

    1983-01-01

    The machine-readable catalog provides mean data on the old Slettebak system for 6472 stars. The catalog results from the review, analysis, and transformation of 11460 data from 102 sources. Star identification (major catalog number, name if the star has one, or cluster identification, etc.), a mean projected rotational velocity, and a list of source references are included. The references are given in a second file included with the catalog when it is distributed on magnetic tape. The contents and formats of the data and reference files of the machine-readable catalog are described to enable users to read and process the data.

  16. Handbook of Occupational Programs. Task Linkage Project Publication No. 1.

    ERIC Educational Resources Information Center

    Georgia State Univ., Atlanta. School of Education.

    To demonstrate the continuity between secondary and postsecondary occupational programs and the link between them and industrial manpower roles, this handbook cross-references Georgia occupational educational programs and related job titles. Nineteen occupational clusters included in secondary schools are covered: agricultural power and mechanics;…

  17. Mixture Modeling: Applications in Educational Psychology

    ERIC Educational Resources Information Center

    Harring, Jeffrey R.; Hodis, Flaviu A.

    2016-01-01

    Model-based clustering methods, commonly referred to as finite mixture modeling, have been applied to a wide variety of cross-sectional and longitudinal data to account for heterogeneity in population characteristics. In this article, we elucidate 2 such approaches: growth mixture modeling and latent profile analysis. Both techniques are…

  18. Identifying Differences among Novice Database Users: Implications for Training Material Effectiveness.

    ERIC Educational Resources Information Center

    Antonucci, Yvonne Lederer; Wozny, Lucy Anne

    1996-01-01

    Identifies and describes sublevels of novices using a database management package, clustering those whose interaction is effective, partially effective, and totally ineffective. Among assistance documentation, functional tree diagrams (FTDs) were more beneficial to partially effective users than traditional reference material. The results have…

  19. Relating Measurement Invariance, Cross-Level Invariance, and Multilevel Reliability.

    PubMed

    Jak, Suzanne; Jorgensen, Terrence D

    2017-01-01

    Data often have a nested, multilevel structure, for example when data are collected from children in classrooms. This kind of data complicates the evaluation of reliability and measurement invariance, because several properties can be evaluated at both the individual level and the cluster level, as well as across levels. For example, cross-level invariance implies equal factor loadings across levels, which is needed to give latent variables at the two levels a similar interpretation. Reliability at a specific level refers to the ratio of true score variance over total variance at that level. This paper aims to shine light on the relation between reliability, cross-level invariance, and strong factorial invariance across clusters in multilevel data. Specifically, we will illustrate how strong factorial invariance across clusters implies cross-level invariance and perfect reliability at the between level in multilevel factor models.

  20. A Framework for Establishing Standard Reference Scale of Texture by Multivariate Statistical Analysis Based on Instrumental Measurement and Sensory Evaluation.

    PubMed

    Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye

    2016-01-13

    A framework for establishing a standard reference scale for texture is proposed, based on multivariate statistical analysis of instrumental measurements and sensory evaluation. Multivariate statistical analysis is used to rapidly select typical reference samples with the characteristics of universality, representativeness, stability, substitutability, and traceability. The soundness of the framework is verified by establishing a standard reference scale for one texture attribute (hardness) using well-known Chinese foods. More than 100 food products in 16 categories were tested by instrumental measurement (TPA test), and the results were analyzed with cluster analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were selected to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R² = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guidance for establishing quantitative standard reference scales for food texture characteristics.

  1. Solvatochromic shifts from coupled-cluster theory embedded in density functional theory

    NASA Astrophysics Data System (ADS)

    Höfener, Sebastian; Gomes, André Severo Pereira; Visscher, Lucas

    2013-09-01

    Building on the framework recently reported for determining general response properties for frozen-density embedding [S. Höfener, A. S. P. Gomes, and L. Visscher, J. Chem. Phys. 136, 044104 (2012); doi:10.1063/1.3675845], in this work we report a first implementation of an embedded coupled-cluster in density-functional theory (CC-in-DFT) scheme for electronic excitations, where only the response of the active subsystem is taken into account. The formalism is applied to the calculation of coupled-cluster excitation energies of water and uracil in aqueous solution. We find that the CC-in-DFT results are in good agreement with reference calculations and experimental results. The accuracy of calculations is mainly sensitive to factors influencing the correlation treatment (basis set quality, truncation of the cluster operator) and to the embedding treatment of the ground-state (choice of density functionals). This allows for efficient approximations at the excited state calculation step without compromising the accuracy. This approximate scheme makes it possible to use a first principles approach to investigate environment effects with specific interactions at coupled-cluster level of theory at a cost comparable to that of calculations of the individual subsystems in vacuum.

  2. Correlation between the resistivity and the atomic clusters in liquid Cu-Sn alloys

    NASA Astrophysics Data System (ADS)

    Jia, Peng; Zhang, Jinyang; Hu, Xun; Li, Cancan; Zhao, Degang; Teng, XinYing; Yang, Cheng

    2018-05-01

    The liquid structure of CuxSn100-x alloys (x = 0, 10, 20, 33, 40, 50, 60, 75, 80, and 100, in atomic percent) was investigated with resistivity and viscosity methods. The resistivity data show that liquid Cu75Sn25 and Cu80Sn20 alloys had a negative temperature coefficient of resistivity (TCR), with liquid Cu75Sn25 reaching a minimum value of -9.24 μΩ cm K^-1, whereas the remaining liquid Cu-Sn alloys had a positive TCR. The results indicated that Cu75Sn25 atomic clusters existed in Cu-Sn alloys. In addition, a method for calculating the percentage of Cu75Sn25 atomic clusters was established on the basis of resistivity theory and the law of conservation of mass. The Cu75Sn25 alloy had the largest volume of atomic clusters and the highest activation energy, which further supports the existence of Cu75Sn25 atomic clusters. Furthermore, the correlation between the liquid structure and the resistivity was established. These results provide a useful reference for investigating liquid structure through physical properties that are sensitive to it.

  3. Structural Analysis of Cubane-Type Iron Clusters

    PubMed Central

    Tan, Lay Ling; Holm, R. H.; Lee, Sonny C.

    2013-01-01

    The generalized cluster type [M4(μ3-Q)4Ln]x contains the cubane-type [M4Q4]z core unit that can approach, but typically deviates from, perfect Td symmetry. The geometric properties of this structure have been analyzed with reference to Td symmetry by a new protocol. Using coordinates of M and Q atoms, expressions have been derived for interatomic separations, bond angles, and volumes of tetrahedral core units (M4, Q4) and the total [M4Q4] core (as a tetracapped M4 tetrahedron). Values for structural parameters have been calculated from observed average values for a given cluster type. Comparison of calculated and observed values measures the extent of deviation of a given parameter from that required in an exact tetrahedral structure. The procedure has been applied to the structures of over 130 clusters containing [Fe4Q4] (Q = S2−, Se2−, Te2−, [NPR3]−, [NR]2−) units, of which synthetic and biological sulfide-bridged clusters constitute the largest subset. General structural features and trends in structural parameters are identified and summarized. An extensive database of structural properties (distances, angles, volumes) has been compiled in Supporting Information. PMID:24072952

  4. Structural Analysis of Cubane-Type Iron Clusters.

    PubMed

    Tan, Lay Ling; Holm, R H; Lee, Sonny C

    2013-07-13

    The generalized cluster type [M4(μ3-Q)4Ln]x contains the cubane-type [M4Q4]z core unit that can approach, but typically deviates from, perfect Td symmetry. The geometric properties of this structure have been analyzed with reference to Td symmetry by a new protocol. Using coordinates of M and Q atoms, expressions have been derived for interatomic separations, bond angles, and volumes of tetrahedral core units (M4, Q4) and the total [M4Q4] core (as a tetracapped M4 tetrahedron). Values for structural parameters have been calculated from observed average values for a given cluster type. Comparison of calculated and observed values measures the extent of deviation of a given parameter from that required in an exact tetrahedral structure. The procedure has been applied to the structures of over 130 clusters containing [Fe4Q4] (Q = S2-, Se2-, Te2-, [NPR3]-, [NR]2-) units, of which synthetic and biological sulfide-bridged clusters constitute the largest subset. General structural features and trends in structural parameters are identified and summarized. An extensive database of structural properties (distances, angles, volumes) has been compiled in Supporting Information.

  5. Poly(A)-tag deep sequencing data processing to extract poly(A) sites.

    PubMed

    Wu, Xiaohui; Ji, Guoli; Li, Qingshun Quinn

    2015-01-01

    Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. The advent of next-generation sequencing (NGS) technology has offered feasible means to generate large-scale data and new opportunities for intensive study of polyadenylation, particularly through deep sequencing of the transcriptome targeting the junction of the 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow to identify polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts are seamlessly integrated to iteratively map the single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and internal priming artifacts are removed. Then the ambiguous region is introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites within 24 nucleotides of one another, including those from different samples, can be clustered into poly(A) clusters. This procedure can be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
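    The final clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the published Perl pipeline: cleavage sites pooled from all samples are grouped per chromosome and strand, and a site joins the current cluster when it lies within 24 nt of the previous site (single linkage on sorted coordinates). The tuple layout, the function names cluster_cleavage_sites and summarize, and the choice of the most-supported site as each cluster's representative are hypothetical.

```python
from collections import defaultdict


def cluster_cleavage_sites(sites, max_gap=24):
    """Group cleavage sites into poly(A) clusters (hypothetical sketch).

    sites: iterable of (chrom, strand, position, tag_count) tuples pooled from
    all samples. Sites on the same chromosome and strand are sorted, and a site
    joins the current cluster if it is at most max_gap nt from the previous site.
    """
    by_locus = defaultdict(list)
    for chrom, strand, pos, count in sites:
        by_locus[(chrom, strand)].append((pos, count))

    clusters = []
    for (chrom, strand), entries in sorted(by_locus.items()):
        entries.sort()
        current = [entries[0]]
        for pos, count in entries[1:]:
            if pos - current[-1][0] <= max_gap:
                current.append((pos, count))  # close enough: extend the cluster
            else:
                clusters.append(summarize(chrom, strand, current))
                current = [(pos, count)]  # gap too large: start a new cluster
        clusters.append(summarize(chrom, strand, current))
    return clusters


def summarize(chrom, strand, members):
    """Collapse the member sites of one cluster into a single record."""
    return {
        "chrom": chrom,
        "strand": strand,
        "start": members[0][0],
        "end": members[-1][0],
        "tags": sum(count for _, count in members),
        # Representative site: the position supported by the most tags
        "peak": max(members, key=lambda m: m[1])[0],
    }
```

    Because linkage here is between neighbouring sites, a cluster can span more than 24 nt in total; a stricter reading of the abstract would instead cap the span of each cluster.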

  6. Sparsely-distributed organization of face and limb activations in human ventral temporal cortex

    PubMed Central

    Weiner, Kevin S.; Grill-Spector, Kalanit

    2011-01-01

    Functional magnetic resonance imaging (fMRI) has identified face- and body part-selective regions, as well as distributed activation patterns for object categories across human ventral temporal cortex (VTC), eliciting a debate regarding functional organization in VTC and neural coding of object categories. Using high-resolution fMRI, we illustrate that face- and limb-selective activations alternate in a series of largely nonoverlapping clusters in lateral VTC along the inferior occipital gyrus (IOG), fusiform gyrus (FG), and occipitotemporal sulcus (OTS). Both general linear model (GLM) and multivoxel pattern (MVP) analyses show that face- and limb-selective activations minimally overlap and that this organization is consistent across experiments and days. We provide a reliable method to separate two face-selective clusters on the middle and posterior FG (mFus and pFus), and another on the IOG using their spatial relation to limb-selective activations and retinotopic areas hV4, VO-1/2, and hMT+. Furthermore, these activations show a gradient of increasing face selectivity and decreasing limb selectivity from the IOG to the mFus. Finally, MVP analyses indicate that there is differential information for faces in lateral VTC (containing weakly- and highly-selective voxels) relative to non-selective voxels in medial VTC. These findings suggest a sparsely-distributed organization where sparseness refers to the presence of several face- and limb-selective clusters in VTC, and distributed refers to the presence of different amounts of information in highly-, weakly-, and non-selective voxels. Consequently, theories of object recognition should consider the functional and spatial constraints of neural coding across a series of nonoverlapping category-selective clusters that are themselves distributed. PMID:20457261

  7. Author Correction: A Myc enhancer cluster regulates normal and leukaemic haematopoietic stem cell hierarchies.

    PubMed

    Bahr, Carsten; von Paleske, Lisa; Uslu, Veli V; Remeseiro, Silvia; Takayama, Naoya; Ng, Stanley W; Murison, Alex; Langenfeld, Katja; Petretich, Massimo; Scognamiglio, Roberta; Zeisberger, Petra; Benk, Amelie S; Amit, Ido; Zandstra, Peter W; Lupien, Mathieu; Dick, John E; Trumpp, Andreas; Spitz, François

    2018-05-16

    In the originally published version of this Letter, ref. 43 was erroneously provided twice. In the 'Estimation of relative cell-type-specific composition of AML samples' section in the Methods, the citation to ref. 43 after the GEO dataset GSE24759 is correct. However, in the 'Mice' section of the Methods, the citation to ref. 43 after 'TAMERE' should have been associated with a new reference [1]. The original Letter has been corrected online (with the new reference included as ref. 49).

  8. REACH. Teacher's Guide Volume II. Check Points.

    ERIC Educational Resources Information Center

    Georgia Univ., Athens. Div. of Vocational Education.

    Designed for use with individualized instructional units (CE 026 345-347, CE 026 349-351) in the REACH (Refrigeration, Electro-Mechanical, Air-Conditioning, Heating) electromechanical cluster, this second volume of the postsecondary teacher guide contains the check points which the instructor may want to refer to when the unit sheet directs the…

  9. Exploring Careers in Marketing and Distribution: A Guide for Teachers.

    ERIC Educational Resources Information Center

    Insko, Merle A.

    One of 11 guides intended for use at the junior high school level of career exploration, the document identifies job families within the marketing and distribution occupational cluster, identifies occupations within each family, and gives suggestions for possible classroom experiences, references, and evaluations, as well as supportive materials.…

  10. Re-conceptualising Learning Spaces: Developing Capabilities in a High-Tech Small Firm.

    ERIC Educational Resources Information Center

    Macpherson, Allan; Jones, Ossie; Zhang, Michael; Wilson, Alison

    2003-01-01

    A case study of a small high-tech business explains how the firm created a virtual cluster of innovation through supply networks, enhancing its own learning and facilitating the integration of knowledge. This process overcomes limitations to management learning for small companies in isolated regions. (Contains 66 references.) (SK)

  11. Career Information Handbook.

    ERIC Educational Resources Information Center

    Texas State Technical Inst., Waco.

    The handbook is a companion volume to "High School Career Interest and Information Survey" but its use extends to high school counselors, teachers, administrators and their students as an independent reference tool for occupational information. The manual is divided into sections corresponding to the fifteen career clusters identified by the U.S.…

  12. Annotated Bibliography and Summaries of Reference Materials. School Desegregation/Integration Notebook.

    ERIC Educational Resources Information Center

    American Civil Liberties Union, New York, NY.

    This annotated bibliography provides a framework within which questions and answers about the school desegregation process can be formulated and addressed. A glossary of terms dealing with school integration is included. Among these are the following: ability grouping, annexation, bilingual education, clustering, consolidation, de facto and de…

  13. Learning Communities for Curriculum Change: Key Factors in an Educational Change Process in New Zealand

    ERIC Educational Resources Information Center

    Edwards, Frances

    2012-01-01

    Increasingly, school change processes are being facilitated through the formation and operation of groups of teachers working together for improved student outcomes. These groupings are variously referred to as networks, networked learning communities, communities of practice, professional learning communities, learning circles or clusters. The…

  14. Orbitally invariant internally contracted multireference unitary coupled cluster theory and its perturbative approximation: theory and test calculations of second order approximation.

    PubMed

    Chen, Zhenhua; Hoffmann, Mark R

    2012-07-07

    A unitary wave operator, exp(G), with G† = -G, is considered to transform a multiconfigurational reference wave function Φ to the potentially exact (within the basis set limit) wave function Ψ = exp(G)Φ. To obtain a useful approximation, the Hausdorff expansion of the similarity-transformed effective Hamiltonian, exp(-G) H exp(G), is truncated at second order and the excitation manifold is limited; an additional, separate perturbation approximation can also be made. In the perturbation approximation, which we refer to as multireference unitary second-order perturbation theory (MRUPT2), the Hamiltonian operator in the highest-order commutator is approximated by a Møller-Plesset-type one-body zero-order Hamiltonian. If a complete active space self-consistent field wave function is used as the reference, then the energy is invariant under orbital rotations within the inactive, active, and virtual orbital subspaces for both the second-order unitary coupled cluster method and its perturbative approximation. Furthermore, the redundancies of the excitation operators are addressed in a novel way, which is potentially more efficient than the usual full diagonalization of the metric of the excited configurations. Despite the loss of rigorous size-extensivity, possibly due to the use of a variational rather than a projective approach in the solution of the amplitudes, test calculations show that the size-extensivity errors are very small. Compared to other internally contracted multireference perturbation theories, MRUPT2 needs reduced density matrices only up to three-body, even with a non-complete active space reference wave function, when two-body excitations within the active orbital subspace are involved in the wave operator, exp(G). Both the coupled cluster and perturbation theory variants are amenable to large, incomplete model spaces.
Applications to some widely studied model systems that can be problematic because of geometry-dependent quasidegeneracy, H4, P4, and BeH2, are performed in order to test the new methods on problems where full configuration interaction results are available.

  15. Risk Profiles for Injurious Falls in People Over 60: A Population-Based Cohort Study

    PubMed Central

    Ek, Stina; Rizzuto, Debora; Fratiglioni, Laura; Johnell, Kristina; Xu, Weili

    2018-01-01

    Background: Although falls in older adults are related to multiple risk factors, these factors have commonly been studied individually. We aimed to identify risk profiles for injurious falls in older adults by detecting clusters of established risk factors and quantifying their impact on fall risk. Methods: Participants were 2,566 people aged 60 years and older from the population-based Swedish National Study on Aging and Care in Kungsholmen. An injurious fall was defined as hospitalization for, or receipt of outpatient care because of, a fall. Cluster analysis was used to identify aggregations of possible risk factors, including chronic diseases, fall-risk-increasing drugs (FRIDs), physical and cognitive impairments, and lifestyle-related factors. Associations between the clusters and injurious falls over 3, 5, and 10 years were estimated using flexible parametric survival models. Results: Five clusters were identified: a “healthy”, a “well-functioning with multimorbidity”, a “well-functioning, with multimorbidity and high FRID consumption”, a “physically and cognitively impaired”, and a “disabled” cluster. The risk of injurious falls was significantly higher for all groups than for the first cluster of healthy individuals, which served as the reference category. Hazard ratios (95% confidence intervals) ranged from 1.71 (1.02–2.66) for the second cluster to 12.67 (7.38–21.75) for the last cluster over 3 years of follow-up. The highest risk was observed in the last two clusters, which had a high burden of physical and cognitive impairments. Conclusion: Risk factors for injurious falls tend to aggregate, representing different levels of risk for falls. Our findings can be useful to tailor and prioritize clinical and public health interventions. PMID:28605455

  16. Is gender policy related to the gender gap in external cause and circulatory disease mortality? A mixed effects model of 22 OECD countries 1973-2008.

    PubMed

    Backhans, Mona; Burström, Bo; de Leon, Antonio Ponce; Marklund, Staffan

    2012-11-12

    Gender differences in mortality vary widely between countries and over time, but few studies have examined predictors of these variations, apart from smoking. The aim of this study is to investigate the link between gender policy and the gender gap in cause-specific mortality, adjusted for economic factors and health behaviours. Twenty-two OECD countries were followed from 1973 to 2008, and the outcomes were gender gaps in external cause and circulatory disease mortality. A previously established country cluster solution was used, which includes indicators on taxes, parental leave, pensions, social insurances, and social services in kind. Male breadwinner countries served as the reference group and were compared with earner-carer, compensatory breadwinner, and universal citizen countries. Specific policies were also analysed. Mixed-effects models were used, with years as level-1 units and countries as level-2 units. Both the earner-carer cluster (not significant after adjustment for GDP) and policies characteristic of that cluster are associated with smaller gender differences in external cause mortality, particularly owing to an association with increased female mortality. Cluster differences in the gender gap in circulatory disease mortality result from a larger relative decrease in male mortality in the compensatory breadwinner and earner-carer clusters. Policies characteristic of those clusters were, however, generally related to increased mortality. Results for external cause mortality are in concordance with the hypothesis that women become more exposed to risks of accident and violence when they are economically more active. For circulatory disease mortality, results differ depending on the approach (cluster or indicator). Whether cluster differences not explained by specific policies reflect other welfare policies or unrelated societal trends is an open question. Recommendations for further studies are made.

  17. Narcolepsy with and without cataplexy, idiopathic hypersomnia with and without long sleep time: a cluster analysis.

    PubMed

    Šonka, Karel; Šusta, Marek; Billiard, Michel

    2015-02-01

    The successive editions of the International Classification of Sleep Disorders (ICSD) reflect the evolution of the concepts of various sleep disorders. This is particularly the case for central disorders of hypersomnolence, with continuous changes in the terminology and divisions of narcolepsy, idiopathic hypersomnia, and recurrent hypersomnia. According to the ICSD 2nd Edition (ICSD-2), narcolepsy with cataplexy (NwithC), narcolepsy without cataplexy (Nw/oC), idiopathic hypersomnia with long sleep time (IHwithLST), and idiopathic hypersomnia without long sleep time (IHw/oLST) are four well-defined hypersomnias of central origin. However, in the absence of biological markers, doubts have been raised about the relevance of dividing idiopathic hypersomnia into two forms, and it is not yet clear whether Nw/oC and IHw/oLST are two distinct entities. With this in mind, we decided to review the ICSD-2 classification empirically, using a hierarchical cluster analysis to see whether this division has some relevance, even though the terms "with long sleep time" and "without long sleep time" are inappropriate. The cluster analysis differentiated three main clusters: Cluster 1, "combined monosymptomatic hypersomnia/narcolepsy type 2" (people initially diagnosed with IHw/oLST and Nw/oC); Cluster 2, "polysymptomatic hypersomnia" (people initially diagnosed with IHwithLST); and Cluster 3, narcolepsy type 1 (people initially diagnosed with NwithC). Cluster analysis confirmed that narcolepsy type 1 and polysymptomatic hypersomnia are independent sleep disorders. People who were initially diagnosed with Nw/oC and IHw/oLST formed a single cluster, referred to as "combined monosymptomatic hypersomnia/narcolepsy type 2." Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Weighing Galaxy Clusters with Gas. I. On the Methods of Computing Hydrostatic Mass Bias

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lau, Erwin T.; Nagai, Daisuke; Nelson, Kaylea, E-mail: erwin.lau@yale.edu

    2013-11-10

    Mass estimates of galaxy clusters from X-ray and Sunyaev-Zel'dovich observations assume that the intracluster gas is in hydrostatic equilibrium with the cluster's gravitational potential. However, since galaxy clusters are dynamically active objects whose dynamical states can deviate significantly from the equilibrium configuration, the departure from the hydrostatic equilibrium assumption is one of the largest sources of systematic uncertainty in cluster cosmology. The literature contains two methods for computing the hydrostatic mass bias, based on the Euler equation and on the modified Jeans equation respectively, and there has been some confusion about the validity of these two methods. The word 'Jeans' was a misnomer, as it incorrectly implies that the gas is collisionless. To avoid further confusion, we instead refer to these methods as the 'summation' and 'averaging' methods, respectively. In this work, we show that the two methods for computing the hydrostatic mass bias are equivalent by demonstrating that the equation used in the second method can be derived by taking spatial averages of the Euler equation. Specifically, we identify mathematically the correspondence between individual terms in the two methods and, using hydrodynamical simulations of galaxy cluster formation, show that these correspondences hold to within a few percent. In addition, we compute the mass bias associated with the acceleration of gas and show that its contribution is small in the virialized interior regions of galaxy clusters but becomes non-negligible in the outskirts of massive galaxy clusters. We discuss future prospects for understanding and characterizing biases in galaxy cluster mass estimates using both hydrodynamical simulations and observations, and their implications for cluster cosmology.

  19. Weighing Galaxy Clusters with Gas. I. On the Methods of Computing Hydrostatic Mass Bias

    NASA Astrophysics Data System (ADS)

    Lau, Erwin T.; Nagai, Daisuke; Nelson, Kaylea

    2013-11-01

    Mass estimates of galaxy clusters from X-ray and Sunyaev-Zel'dovich observations assume that the intracluster gas is in hydrostatic equilibrium with the cluster's gravitational potential. However, since galaxy clusters are dynamically active objects whose dynamical states can deviate significantly from the equilibrium configuration, the departure from the hydrostatic equilibrium assumption is one of the largest sources of systematic uncertainty in cluster cosmology. The literature contains two methods for computing the hydrostatic mass bias, based on the Euler equation and on the modified Jeans equation respectively, and there has been some confusion about the validity of these two methods. The word "Jeans" was a misnomer, as it incorrectly implies that the gas is collisionless. To avoid further confusion, we instead refer to these methods as the "summation" and "averaging" methods, respectively. In this work, we show that the two methods for computing the hydrostatic mass bias are equivalent by demonstrating that the equation used in the second method can be derived by taking spatial averages of the Euler equation. Specifically, we identify mathematically the correspondence between individual terms in the two methods and, using hydrodynamical simulations of galaxy cluster formation, show that these correspondences hold to within a few percent. In addition, we compute the mass bias associated with the acceleration of gas and show that its contribution is small in the virialized interior regions of galaxy clusters but becomes non-negligible in the outskirts of massive galaxy clusters. We discuss future prospects for understanding and characterizing biases in galaxy cluster mass estimates using both hydrodynamical simulations and observations, and their implications for cluster cosmology.

  20. Microstructure-based modelling of arbitrary deformation histories of filler-reinforced elastomers

    NASA Astrophysics Data System (ADS)

    Lorenz, H.; Klüppel, M.

    2012-11-01

    A physically motivated theory of rubber reinforcement based on filler cluster mechanics is presented, describing the mechanical behaviour of quasi-statically loaded elastomeric materials subjected to arbitrary deformation histories. It extends a previously introduced model of filler-induced stress softening and hysteresis in highly strained elastomers. These effects are attributed to the hydrodynamic reinforcement of rubber elasticity due to strain amplification by stiff filler clusters, and to the cyclic breakdown and re-aggregation (healing) of softer, already damaged filler clusters. The theory is first developed for the special case of outer stress-strain cycles with successively increasing maximum strain. In this simpler case, all soft clusters are broken at the turning points of the cycle and the mechanical energy stored in the strained clusters is completely dissipated, i.e. only irreversible stress contributions result. Nevertheless, the description of outer cycles already involves all material parameters of the theory, so outer cycles can be used for parameter fitting. For an arbitrary deformation history, the cluster mechanics of the material is more complicated because not all soft clusters are broken at the turning points of a cycle. Additional reversible stress contributions, accounting for the relaxation of clusters upon retraction, must therefore be included to describe inner cycles. A special recursive algorithm is developed that provides a framework for the mechanical response of encapsulated inner cycles. Simulation and measurement are found to be in fair agreement for CB- and silica-filled SBR/BR and EPDM samples loaded in compression and tension along various deformation histories.

  1. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis-testing approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine how their performance varies with the algorithm's assumptions or with underlying data structures. Here, we use simulation studies to estimate the statistical error rates of the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA), to estimate how well clustering performs with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, as well as grouped data with increasing overlap, overdispersion (for ecological data), and correlation among descriptors within groups. Using the simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls when applying the methods and interpreting the results.
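
    The UPGMA agglomeration that DISPROF is paired with in this study can be illustrated with a minimal, pure-Python sketch of unweighted average linkage on a precomputed dissimilarity matrix. This is an illustrative assumption, not the authors' implementation: the DISPROF permutation test that decides where to stop merging is not shown, and the function name `upgma` is hypothetical.

```python
from itertools import combinations


def upgma(dist, labels):
    """Agglomerate objects by UPGMA (unweighted average linkage).

    dist: symmetric list-of-lists of pairwise dissimilarities.
    labels: one name per object, in the same order as dist.
    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    """
    # Each active cluster id maps to the set of original object indices it holds.
    clusters = {i: frozenset([i]) for i in range(len(labels))}
    names = {i: labels[i] for i in range(len(labels))}
    merges = []
    next_id = len(labels)

    def avg_dist(a, b):
        # UPGMA distance: mean over all object pairs drawn from the two clusters,
        # so every original object carries equal weight.
        total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        height = avg_dist(a, b)
        merges.append((names[a], names[b], height))
        clusters[next_id] = clusters.pop(a) | clusters.pop(b)
        names[next_id] = f"({names[a]},{names[b]})"
        next_id += 1
    return merges
```

    For four objects where A and B are closest, `upgma([[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]], ["A", "B", "C", "D"])` merges A with B at height 2.0, then C with (A,B) at 6.0, then D at 10.0. A resemblance-profile criterion such as DISPROF would be applied at each merge to decide whether the split below it reflects real structure.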

  2. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling

    PubMed Central

    Shin, Junha; Lee, Insuk

    2015-01-01

    Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm based on an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structure on co-inheritance analysis using 2,144 reference species for four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. A principal component analysis of the phylogenetic profiles revealed three clusters of reference species, corresponding to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways are inherited primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbial species but also in higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. Constructing a series of human gene networks with increasing numbers of reference species per domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as the genomes of additional species are sequenced.
Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes. PMID:26394049
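
    The central idea of within-domain co-inheritance analysis, scoring two genes' profiles only over reference species from the same domain, can be sketched as below. This is a minimal sketch under stated assumptions: profiles are binary presence/absence vectors, Pearson correlation stands in for whatever co-inheritance score the authors actually used (the abstract does not name it), and the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def within_domain_scores(profile_a, profile_b, species_domain):
    """Score two genes' co-inheritance separately within each domain of life.

    profile_a, profile_b: 0/1 presence of each gene across the reference species.
    species_domain: domain label of each reference species, in the same order.
    Returns {domain: correlation computed on that domain's species only}.
    """
    by_domain = {}
    for idx, domain in enumerate(species_domain):
        by_domain.setdefault(domain, []).append(idx)
    return {
        domain: pearson([profile_a[i] for i in idx], [profile_b[i] for i in idx])
        for domain, idx in by_domain.items()
    }
```

    With six hypothetical species (three Bacteria, three Eukaryota), profiles `[1, 1, 0, 1, 0, 0]` and `[1, 1, 0, 0, 1, 0]` score 1.0 within Bacteria and -0.5 within Eukaryota, whereas pooling all six species gives about 0.33, a value that blurs both signals. This is the kind of cross-domain confounding the abstract describes.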

  3. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/

  4. Atlas – a data warehouse for integrative bioinformatics

    PubMed Central

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

    2005-01-01

    Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693

  5. Protein profile of Beta vulgaris leaf apoplastic fluid and changes induced by Fe deficiency and Fe resupply

    PubMed Central

    Ceballos-Laita, Laura; Gutierrez-Carbonell, Elain; Lattanzio, Giuseppe; Vázquez, Saul; Contreras-Moreira, Bruno; Abadía, Anunciación; Abadía, Javier; López-Millán, Ana-Flor

    2015-01-01

    The fluid collected by direct leaf centrifugation has been used to study the proteome of sugar beet leaf apoplastic fluid and the changes in its protein profile induced by Fe deficiency and by Fe resupply to Fe-deficient plants. Plants were grown in Fe-sufficient and Fe-deficient conditions, and Fe resupply was carried out with 45 μM Fe(III)-EDTA for 24 h. Protein extracts of leaf apoplastic fluid were analyzed by two-dimensional isoelectric focusing-SDS-PAGE electrophoresis. Gel image analysis revealed 203 consistent spots, and proteins in 81% of them (164) were identified by nLC-MS/MS using a custom-made reference repository of beet protein sequences. When redundant UniProt entries were deleted, a non-redundant leaf apoplastic proteome consisting of 109 proteins was obtained. The TargetP and SecretomeP algorithms predicted that 63% of them were secretory proteins. Functional classification of the non-redundant proteins indicated that stress and defense, protein metabolism, cell wall and C metabolism accounted for approximately 75% of the identified proteome. The effects of Fe deficiency on the leaf apoplast proteome were limited, with only five spots (2.5%) changing in relative abundance, suggesting that protein homeostasis in the leaf apoplast fluid is well maintained upon Fe shortage. The identification of three chitinase isoforms among the proteins increasing in relative abundance with Fe deficiency suggests that one of the few effects of Fe deficiency on the leaf apoplast proteome involves cell wall modifications. Iron resupply to Fe-deficient plants changed the relative abundance of 16 spots when compared with either Fe-sufficient or Fe-deficient samples.
Proteins identified in these spots can be broadly classified as those responding to Fe-resupply, which included defense and cell wall related proteins, and non-responsive, which are mainly protein metabolism related proteins and whose changes in relative abundance followed the same trend as with Fe-deficiency. PMID:25852707

  6. Characterization of computer network events through simultaneous feature selection and clustering of intrusion alerts

    NASA Astrophysics Data System (ADS)

    Chen, Siyue; Leung, Henry; Dondo, Maxwell

    2014-05-01

    As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.

  7. HIV-TRACE (Transmission Cluster Engine): a tool for large scale molecular epidemiology of HIV-1 and other rapidly evolving pathogens.

    PubMed

    Kosakovsky Pond, Sergei L; Weaver, Steven; Leigh Brown, Andrew J; Wertheim, Joel O

    2018-01-31

    In modern applications of molecular epidemiology, genetic sequence data are routinely used to identify clusters of transmission in rapidly evolving pathogens, most notably HIV-1. Traditional 'shoe-leather' epidemiology infers transmission clusters by tracing chains of partners sharing epidemiological connections (e.g., sexual contact). Here, we present a computational tool for identifying a molecular transmission analog of such clusters: HIV-TRACE (TRAnsmission Cluster Engine). HIV-TRACE implements an approach inspired by traditional epidemiology, identifying chains of partners whose viral genetic relatedness implies direct or indirect epidemiological connections. Molecular transmission clusters are constructed using codon-aware pairwise alignment to a reference sequence, followed by pairwise genetic distance estimation among all sequences. This approach is computationally tractable and can identify HIV-1 transmission clusters in large surveillance databases comprising tens or hundreds of thousands of sequences in near real time, i.e., on the order of minutes to hours. HIV-TRACE is available at www.hivtrace.org and from github.com/veg/hivtrace, along with the accompanying result visualization module from github.com/veg/hivtrace-viz. Importantly, the approach underlying HIV-TRACE is not limited to the study of HIV-1 and can be applied to study outbreaks and epidemics of other rapidly evolving pathogens. © The Author 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
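
    The clustering idea described in this abstract (link any two sequences whose pairwise genetic distance falls below a threshold, then read clusters off the resulting network) can be sketched in a few lines. This is a simplified illustration, not HIV-TRACE's code: it assumes the sequences are already aligned to a common reference, substitutes a plain proportion-of-differing-sites distance for the evolutionary distance the tool estimates, and uses hypothetical function names and an arbitrary example threshold.

```python
from itertools import combinations


def p_distance(s, t):
    """Proportion of differing sites between two aligned, equal-length sequences.

    A simplified stand-in for an evolutionary distance; sites where either
    sequence has a gap ('-') are skipped.
    """
    compared = diffs = 0
    for x, y in zip(s, t):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 1.0


def transmission_clusters(seqs, threshold):
    """Link every pair of sequences within `threshold`; clusters are connected components.

    seqs: {sequence_id: aligned_sequence}. Uses union-find to merge linked pairs.
    """
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    # Unlinked singletons are not reported as clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

    Because clusters are connected components, s1 and s3 below end up together even though they differ at 20% of sites, since each is within the threshold of s2. This mirrors the "direct or indirect" connections in the abstract:

```python
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACGTACGTTA", "s4": "TTTTACGTAC"}
print(transmission_clusters(seqs, 0.15))  # [['s1', 's2', 's3']]
```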

  8. Biosynthetic Genes for the Tetrodecamycin Antibiotics.

    PubMed

    Gverzdys, Tomas; Nodwell, Justin R

    2016-07-15

    We recently described 13-deoxytetrodecamycin, a new member of the tetrodecamycin family of antibiotics. A defining feature of these molecules is the presence of a five-membered lactone called a tetronate ring. By sequencing the genome of a producer strain, Streptomyces sp. strain WAC04657, and searching for a gene previously implicated in tetronate ring formation, we identified the biosynthetic genes responsible for producing 13-deoxytetrodecamycin (the ted genes). Using the ted cluster in WAC04657 as a reference, we found related clusters in three other organisms: Streptomyces atroolivaceus ATCC 19725, Streptomyces globisporus NRRL B-2293, and Streptomyces sp. strain LaPpAH-202. Comparing the four clusters allowed us to identify the cluster boundaries. Genetic manipulation of the cluster confirmed the involvement of the ted genes in 13-deoxytetrodecamycin biosynthesis and revealed several additional molecules produced through the ted biosynthetic pathway, including tetrodecamycin, dihydrotetrodecamycin, and another, W5.9, a novel molecule. Comparison of the bioactivities of these four molecules suggests that they may act through the covalent modification of their target(s). The tetrodecamycins are a distinct subgroup of the tetronate family of secondary metabolites. Little is known about their biosynthesis or mechanisms of action, making them an attractive subject for investigation. In this paper we present the biosynthetic gene cluster for 13-deoxytetrodecamycin in Streptomyces sp. strain WAC04657. We identify related clusters in several other organisms and show that they produce related molecules. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  9. Analysis of correlated mutations in HIV-1 protease using spectral clustering.

    PubMed

    Liu, Ying; Eyal, Eran; Bahar, Ivet

    2008-05-15

    The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated amino acid substitutions may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1) and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; the second included residues that differ among the various HIV-1 protease subtypes, referred to for brevity as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified by coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) that correlated substitutions resulting from neutral mutations can be distinguished from those induced by MDR through appropriate clustering analysis of sequence covariance data, and (ii) a connection between global dynamics and functional substitution of amino acids.
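
    The covariance step in this abstract, mutual information between all residue pairs of an alignment, can be illustrated with a plug-in estimate from column frequencies, MI = Σ p(a,b) log[p(a,b) / (p(a) p(b))]. This is a hedged sketch, not the authors' pipeline: it applies no correction for small samples or shared ancestry, the helper names are hypothetical, and the subsequent spectral clustering of the matrix is not shown.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Plug-in mutual information (in nats) between two alignment columns.

    col_i, col_j: equal-length sequences of residues, one per aligned sequence.
    """
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_i[a] * count_j[b]).
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def mi_matrix(alignment):
    """Residue-by-residue MI matrix for a list of equal-length aligned sequences."""
    columns = list(zip(*alignment))
    length = len(columns)
    return [[mutual_information(columns[i], columns[j]) for j in range(length)]
            for i in range(length)]
```

    Two positions that always co-vary (L with V, M with I) give MI = ln 2, while positions whose residues combine independently give 0:

```python
print(mutual_information("LLMM", "VVII"))  # ln 2, about 0.693
print(mutual_information("LLMM", "VIVI"))  # 0.0
```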

  10. Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages.

    PubMed

    Elmore, M Holly; McGary, Kriston L; Wisecaver, Jennifer H; Slot, Jason C; Geiser, David M; Sink, Stacy; O'Donnell, Kerry; Rokas, Antonis

    2015-02-06

    Fungi that have the enzymes cyanase and carbonic anhydrase show a limited capacity to detoxify cyanate, a fungicide employed by both plants and humans. Here, we describe a novel two-gene cluster that comprises duplicated cyanase and carbonic anhydrase copies, which we name the CCA gene cluster, trace its evolution across Ascomycetes, and examine the evolutionary dynamics of its spread among lineages of the Fusarium oxysporum species complex (hereafter referred to as the FOSC), a cosmopolitan clade of purportedly clonal vascular wilt plant pathogens. Phylogenetic analysis of fungal cyanase and carbonic anhydrase genes reveals that the CCA gene cluster arose independently at least twice and is now present in three lineages, namely Cochliobolus lunatus, Oidiodendron maius, and the FOSC. Genome-wide surveys within the FOSC indicate that the CCA gene cluster varies in copy number across isolates, is always located on accessory chromosomes, and is absent in FOSC's closest relatives. Phylogenetic reconstruction of the CCA gene cluster in 163 FOSC strains from a wide variety of hosts suggests a recent history of rampant transfers between isolates. We hypothesize that the independent formation of the CCA gene cluster in different fungal lineages and its spread across FOSC strains may be associated with resistance to plant-produced cyanates or to use of cyanate fungicides in agriculture. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. The “UV-route” to Search for Blue Straggler Stars in Globular Clusters: First Results from the HST UV Legacy Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raso, S.; Ferraro, F. R.; Lanzoni, B.

    We used data from the Hubble Space Telescope UV Legacy Survey of Galactic Globular Clusters to select the Blue Straggler Star (BSS) population in four intermediate/high-density systems (namely NGC 2808, NGC 6388, NGC 6541, and NGC 7078) through a “UV-guided search.” This procedure consists of using the F275W images of each cluster to construct the master list of detected sources, and then forcing this list onto the images acquired in the other filters. Such an approach optimizes the detection of relatively hot stars and allows a complete sample of BSSs to be detected even in the central regions of high-density clusters, because the light from the bright cool giants, which dominates the optical emission in old stellar systems, is significantly reduced at UV wavelengths. Our UV-guided selections of BSSs have been compared with the samples obtained in previous, optical-driven surveys, clearly demonstrating the efficiency of the UV approach. In each cluster we also measured the parameter A+, defined as the area enclosed between the cumulative radial distribution of BSSs and that of a reference population, which traces the level of BSS central segregation and hence the degree of dynamical evolution the system has undergone. The values measured for the four clusters studied in this paper fall neatly along the dynamical sequence recently presented for a sample of 25 clusters.

  12. Receptor signaling clusters in the immune synapse

    DOE PAGES

    Dustin, Michael L.; Groves, Jay T.

    2012-02-23

    Signaling processes between various immune cells involve large-scale spatial reorganization of receptors and signaling molecules within the cell-cell junction. These structures, now collectively referred to as immune synapses, interleave physical and mechanical processes with the cascades of chemical reactions that constitute signal transduction systems. Molecular level clustering, spatial exclusion, and long-range directed transport are all emerging as key regulatory mechanisms. The study of these processes is drawing researchers from physical sciences to join the effort and represents a rapidly growing branch of biophysical chemistry. Furthermore, recent advances in physical and quantitative analyses of signaling within the immune synapses are reviewed here.

  13. Receptor signaling clusters in the immune synapse

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dustin, Michael L.; Groves, Jay T.

    2012-02-23

    Signaling processes between various immune cells involve large-scale spatial reorganization of receptors and signaling molecules within the cell-cell junction. These structures, now collectively referred to as immune synapses, interleave physical and mechanical processes with the cascades of chemical reactions that constitute signal transduction systems. Molecular level clustering, spatial exclusion, and long-range directed transport are all emerging as key regulatory mechanisms. The study of these processes is drawing researchers from physical sciences to join the effort and represents a rapidly growing branch of biophysical chemistry. Furthermore, recent advances in physical and quantitative analyses of signaling within the immune synapses are reviewed here.

  14. A parallel-processing approach to computing for the geographic sciences; applications and systems enhancements

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Liu, Shu-Guang; Nichols, Erin; Haga, Jim; Maddox, Brian; Bilderback, Chris; Feller, Mark; Homer, George

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting information science research into parallel computing systems and applications.

  15. Surprising performance for vibrational frequencies of the distinguishable clusters with singles and doubles (DCSD) and MP2.5 approximations

    NASA Astrophysics Data System (ADS)

    Kesharwani, Manoj K.; Sylvetsky, Nitai; Martin, Jan M. L.

    2017-11-01

    We show that the DCSD (distinguishable clusters with all singles and doubles) correlation method permits the calculation of vibrational spectra at near-CCSD(T) quality but at no more than CCSD cost, and with comparatively inexpensive analytical gradients. For systems dominated by a single reference configuration, even MP2.5 is a viable alternative, at MP3 cost. MP2.5 performance for vibrational frequencies is comparable to double hybrids such as DSD-PBEP86-D3BJ, but without resorting to empirical parameters. DCSD is also quite suitable for computing zero-point vibrational energies in computational thermochemistry.

  16. Envri Cluster - a Community-Driven Platform of European Environmental Researcher Infrastructures for Providing Common E-Solutions for Earth Science

    NASA Astrophysics Data System (ADS)

    Asmi, A.; Sorvari, S.; Kutsch, W. L.; Laj, P.

    2017-12-01

    European long-term environmental research infrastructures (often referred to as ESFRI RIs) are the core facilities providing services to scientists in their quest to understand and predict the complex Earth system and its functioning, which requires long-term efforts to identify environmental changes (trends, thresholds and resilience, interactions and feedbacks). Many of the research infrastructures were originally developed to respond to the needs of their specific research communities; however, it is clear that strong collaboration among research infrastructures is needed to serve trans-boundary research, which requires exploring scientific questions at the intersection of different scientific fields, conducting joint research projects, and developing concepts, devices and methods that can be used to integrate knowledge. European environmental research infrastructures have already worked together successfully for many years and have established a cluster, the ENVRI cluster, for their collaborative work. The ENVRI cluster acts as a collaborative platform where the RIs can jointly agree on common solutions for their operations, draft strategies and policies, and share best practices and knowledge. The supporting project for the ENVRI cluster, ENVRIplus, brings together 21 European research infrastructures and infrastructure networks to work on joint technical solutions, data interoperability, access management, training, strategies and dissemination efforts. The ENVRI cluster acts as a one-stop shop for multidisciplinary RI users, other collaborative initiatives, projects and programmes, and coordinates and implements jointly agreed RI strategies.

  17. Merging symmetry projection methods with coupled cluster theory: Lessons from the Lipkin model Hamiltonian

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wahlen-Strothman, J. M.; Henderson, T. H.; Hermes, M. R.

    Coupled cluster and symmetry projected Hartree-Fock are two central paradigms in electronic structure theory. However, they are very different. Single reference coupled cluster is highly successful for treating weakly correlated systems, but fails under strong correlation unless one sacrifices good quantum numbers and works with broken-symmetry wave functions, which is unphysical for finite systems. Symmetry projection is effective for the treatment of strong correlation at the mean-field level through multireference non-orthogonal configuration interaction wavefunctions, but unlike coupled cluster, it is neither size extensive nor ideal for treating dynamic correlation. We here examine different scenarios for merging these two dissimilar theories. We carry out this exercise over the integrable Lipkin model Hamiltonian, which, despite its simplicity, encompasses non-trivial physics for degenerate systems and can be solved via diagonalization for a very large number of particles. We show how symmetry projection and coupled cluster doubles individually fail in different correlation limits, whereas models that merge these two theories are highly successful over the entire phase diagram. Despite the simplicity of the Lipkin Hamiltonian, the lessons learned in this work will be useful for building an ab initio symmetry projected coupled cluster theory that we expect to be accurate in the weakly and strongly correlated limits, as well as the recoupling regime.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.

    Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
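
    The idea of scoring a grouping of clinical groups with the Bayesian information criterion can be illustrated with a minimal sketch. It assumes ordinary least squares on log-transformed values, Gaussian errors with one shared variance, and a separate intercept and slope per cluster (complete pooling within each cluster). It is not the authors' partial-pooling model or their stochastic search; `bic_for_partition`, the data layout and the example numbers are hypothetical.

```python
import math


def ols(xs, ys):
    """Ordinary least squares fit y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def bic_for_partition(data, partition):
    """BIC of log(PSD) ~ log(RAK), fitted separately within each cluster of groups.

    data: {group_name: [(rak, psd), ...]}
    partition: list of clusters, each a list of group names.
    """
    n, rss_total = 0, 0.0
    for cluster in partition:
        points = [p for group in cluster for p in data[group]]
        xs = [math.log(rak) for rak, _ in points]
        ys = [math.log(psd) for _, psd in points]
        rss_total += ols(xs, ys)[2]
        n += len(points)
    k = 2 * len(partition) + 1  # intercept and slope per cluster, plus one shared variance
    return n * math.log(rss_total / n) + k * math.log(n)


# Hypothetical data: two groups with different log-log slopes and small multiplicative noise.
rak = [10, 20, 40, 80, 160]
noise = [1.05, 0.95, 1.02, 0.98, 1.01]
data = {
    "A": [(r, 2.0 * r * e) for r, e in zip(rak, noise)],
    "B": [(r, 0.5 * r ** 1.5 * e) for r, e in zip(rak, noise)],
}
separate = bic_for_partition(data, [["A"], ["B"]])
merged = bic_for_partition(data, [["A", "B"]])
print(separate < merged)  # True: the lower BIC favours keeping the groups apart
```

    Comparing BIC values across candidate partitions is what a search over groupings would do; the penalty term `k * log(n)` discourages splitting into more clusters than the data support.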

  19. Combining symmetry collective states with coupled-cluster theory: Lessons from the Agassi model Hamiltonian

    NASA Astrophysics Data System (ADS)

    Hermes, Matthew R.; Dukelsky, Jorge; Scuseria, Gustavo E.

    2017-06-01

    The failures of single-reference coupled-cluster theory for strongly correlated many-body systems are flagged at the mean-field level by the spontaneous breaking of one or more physical symmetries of the Hamiltonian. Restoring the symmetry of the mean-field determinant by projection reveals that coupled-cluster theory fails because it factorizes high-order excitation amplitudes incorrectly. However, symmetry-projected mean-field wave functions do not account sufficiently for dynamic (or weak) correlation. Here we pursue a merger of symmetry projection and coupled-cluster theory, following previous work along these lines that utilized the simple Lipkin model system as a test bed [J. Chem. Phys. 146, 054110 (2017), 10.1063/1.4974989]. We generalize the concept of a symmetry-projected mean-field wave function to the concept of a symmetry projected state, in which the factorization of high-order excitation amplitudes in terms of low-order ones is guided by symmetry projection and is not exponential, and combine them with coupled-cluster theory in order to model the ground state of the Agassi Hamiltonian. This model has two separate channels of correlation and two separate physical symmetries which are broken under strong correlation. We show how the combination of symmetry collective states and coupled-cluster theory is effective in obtaining correlation energies and order parameters of the Agassi model throughout its phase diagram.

  20. Impact of continuing medical education in cancer diagnosis on GP knowledge, attitude and readiness to investigate - a before-after study.

    PubMed

    Toftegaard, Berit Skjødeberg; Bro, Flemming; Falborg, Alina Zalounina; Vedsted, Peter

    2016-07-26

    Continuing medical education (CME) in earlier cancer diagnosis was launched in Denmark in 2012 as part of the Danish National Cancer Plan. The CME programme was introduced to improve the recognition among general practitioners (GPs) of symptoms suggestive of cancer and improve the selection of patients requiring urgent investigation. This study aims to explore the effect of CME on GP knowledge about cancer diagnosis, attitude towards own role in cancer detection, self-assessed readiness to investigate and cancer risk assessment of urgently referred patients. We conducted a before-after study in the Central Denmark Region including 831 GPs assigned to one of eight geographical clusters. All GPs were invited to participate in the CME at three-week intervals between clusters. A questionnaire focusing on knowledge, attitude and clinical vignettes was sent to each GP one month before and seven months after the CME. The GPs were also asked to assess the risk of cancer in patients urgently referred to a fast-track cancer pathway during an eight-month period. CME-participating GPs were compared with reference (non-participating) GPs by analysing before-after differences. One quarter of all GPs participated in the CME. 202 GPs (24.3 %) completed both the baseline and the follow-up questionnaires. 532 GPs (64.0 %) assessed the risk of cancer before the CME and 524 GPs (63.1 %) assessed the risk of cancer after the CME in urgently referred consecutive patients. Compared to the reference group, CME-participating GPs statistically significantly improved their understanding of a rational probability of diagnosing cancer among patients urgently referred for suspected cancer, increased their knowledge of cancer likelihood in a 50-year-old referred patient and lowered the assessed risk of cancer in urgently referred patients. 
The standardised CME lowered the GP-assessed cancer risk of urgently referred patients, whereas the effect on knowledge about cancer diagnosis and attitude towards own role in cancer detection was limited. No effect was found on the GPs' readiness to investigate. CME may be effective for optimising the interpretation of cancer symptoms and thereby improve the selection of patients for urgent cancer referral. NCT02069470 on ClinicalTrials.gov. Retrospectively registered, 1/29/2014.

  1. The XXL Survey: First Results and Future

    NASA Technical Reports Server (NTRS)

    Pierre, M.; Adami, C.; Birkinshaw, M.; Chiappetti, L.; Ettori, S.; Evrard, A.; Faccioli, L.; Gastaldello, F.; Giles, P.; Horellou, C.; et al.

    2017-01-01

    The XXL survey currently covers two 25 deg² patches with XMM observations of approximately 10 ks. We summarize the scientific results associated with the first release of the XXL dataset, which occurred in mid-2016. We review several arguments for increasing the survey depth to 40 ks during the next decade of XMM operations. X-ray cluster (z < 2), active galactic nuclei (AGN; z < 4) and cosmic background survey science will then benefit from an extraordinary data reservoir. This, combined with deep multi-λ observations, will lead to solid standalone cosmological constraints and provide a wealth of information on the formation and evolution of AGN, clusters and the X-ray background. In particular, it will offer a unique opportunity to pinpoint the z > 1 cluster density. It will eventually constitute a reference study and an ideal calibration field for the upcoming eROSITA and Euclid missions.

  2. Method for Continuous Monitoring of Electrospray Ion Formation

    NASA Astrophysics Data System (ADS)

    Metzler, Guille; Crathern, Susan; Bachmann, Lorin; Fernández-Metzler, Carmen; King, Richard

    2017-10-01

    A method for continuously monitoring the performance of electrospray ionization without the addition of hardware or chemistry to the system is demonstrated. In the method, which we refer to as SprayDx, cluster ions with solvent vapor natively formed by electrospray are followed throughout the collection of liquid chromatography-selected reaction monitoring data. The cluster ion extracted ion chromatograms report on the consistency of the ion formation and detection system. The data collected by the SprayDx method resemble the data collected for postcolumn infusion of analyte. The response of the cluster ions monitored reports on changes in the physical parameters of the ion source such as voltage and gas flow. SprayDx is also observed to report on ion suppression in a fashion very similar to a postcolumn infusion of analyte. We anticipate the method finding utility as a continuous readout on the performance of electrospray and other atmospheric pressure ionization processes.
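
    As a hypothetical illustration of using a native cluster-ion signal as a continuous readout (this is not the published SprayDx processing, which the abstract does not detail), the sketch below flags scans whose cluster-ion intensity departs from a rolling median of the preceding scans. The window size, tolerance and function name are assumptions.

```python
import statistics


def flag_unstable_scans(intensities, window=5, tolerance=0.3):
    """Flag scans whose cluster-ion intensity departs from the recent median by more than `tolerance`.

    intensities: cluster-ion extracted-ion-chromatogram values, one per scan.
    Returns one boolean per scan; the first scan has no history and is never flagged.
    """
    flags = []
    for i, value in enumerate(intensities):
        history = intensities[max(0, i - window):i]
        if not history:
            flags.append(False)
            continue
        baseline = statistics.median(history)
        # Relative deviation from the baseline; a non-positive baseline is never flagged.
        flags.append(baseline > 0 and abs(value - baseline) / baseline > tolerance)
    return flags


# A simulated dip, such as might follow a transient spray disturbance or ion suppression.
print(flag_unstable_scans([100, 102, 98, 101, 99, 50, 100]))
# [False, False, False, False, False, True, False]
```

    A median baseline is used here so that a single outlying scan does not shift the reference level for the scans that follow it.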

  3. Honey bee-inspired algorithms for SNP haplotype reconstruction problem

    NASA Astrophysics Data System (ADS)

    PourkamaliAnaraki, Maryam; Sadeghi, Mehdi

    2016-03-01

    Reconstructing haplotypes from SNP fragments is an important problem in computational biology. There has been a lot of interest in this field because haplotypes have been shown to contain promising data for disease association research. It has been proved that haplotype reconstruction under the Minimum Error Correction model is an NP-hard problem. Therefore, several methods, such as clustering techniques, evolutionary algorithms, neural networks and swarm intelligence approaches, have been proposed to solve this problem in reasonable time. In this paper, we have focused on various evolutionary clustering techniques and tried to find an efficient technique for solving the haplotype reconstruction problem. It can be inferred from our experiments that the clustering methods relying on the behaviour of honey bee colonies in nature, specifically the bees algorithm and artificial bee colony methods, are expected to result in more efficient solutions. An application program of the methods is available at http://www.bioinf.cs.ipm.ir/software/haprs/
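
    The Minimum Error Correction (MEC) objective can be made concrete with a small sketch. Fragments are assumed to be strings over '0', '1' and '-' (site not covered), and the two haplotypes are assumed to be complementary, as is usual for heterozygous SNP sites. The exhaustive search is only feasible for tiny inputs, which is why heuristics such as the bee-inspired algorithms are used instead; the function names are hypothetical and not taken from the HapRS software.

```python
from itertools import product


def mismatches(fragment, haplotype):
    """Count covered sites where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def complement(haplotype):
    return "".join("1" if c == "0" else "0" for c in haplotype)


def mec_score(fragments, h1):
    """MEC: each fragment is assigned to the closer of h1 and its complement."""
    h2 = complement(h1)
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)


def brute_force_mec(fragments):
    """Exhaustive search over all 2**n haplotypes; exponential in the number of SNPs."""
    n = len(fragments[0])
    best = min(("".join(bits) for bits in product("01", repeat=n)),
               key=lambda h: mec_score(fragments, h))
    return best, mec_score(fragments, best)


fragments = ["01-", "-10", "011", "100"]
print(mec_score(fragments, "010"))  # 2
print(brute_force_mec(fragments))   # ('011', 1)
```

    A metaheuristic would replace `brute_force_mec` while keeping `mec_score` (or a variant of it) as the fitness function to minimize.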

  4. Kinematic evidence of satellite galaxy populations in the potential wells of first-ranked cluster galaxies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cowie, L.L.; Hu, E.M.

    1986-06-01

    The velocities of 38 centrally positioned galaxies (r ≪ 100 kpc) were measured relative to the velocity of the first-ranked galaxy in 14 rich clusters. Analysis of the velocity distribution function of this sample and of previous data shows that the population cannot be fit by a single Gaussian. An adequate fit is obtained if 60 percent of the objects lie in a Gaussian with σ = 250 km/s and the remainder in a population with σ = 1400 km/s. All previous data sets are individually consistent with this conclusion. This suggests that there is a bound population of galaxies in the potential well of the central galaxy in addition to the normal population of the cluster core. This is taken as supporting evidence for the galactic cannibalism model of cD galaxy formation. 14 references.
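
    The two-population fit described above amounts to a two-component Gaussian mixture for the velocities relative to the first-ranked galaxy. The sketch below assumes both components are centred on zero relative velocity, which the abstract does not state explicitly; the parameter defaults echo the quoted 60 percent, 250 km/s and 1400 km/s values, the function names are hypothetical, and no fitting step is shown.

```python
import math


def gaussian_pdf(v, sigma, mu=0.0):
    """Normal density with mean mu and standard deviation sigma (km/s)."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(v, weight=0.6, sigma_bound=250.0, sigma_core=1400.0):
    """Fraction `weight` in a narrow (bound) Gaussian, the remainder in a broad one."""
    return weight * gaussian_pdf(v, sigma_bound) + (1 - weight) * gaussian_pdf(v, sigma_core)


def log_likelihood(velocities, pdf):
    """Sum of log densities; larger values indicate a better-fitting model."""
    return sum(math.log(pdf(v)) for v in velocities)
```

    Comparing `log_likelihood(velocities, mixture_pdf)` with the value for a single Gaussian (for example `lambda v: gaussian_pdf(v, sigma)`) is one way to express the statement that one Gaussian cannot fit the sample.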

  5. Referring patients to specialists: A structured vignette survey of Australian and British GPs

    PubMed Central

    Jiwa, Moyez; Gordon, Michael; Arnet, Hayley; Ee, Hooi; Bulsara, Max; Colwell, Brigitte

    2008-01-01

    Background: In Australia and in the United Kingdom (UK), access to specialists is sanctioned by General Practitioners (GPs). It is important to understand how practitioners determine which patients warrant referral. Methods: A self-administered, structured vignette postal survey of General Practitioners in Western Australia and the United Kingdom. Sixty-four vignettes describing patients with colorectal symptoms were constructed, encompassing six clinical details. Nine vignettes, chosen at random, were presented to each individual. Respondents were asked whether they would refer the patient to a specialist and how urgently. Logistic regression and parametric tests were used to analyse the data. Results: We received 260 completed questionnaires. 58% of 'cancer vignettes' were selected for 'urgent' referral. 1632/2367 (69%) of all vignettes were selected for referral. After adjusting for clustering, the model suggests that 38.4% of the variability is explained by all the clinical variables as well as the age and experience of the respondents. 1012 (42.8%) of vignettes were referred 'urgently'. After adjusting for clustering, the data suggest that 31.3% of the variability is explained by the model. The age of the respondents, the location of the practice and all the clinical variables were significant in the decision to refer urgently. Conclusion: GPs' referral decisions for patients with lower bowel symptoms are similar in the two countries. We question the wisdom of streaming referrals from primary care without a strong evidence base and an effective intervention for implementing guidelines. We conclude that implementation must take into account not only the profile of patients but also the characteristics of GPs and referral policies. PMID:18194578

  6. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing, as well as multimedia databases, demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content with the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm, linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the general graph-embedding framework for dimensionality reduction.
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
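
    The preference for the cosine metric rests on the fact that it depends only on the direction of two supervectors, not on their lengths. A minimal sketch, assuming plain Python lists stand in for GMM mean supervectors; it shows only the two distance metrics, not the LSDA projection, and the example vectors are hypothetical.

```python
import math


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cos(angle between u and v); unchanged if either vector is rescaled."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


# Two hypothetical supervectors pointing in the same direction but with different lengths.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
print(euclidean_distance(u, v))  # about 3.742 (sqrt(14))
print(cosine_distance(u, v))     # about 0.0: same direction
```

    Under the Euclidean metric these two vectors look far apart; under the cosine metric they are essentially identical, which is the behaviour the abstract argues suits supervectors that scatter along directions.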

  7. Mapping Health Data: Improved Privacy Protection With Donut Method Geomasking

    PubMed Central

    Hampton, Kristen H.; Fitch, Molly K.; Allshouse, William B.; Doherty, Irene A.; Gesink, Dionne C.; Leone, Peter A.; Serre, Marc L.; Miller, William C.

    2010-01-01

    A major challenge in mapping health data is protecting patient privacy while maintaining the spatial resolution necessary for spatial surveillance and outbreak identification. A new adaptive geomasking technique, referred to as the donut method, extends current methods of random displacement by ensuring a user-defined minimum level of geoprivacy. In donut method geomasking, each geocoded address is relocated in a random direction by at least a minimum distance, but less than a maximum distance. The authors compared the donut method with current methods of random perturbation and aggregation regarding measures of privacy protection and cluster detection performance by masking multiple disease field simulations under a range of parameters. Both the donut method and random perturbation performed better than aggregation in cluster detection measures. The performance of the donut method in geoprivacy measures was at least 42.7% higher and in cluster detection measures was less than 4.8% lower than that of random perturbation. Results show that the donut method provides a consistently higher level of privacy protection with a minimal decrease in cluster detection performance, especially in areas where the risk to individual geoprivacy is greatest. PMID:20817785
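
    The displacement rule described above can be sketched in a few lines. This assumes coordinates in a projected planar system (for example metres), fixed minimum and maximum distances (the adaptive choice of those distances is not modelled), and a radius drawn so that masked points are uniform over the annulus area; that last choice is an assumption, not something the abstract specifies. The function name is hypothetical.

```python
import math
import random


def donut_mask(x, y, r_min, r_max, rng=random):
    """Move (x, y) in a uniformly random direction by a distance between r_min and r_max.

    Drawing r = sqrt(U(r_min**2, r_max**2)) makes points uniform over the annulus area;
    drawing r uniformly instead would crowd masked points toward the inner edge.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + r * math.cos(theta), y + r * math.sin(theta)


rng = random.Random(42)  # seeded so the example is reproducible
masked = [donut_mask(0.0, 0.0, 100.0, 500.0, rng) for _ in range(5)]
```

    The guaranteed minimum displacement `r_min` is what distinguishes this from plain random perturbation, where a point may be moved by an arbitrarily small distance.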

  8. Mapping health data: improved privacy protection with donut method geomasking.

    PubMed

    Hampton, Kristen H; Fitch, Molly K; Allshouse, William B; Doherty, Irene A; Gesink, Dionne C; Leone, Peter A; Serre, Marc L; Miller, William C

    2010-11-01

    A major challenge in mapping health data is protecting patient privacy while maintaining the spatial resolution necessary for spatial surveillance and outbreak identification. A new adaptive geomasking technique, referred to as the donut method, extends current methods of random displacement by ensuring a user-defined minimum level of geoprivacy. In donut method geomasking, each geocoded address is relocated in a random direction by at least a minimum distance, but less than a maximum distance. The authors compared the donut method with current methods of random perturbation and aggregation regarding measures of privacy protection and cluster detection performance by masking multiple disease field simulations under a range of parameters. Both the donut method and random perturbation performed better than aggregation in cluster detection measures. The performance of the donut method in geoprivacy measures was at least 42.7% higher and in cluster detection measures was less than 4.8% lower than that of random perturbation. Results show that the donut method provides a consistently higher level of privacy protection with a minimal decrease in cluster detection performance, especially in areas where the risk to individual geoprivacy is greatest.

  9. Recent development in deciphering the structure of luminescent silver nanodots

    NASA Astrophysics Data System (ADS)

    Choi, Sungmoon; Yu, Junhua

    2017-05-01

    Matrix-stabilized silver clusters and stable luminescent few-atom silver clusters, referred to as silver nanodots, show notable differences in their photophysical properties. We present recent research on deciphering the nature of silver clusters and nanodots and understanding the factors that lead to variations in luminescent mechanisms. Due to their relatively simple structure, the matrix-stabilized clusters have been well studied. However, the single-stranded DNA (ssDNA)-stabilized silver nanodots that show the most diverse emission wavelengths and the best photophysical properties remain mysterious species. It is clear that their photophysical properties depend strongly on their protection scaffolds. Analyses from combinations of high-performance liquid chromatography, inductively coupled plasma-atomic emission spectroscopy, electrophoresis, and mass spectrometry indicate that about 10 to 20 silver atoms form emissive complexes with ssDNA. However, it is possible that not all of the silver atoms in the complex form effective emission centers. Investigation of the nanodot structure will help us understand why luminescent silver nanodots are stable in aqueous solution and how to further improve their chemical and photophysical properties.

  10. Identification of Staphylococcus spp. using (GTG)₅-PCR fingerprinting.

    PubMed

    Svec, Pavel; Pantůček, Roman; Petráš, Petr; Sedláček, Ivo; Nováková, Dana

    2010-12-01

    A group of 212 type and reference strains deposited in the Czech Collection of Microorganisms (Brno, Czech Republic) and covering 41 Staphylococcus species comprising 21 subspecies was characterised using rep-PCR fingerprinting with the (GTG)₅ primer in order to evaluate this method for the identification of staphylococci. All strains were typeable using the (GTG)₅ primer and generated PCR products ranging from 200 to 4500 bp. Numerical analysis of the obtained fingerprints revealed (sub)species-specific clustering corresponding with the taxonomic position of the analysed strains. The taxonomic position of selected strains representing the (sub)species that were distributed over multiple rep-PCR clusters was verified and confirmed by partial rpoB gene sequencing. Staphylococcus caprae, Staphylococcus equorum, Staphylococcus sciuri, Staphylococcus piscifermentans, Staphylococcus xylosus, and Staphylococcus saprophyticus revealed heterogeneous fingerprints, and each (sub)species was distributed over several clusters. However, representatives of the remaining Staphylococcus spp. were clearly separated into single (sub)species-specific clusters. These results show rep-PCR with the (GTG)₅ primer to be a fast and reliable method for the differentiation and straightforward identification of the majority of Staphylococcus spp. Copyright © 2010 Elsevier GmbH. All rights reserved.
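
    The abstract does not specify the similarity coefficient or clustering algorithm behind the numerical analysis. As a hypothetical illustration of how band-based fingerprints can be grouped, the sketch below scores band matches with the Dice coefficient under a relative size tolerance and builds an average-linkage (UPGMA) tree; the band sizes, the 2% tolerance and all function names are assumptions.

```python
def dice_similarity(bands_a, bands_b, tol=0.02):
    """Dice coefficient 2m/(|a|+|b|); m counts bands (bp) matched within a relative tolerance."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol * max(a[i], b[j]):
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return 2.0 * matched / (len(a) + len(b))


def upgma(profiles, tol=0.02):
    """Average-linkage (UPGMA) clustering on Dice distances; returns a nested-tuple tree."""
    sizes = {name: 1 for name in profiles}
    names = list(profiles)
    dist = {}
    for x in range(len(names)):
        for y in range(x + 1, len(names)):
            p, q = names[x], names[y]
            dist[frozenset((p, q))] = 1.0 - dice_similarity(profiles[p], profiles[q], tol)
    while len(sizes) > 1:
        a, b = tuple(min(dist, key=dist.get))  # closest pair of current clusters
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Size-weighted average of the distances from the two merged clusters to c.
            dist[frozenset((merged, c))] = (na * dist[frozenset((a, c))]
                                            + nb * dist[frozenset((b, c))]) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical band sizes (bp) for three strains.
profiles = {"A": [300, 500, 800], "B": [302, 498, 805], "C": [200, 1000, 1500]}
tree = upgma(profiles)
print(tree)  # A and B are joined first, then C; sibling order may vary
```

    Strains whose fingerprints fall on separate branches of such a tree correspond to the heterogeneous (sub)species that were spread over several clusters.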

  11. Construction and comparative evaluation of different activity detection methods in brain FDG-PET.

    PubMed

    Buchholz, Hans-Georg; Wenzel, Fabian; Gartenschläger, Martin; Thiele, Frank; Young, Stewart; Reuss, Stefan; Schreckenberger, Mathias

    2015-08-18

    We constructed and evaluated reference brain FDG-PET databases for use by three software programs (Computer-aided diagnosis for dementia (CAD4D), Statistical Parametric Mapping (SPM) and NEUROSTAT), which allow user-independent detection of dementia-related hypometabolism in patients' brain FDG-PET. Thirty-seven healthy volunteers were scanned in order to construct brain FDG reference databases, which reflect the normal, age-dependent glucose consumption in the human brain, using each software package. Databases were compared to each other to assess the impact of the different stereotactic normalization algorithms used by each software package. In addition, performance of the new reference databases in the detection of altered glucose consumption in the brains of patients was evaluated by calculating statistical maps of regional hypometabolism in FDG-PET of 20 patients with confirmed Alzheimer's dementia (AD) and of 10 non-AD patients. Extent (hypometabolic volume, referred to as cluster size) and magnitude (peak z-score) of detected hypometabolism were statistically analyzed. Differences between the reference databases built by CAD4D, SPM or NEUROSTAT were observed. Due to the different normalization methods, altered spatial FDG patterns were found. When analyzing patient data with the reference databases created using CAD4D, SPM or NEUROSTAT, similar characteristic clusters of hypometabolism in the same brain regions were found in the AD group with each software package. However, larger z-scores were observed with CAD4D and NEUROSTAT than with SPM. Better concordance with CAD4D and NEUROSTAT was achieved using the spatially normalized images of SPM and an independent z-score calculation. The three software packages identified the peak z-scores in the same brain region in 11 of 20 AD cases, and there was concordance between CAD4D and SPM in 16 AD subjects.
The clinical evaluation of brain FDG-PET of 20 AD patients with either CAD4D-, SPM- or NEUROSTAT-generated databases from an identical reference dataset showed similar patterns of hypometabolism in the brain regions known to be involved in AD. The extent of hypometabolism and peak z-score appeared to be influenced by the calculation method used in each software package rather than by different spatial normalization parameters.
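
    A generic voxel-wise version of this kind of analysis can be sketched as follows: each voxel of a patient scan is compared with the mean and standard deviation of the same voxel in a normal reference database, and connected supra-threshold voxels form clusters whose size and peak z-score are reported. This is an illustrative assumption, not the CAD4D, SPM or NEUROSTAT implementation; it assumes images that are already spatially normalized and intensity-scaled, uses a 2-D grid with 4-connectivity for brevity, and the function names and toy data are hypothetical.

```python
import statistics


def z_map(patient, reference_scans):
    """Voxel-wise z = (reference mean - patient) / reference SD; positive = hypometabolism."""
    rows, cols = len(patient), len(patient[0])
    z = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            values = [scan[i][j] for scan in reference_scans]
            z[i][j] = (statistics.mean(values) - patient[i][j]) / statistics.stdev(values)
    return z


def clusters_above(z, threshold):
    """4-connected clusters of voxels with z > threshold, as (size, peak z) pairs."""
    rows, cols = len(z), len(z[0])
    seen, found = set(), []
    for i in range(rows):
        for j in range(cols):
            if z[i][j] <= threshold or (i, j) in seen:
                continue
            stack, size, peak = [(i, j)], 0, z[i][j]
            seen.add((i, j))
            while stack:  # depth-first flood fill of one cluster
                r, c = stack.pop()
                size += 1
                peak = max(peak, z[r][c])
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen \
                            and z[nr][nc] > threshold:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            found.append((size, peak))
    return found


# Toy 3x3 example: three reference scans with uniform values 9, 10 and 11.
reference = [[[v] * 3 for _ in range(3)] for v in (9.0, 10.0, 11.0)]
patient = [[10.0, 10.0, 10.0], [10.0, 6.0, 7.0], [10.0, 10.0, 10.0]]
z = z_map(patient, reference)
print(clusters_above(z, 2.0))  # [(2, 4.0)]
```

    Cluster size and peak z-score here correspond to the "extent" and "magnitude" measures named in the abstract; how each package normalizes and thresholds its maps is exactly where the reported differences arise.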

  12. Monitoring by Use of Clusters of Sensor-Data Vectors

    NASA Technical Reports Server (NTRS)

    Iverson, David L.

    2007-01-01

    The inductive monitoring system (IMS) is a system of computer hardware and software for automated monitoring of the performance, operational condition, physical integrity, and other aspects of the health of a complex engineering system (e.g., an industrial process line or a spacecraft). The input to the IMS consists of streams of digitized readings from sensors in the monitored system. The IMS determines the type and amount of any deviation of the monitored system from a nominal or normal ("healthy") condition on the basis of a comparison between (1) vectors constructed from the incoming sensor data and (2) corresponding vectors in a database of nominal or normal behavior. The term "inductive" reflects the use of a process reminiscent of traditional mathematical induction to learn about normal operation and build the nominal-condition database. The IMS offers two major advantages over prior computational monitoring systems: the computational burden of the IMS is significantly smaller, and there is no need for abnormal-condition sensor data for training the IMS to recognize abnormal conditions. The figure schematically depicts the relationships among the computational processes effected by the IMS. Training sensor data are gathered during normal operation of the monitored system, detailed computational simulation of operation of the monitored system, or both. The training data are formed into vectors that are used to generate the database. The vectors in the database are clustered into regions that represent normal or nominal operation. Once the database has been generated, the IMS compares the vectors of incoming sensor data with vectors representative of the clusters. The monitored system is deemed to be operating normally or abnormally, depending on whether the vector of incoming sensor data is or is not, respectively, sufficiently close to one of the clusters.
For this purpose, a distance between two vectors is calculated by a suitable metric (e.g., Euclidean distance) and "sufficiently close" signifies lying at a distance less than a specified threshold value. It must be emphasized that although the IMS is intended to detect off-nominal or abnormal performance or health, it is not necessarily capable of performing a thorough or detailed diagnosis. Limited diagnostic information may be available under some circumstances. For example, the distance of a vector of incoming sensor data from the nearest cluster could serve as an indication of the severity of a malfunction. The identity of the nearest cluster may be a clue as to the identity of the malfunctioning component or subsystem. It is possible to decrease the IMS computation time by use of a combination of cluster-indexing and -retrieval methods. For example, in one method, the distances between each cluster and two or more reference vectors can be used for the purpose of indexing and retrieval. The clusters are sorted into a list according to these distance values, typically in ascending order of distance. When a set of input data arrives and is to be tested, the data are first arranged as an ordered set (that is, a vector). The distances from the input vector to the reference points are computed. The search of clusters from the list can then be limited to those clusters lying within a certain distance range from the input vector; the computation time is reduced by not searching the clusters at a greater distance.
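
    The nearest-cluster test and the reference-vector indexing described above can be sketched as follows. For brevity each cluster is represented by a single centroid (an assumption; the abstract does not specify the cluster representation), and the pruning relies on the triangle inequality: if an input vector lies within the threshold of a cluster, its distance to every reference vector differs from that cluster's distance by less than the threshold, so no qualifying cluster is skipped. The class and method names are hypothetical.

```python
import bisect
import math


class ClusterIndex:
    """Nominal-behaviour clusters indexed by their distances to fixed reference vectors."""

    def __init__(self, centroids, reference_vectors):
        self.refs = [tuple(r) for r in reference_vectors]
        entries = [(tuple(math.dist(c, r) for r in self.refs), tuple(c)) for c in centroids]
        # Sort clusters in ascending order of distance to the first reference vector.
        self.entries = sorted(entries, key=lambda e: e[0][0])
        self.keys = [e[0][0] for e in self.entries]

    def check(self, x, threshold):
        """Return (is_nominal, distance to nearest unpruned cluster, or inf if none remain)."""
        dx = [math.dist(x, r) for r in self.refs]
        # Only clusters whose first-reference distance lies within threshold of dx[0] can be close.
        lo = bisect.bisect_left(self.keys, dx[0] - threshold)
        hi = bisect.bisect_right(self.keys, dx[0] + threshold)
        best = math.inf
        for ref_dists, centroid in self.entries[lo:hi]:
            # Further prune with the remaining reference vectors (triangle inequality).
            if any(abs(rd - d) > threshold for rd, d in zip(ref_dists, dx)):
                continue
            best = min(best, math.dist(x, centroid))
        return best < threshold, best


index = ClusterIndex([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)], [(0.0, 0.0), (10.0, 10.0)])
print(index.check((0.5, 0.0), 1.0))  # (True, 0.5)
print(index.check((5.0, 5.0), 1.0))  # (False, inf)
```

    When the returned distance is below the threshold it is exact, so it can also serve as the severity indication mentioned in the text; pruned clusters are guaranteed to be at least the threshold away.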

  13. The quataron concept: a key to solve the problem of the nanostate

    NASA Astrophysics Data System (ADS)

    Askhabov, A. M.

    2003-04-01

    In a number of our works (Askhabov, 1998-2002) we have described a set of ideas and principles concerning the structural organization of substance in the nanorange and its role in the formation of crystalline and noncrystalline materials. These ideas have been collectively referred to as the “quataron concept”. Central to this new concept is the idea that specific nanosize clusters arise under non-equilibrium conditions. These clusters are understood as a peculiar form of structural organization of substance at the nanolevel and are referred to as "hidden" phase clusters, or quatarons. As non-equilibrium objects, quatarons are capable of self-organization and self-development. With their valencies fully realized (in covalent interactions), they can become large molecules; with three-dimensional ordering (atoms arranged in a crystal lattice), they produce crystalline particles. Quatarons are the basis for all kinds of equilibrium nanostructures, from ordinary tetrahedral and octahedral groupings to the widely known fullerenes, dense dodecahedral and icosahedral clusters, and colloidal and fractal particles. In particular, the quataron theory offers a very simple solution to the fullerene problem: quatarons are fullerene predecessors, and the fullerene architecture is dictated by hollow quatarons. In addition, it has been found that only clusters larger than ~1.2 nm can become potential centers of crystallization. Thus, quatarons seem to underlie all other nanoparticles, including nanocrystals. This theory also broadens our understanding of the amorphous state. If for some reason quatarons or their aggregates fail to crystallize, for example because of the fractal structure of the cluster surface or their non-crystallographic (icosahedral) shape, then in the condensed state they give rise to a special class of solid ultradisperse materials (quatarites) with various degrees of ordering.
The closest analogue of such materials is opal, a material made up of spherical silica particles of uniform size. A well-ordered material composed of carbon fullerenes is known as fullerite. The quataron concept will have a profound effect on mineralogy and on the physics and chemistry of minerals. We have clearly already reached the point where some fundamental genetic, structural and classification issues need to be revised. In particular, what was said above about the structure and formation of noncrystalline materials calls for a broader understanding of the term mineral. As a result, many materials now referred to as mineraloids would fall within the domain of minerals and be considered new mineral species; minerals would then include not only natural objects (chemical compounds) with a crystalline structure but also X-ray amorphous solids with a specific arrangement of elements (fullerites, quatarites, opals, etc.). The work was done with financial support from RFBI (grant No. 02-05-64688) and INTAS (grant No. 99-0247).

  14. Periodic Methods for Controlling a Satellite in Formation

    DTIC Science & Technology

    2002-03-01

    ...techniques to study relative position errors within a satellite cluster [19, 24]. The dynamics were based on Clohessy-Wiltshire equations with near... dynamics model by solving the time-periodic, linearized system using Floquet theory. More accurate than the Clohessy-Wiltshire solutions used in previous...
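
The excerpt does not reproduce the Clohessy-Wiltshire equations. As a hedged illustration only, the sketch below evaluates the textbook closed-form solution of the linearized CW equations x'' - 2n y' - 3n^2 x = 0, y'' + 2n x' = 0, z'' + n^2 z = 0, assuming x is radial, y along-track and z cross-track relative to a reference satellite on a circular orbit with mean motion n. The function name `cw_propagate` is hypothetical, and this is not the Floquet-based model the report develops.

```python
import math
from typing import Tuple

# (x, y, z, vx, vy, vz): relative position (m) and velocity (m/s) in the assumed frame.
State = Tuple[float, float, float, float, float, float]


def cw_propagate(state: State, n: float, t: float) -> State:
    """Closed-form Clohessy-Wiltshire propagation of a relative state over time t (s)."""
    x0, y0, z0, vx0, vy0, vz0 = state
    s, c = math.sin(n * t), math.cos(n * t)
    # In-plane (radial / along-track) motion is coupled; cross-track motion is a simple oscillator.
    x = (4 - 3 * c) * x0 + (s / n) * vx0 + (2 / n) * (1 - c) * vy0
    y = 6 * (s - n * t) * x0 + y0 - (2 / n) * (1 - c) * vx0 + ((4 * s - 3 * n * t) / n) * vy0
    z = c * z0 + (s / n) * vz0
    vx = 3 * n * s * x0 + c * vx0 + 2 * s * vy0
    vy = -6 * n * (1 - c) * x0 - 2 * s * vx0 + (4 * c - 3) * vy0
    vz = -n * s * z0 + c * vz0
    return (x, y, z, vx, vy, vz)
```

Setting the initial along-track velocity to vy0 = -2 n x0 cancels the secular drift terms in y, so the relative orbit closes after one period 2π/n; any other choice makes the deputy drift along-track, which is why periodic relative motion matters for keeping a formation together.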

  15. Relativistic equation-of-motion coupled-cluster method using open-shell reference wavefunction: Application to ionization potential

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pathak, Himadri, E-mail: hmdrpthk@gmail.com; Sasmal, Sudip, E-mail: sudipsasmal.chem@gmail.com; Vaval, Nayana

    2016-08-21

    The open-shell reference relativistic equation-of-motion coupled-cluster method, within its four-component description, has been successfully implemented with single and double excitations using the Dirac-Coulomb Hamiltonian. As a first application, the implemented method is employed to calculate the ionization potentials of heavy atomic (Ag, Cs, Au, Fr, and Lr) and molecular (HgH and PbF) systems, where relativistic effects must be included to obtain highly accurate results. Not only relativistic effects but also electron correlation is crucial in these heavy atomic and molecular systems. To demonstrate this, we have taken two further approximations in the four-component relativistic equation-of-motion framework to quantify the role electron correlation plays in the calculated values at different levels of theory. All calculated results are compared with the available experimental data and with other theoretically calculated values to judge the accuracy of our calculations.

  16. A Radio-Map Automatic Construction Algorithm Based on Crowdsourcing

    PubMed Central

    Yu, Ning; Xiao, Chenxian; Wu, Yinfeng; Feng, Renjian

    2016-01-01

    Traditional radio-map-based localization methods need to sample a large number of location fingerprints offline, which requires substantial human and material resources. To address this high sampling cost, an automatic radio-map construction algorithm based on crowdsourcing is proposed. The algorithm takes its location fingerprint data from crowdsourced information provided by large numbers of users as they walk through buildings. From the variation characteristics of the users' smartphone sensors, indoor anchors (doors) are identified, and their locations serve as reference positions for the whole radio map. The AP-Cluster method is used to cluster the crowdsourced fingerprints and obtain representative fingerprints. Based on the reference positions and the similarity between fingerprints, the representative fingerprints are linked to their corresponding physical locations, and the radio map is generated. Experimental results demonstrate that the proposed algorithm reduces the cost of fingerprint sampling and radio-map construction while guaranteeing localization accuracy. The proposed method does not require users' explicit participation, which effectively solves the resource-consumption problem of building a location fingerprint database. PMID:27070623
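
The abstract does not say how a position is estimated once the radio map exists. One common approach, shown here only as a hedged sketch, is k-nearest-neighbour matching in signal space: each radio-map entry pairs a physical location with a representative fingerprint (received signal strength per access point, in dBm), and a query fingerprint is placed at the mean position of its k closest entries. The function names and the -100 dBm fill value for access points missing from a fingerprint are assumptions, not part of the paper.

```python
import math
from typing import Dict, List, Tuple

Fingerprint = Dict[str, float]  # access-point id -> RSSI in dBm
Position = Tuple[float, float]  # (x, y) in metres

MISSING_RSSI = -100.0  # assumed RSSI for an access point not heard in a fingerprint


def fingerprint_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance in signal space over the union of access points."""
    aps = set(a) | set(b)
    return math.sqrt(sum((a.get(ap, MISSING_RSSI) - b.get(ap, MISSING_RSSI)) ** 2 for ap in aps))


def locate(query: Fingerprint, radio_map: List[Tuple[Position, Fingerprint]], k: int = 1) -> Position:
    """Estimate a position as the mean of the k radio-map locations closest in signal space."""
    if not radio_map:
        raise ValueError("radio map is empty")
    nearest = sorted(radio_map, key=lambda entry: fingerprint_distance(query, entry[1]))[:k]
    xs = [pos[0] for pos, _ in nearest]
    ys = [pos[1] for pos, _ in nearest]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

With k = 1 the estimate snaps to a radio-map location; a larger k interpolates between neighbouring representative fingerprints at the cost of some bias near walls and doors.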

  17. Quadratic canonical transformation theory and higher order density matrices.

    PubMed

    Neuscamman, Eric; Yanai, Takeshi; Chan, Garnet Kin-Lic

    2009-03-28

    Canonical transformation (CT) theory provides a rigorously size-extensive description of dynamic correlation in multireference systems, with an accuracy superior to and cost scaling lower than complete active space second order perturbation theory. Here we expand our previous theory by investigating (i) a commutator approximation that is applied at quadratic, as opposed to linear, order in the effective Hamiltonian, and (ii) incorporation of the three-body reduced density matrix in the operator and density matrix decompositions. The quadratic commutator approximation improves CT's accuracy when used with a single-determinant reference, repairing the previous formal disadvantage of the single-reference linear CT theory relative to singles and doubles coupled cluster theory. Calculations on the BH and HF binding curves confirm this improvement. In multireference systems, the three-body reduced density matrix increases the overall accuracy of the CT theory. Tests on the H(2)O and N(2) binding curves yield results highly competitive with expensive state-of-the-art multireference methods, such as the multireference Davidson-corrected configuration interaction (MRCI+Q), averaged coupled pair functional, and averaged quadratic coupled cluster theories.

  18. SAMSA2: a standalone metatranscriptome analysis pipeline.

    PubMed

    Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G

    2018-05-21

    Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

  19. To what do psychiatric diagnoses refer? A two-dimensional semantic analysis of diagnostic terms

    PubMed Central

    Maung, Hane Htut

    2016-01-01

    In somatic medicine, diagnostic terms often refer to the disease processes that are the causes of patients' symptoms. The language used in some clinical textbooks and health information resources suggests that this is also sometimes assumed to be the case with diagnoses in psychiatry. However, this seems to be in tension with the ways in which psychiatric diagnoses are defined in diagnostic manuals, according to which they refer solely to clusters of symptoms. This paper explores how theories of reference in the philosophy of language can help to resolve this tension. After the evaluation of descriptive and causal theories of reference, I put forward a conceptual framework based on two-dimensional semantics that allows the causal analysis of diagnostic terms in psychiatry, while taking seriously their descriptive definitions in diagnostic manuals. While the framework is presented as a solution to a problem regarding the semantics of psychiatric diagnoses, it can also accommodate the analysis of diagnostic terms in other medical disciplines. PMID:26580354

  20. Deviation from equilibrium conditions in molecular dynamic simulations of homogeneous nucleation.

    PubMed

    Halonen, Roope; Zapadinsky, Evgeni; Vehkamäki, Hanna

    2018-04-28

    We present a comparison between Monte Carlo (MC) results for homogeneous vapour-liquid nucleation of Lennard-Jones clusters and previously published values from molecular dynamics (MD) simulations. Both the MC and MD methods sample real cluster configuration distributions. In MD simulations, the extent of temperature fluctuation is usually controlled with an artificial thermostat rather than with a more realistic carrier gas. In this study, we consider not only a velocity-scaling thermostat but also the Nosé-Hoover, Berendsen, and stochastic Langevin thermostat methods. The nucleation rates based on a kinetic scheme and the canonical MC calculation serve as a point of reference, since by definition they describe an equilibrated system. The studied temperature range is from T = 0.3 to 0.65 ε/k. The kinetic scheme closely reproduces the isothermal nucleation rates obtained by Wedekind et al. [J. Chem. Phys. 127, 064501 (2007)] using MD simulations with carrier gas. The nucleation rates obtained from artificially thermostatted MD simulations are consistently lower than the reference nucleation rates based on MC calculations. The discrepancy increases up to several orders of magnitude as the density of the nucleating vapour decreases. At low temperatures, the difference from the MC-based reference nucleation rates in some cases exceeds the maximal nonisothermal effect predicted by the classical theory of Feder et al. [Adv. Phys. 15, 111 (1966)].
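
Two of the thermostats compared here can be written in a few lines. The sketch below, in reduced Lennard-Jones units (m = 1, k_B = 1, so temperature is in ε/k as in the abstract), shows a simple velocity-rescaling step and the Berendsen weak-coupling step with scale factor λ = sqrt(1 + (Δt/τ)(T₀/T - 1)). It is an illustration under stated assumptions, not the authors' simulation code: it counts 3N degrees of freedom (no correction for removed centre-of-mass motion), the function names are hypothetical, and the Nosé-Hoover and Langevin thermostats are not shown.

```python
import math
from typing import List, Tuple

Velocity = Tuple[float, float, float]


def kinetic_temperature(vs: List[Velocity], mass: float = 1.0) -> float:
    """Equipartition estimate T = m * sum(v^2) / (3N) in reduced units (k_B = 1)."""
    return mass * sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in vs) / (3 * len(vs))


def _scaled(vs: List[Velocity], lam: float) -> List[Velocity]:
    """Multiply every velocity component by lam."""
    return [(lam * vx, lam * vy, lam * vz) for vx, vy, vz in vs]


def velocity_rescale(vs: List[Velocity], t_target: float, mass: float = 1.0) -> List[Velocity]:
    """Scale all velocities so the kinetic temperature equals t_target exactly."""
    return _scaled(vs, math.sqrt(t_target / kinetic_temperature(vs, mass)))


def berendsen(vs: List[Velocity], t_target: float, dt: float, tau: float,
              mass: float = 1.0) -> List[Velocity]:
    """One Berendsen step: T relaxes toward t_target with coupling time tau."""
    t_now = kinetic_temperature(vs, mass)
    lam = math.sqrt(1.0 + (dt / tau) * (t_target / t_now - 1.0))
    return _scaled(vs, lam)
```

Because T is proportional to the sum of v², one Berendsen step changes the temperature by exactly (Δt/τ)(T₀ - T); with Δt = τ it reduces to plain velocity rescaling, and a larger τ gives weaker coupling and larger residual fluctuations.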

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bevelhimer, Mark S.; Adams, Marshall; Fortner, Allison M.

    The effect of coal ash exposure on fish health in freshwater communities is largely unknown. Given the large number of possible pathways of effects (e.g., toxicological effects of exposure to multiple metals, physical effects of ash exposure, and food web effects), measuring only a few health metrics is unlikely to give a complete picture. The authors measured a suite of 20 health metrics in more than 1100 fish collected from 5 sites (3 affected and 2 reference) near a coal ash spill in east Tennessee over a 4.5-yr period. The metrics represented a wide range of physiological and energetic responses and were evaluated simultaneously using 2 multivariate techniques. Results from both hierarchical clustering and canonical discriminant analyses suggested that for most species × season combinations, the suite of fish health indicators varied more among years than between spill and reference sites within a year. In a few cases, spill sites from early years of the investigation stood alone or clustered together, separate from reference sites and later-year spill sites. Outlier groups of fish with relatively unique health profiles were most often from spill sites, suggesting that some response to the ash exposure may have occurred. Results from the 2 multivariate methods suggest that any change in the health status of fish at the spill sites was small and appears to have diminished since the first 2 to 3 yr after the spill.

  2. Deviation from equilibrium conditions in molecular dynamic simulations of homogeneous nucleation

    NASA Astrophysics Data System (ADS)

    Halonen, Roope; Zapadinsky, Evgeni; Vehkamäki, Hanna

    2018-04-01

    We present a comparison between Monte Carlo (MC) results for homogeneous vapour-liquid nucleation of Lennard-Jones clusters and previously published values from molecular dynamics (MD) simulations. Both the MC and MD methods sample real cluster configuration distributions. In MD simulations, the extent of temperature fluctuation is usually controlled with an artificial thermostat rather than with a more realistic carrier gas. In this study, we consider not only a velocity-scaling thermostat but also the Nosé-Hoover, Berendsen, and stochastic Langevin thermostat methods. The nucleation rates based on a kinetic scheme and the canonical MC calculation serve as a point of reference, since by definition they describe an equilibrated system. The studied temperature range is from T = 0.3 to 0.65 ε/k. The kinetic scheme closely reproduces the isothermal nucleation rates obtained by Wedekind et al. [J. Chem. Phys. 127, 064501 (2007)] using MD simulations with carrier gas. The nucleation rates obtained from artificially thermostatted MD simulations are consistently lower than the reference nucleation rates based on MC calculations. The discrepancy increases up to several orders of magnitude as the density of the nucleating vapour decreases. At low temperatures, the difference from the MC-based reference nucleation rates in some cases exceeds the maximal nonisothermal effect predicted by the classical theory of Feder et al. [Adv. Phys. 15, 111 (1966)].

  3. Quality of Computationally Inferred Gene Ontology Annotations

    PubMed Central

    Škunca, Nives; Altenhoff, Adrian; Dessimoz, Christophe

    2012-01-01

    Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon—an important outcome given that >98% of all annotations are inferred without direct curation. PMID:22693439

  4. Kalium: a database of potassium channel toxins from scorpion venom.

    PubMed

    Kuzmenkov, Alexey I; Krylov, Nikolay A; Chugunov, Anton O; Grishin, Eugene V; Vassilevski, Alexander A

    2016-01-01

    Kalium (http://kaliumdb.org/) is a manually curated database that accumulates data on potassium channel toxins purified from scorpion venom (KTx). This database is an open-access resource and provides easy access to pages of other databases of interest, such as UniProt, PDB, NCBI Taxonomy Browser, and PubMed. The main achievements of Kalium are a strict and straightforward regulation of KTx classification based on the unified nomenclature supported by researchers in the field, the removal of peptides with partial sequences and of entries supported only by transcriptomic information, the classification of β-family toxins, and the addition of a novel λ-family. Molecules presented in the database can be processed by the Clustal Omega server with a one-click option. Molecular masses of mature peptides are calculated, and available activity data are compiled for all KTx. We believe that Kalium is not only of high interest to professional toxinologists but also of general utility to the scientific community. Database URL: http://kaliumdb.org/. © The Author(s) 2016. Published by Oxford University Press.
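
The abstract says Kalium calculates molecular masses of mature peptides but not how. A common approach, sketched here under stated assumptions, sums average amino-acid residue masses, adds one water (about 18.015 Da) for the free termini, and subtracts about 2.016 Da (two hydrogens) per disulfide bond, which matters for cysteine-rich KTx. The residue masses are standard average values; the function name and the choice to ignore C-terminal amidation and other post-translational modifications are assumptions, so values may differ from those listed in the database.

```python
from typing import Dict

# Standard average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS: Dict[str, float] = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528         # added once for the free N- and C-termini
HYDROGEN_PAIR = 2.01588  # lost per disulfide bond (two H atoms)


def average_mass(sequence: str, disulfides: int = 0) -> float:
    """Average molecular mass (Da) of an unmodified linear peptide with the given disulfides."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    if 2 * disulfides > seq.count("C"):
        raise ValueError("more disulfide bonds than cysteine pairs")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER - disulfides * HYDROGEN_PAIR
```

For a typical KTx with six cysteines one would call `average_mass(seq, disulfides=3)`; monoisotopic masses, which mass spectrometry often reports, would need a different residue table.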

  5. Gene Ontology annotations at SGD: new data sources and annotation methods

    PubMed Central

    Hong, Eurie L.; Balakrishnan, Rama; Dong, Qing; Christie, Karen R.; Park, Julie; Binkley, Gail; Costanzo, Maria C.; Dwight, Selina S.; Engel, Stacia R.; Fisk, Dianna G.; Hirschman, Jodi E.; Hitz, Benjamin C.; Krieger, Cynthia J.; Livstone, Michael S.; Miyasato, Stuart R.; Nash, Robert S.; Oughtred, Rose; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Zhu, Kathy K.; Dolinski, Kara; Botstein, David; Cherry, J. Michael

    2008-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current. PMID:17982175

  6. Measurement of the spectral signature of small carbon clusters at near and far infrared wavelengths

    NASA Technical Reports Server (NTRS)

    Tarter, J.; Saykally, R.

    1991-01-01

    A significant percentage of the carbon inventory of the circumstellar and interstellar media may be in the form of large refractory molecules (or small grains) referred to as carbon clusters. At the small end, odd numbers of carbon atoms seem to be preferred, whereas above 12 atoms, clusters containing an even number of carbon atoms appear to be preferred in laboratory chemistry. In the lab, the cluster C-60 appears to be a particularly stable form and has been nicknamed Bucky Balls because of its resemblance to a soccer ball and to the geodesic domes designed by Buckminster Fuller. To investigate the prevalence of these clusters, and their relationship to the polycyclic aromatic hydrocarbons (PAHs) that have become the newest focus of IR astronomy, it is necessary to determine their spectroscopic characteristics at near and far infrared wavelengths. Described here is the construction of a near- to far-IR laser magnetic resonance spectrometer, built at the University of California, Berkeley, to detect and characterize these spectra. The equipment produces carbon clusters by laser evaporation of a graphitic target. The clusters are then cooled in a supersonic expansion beam to simulate conditions in the interstellar medium (ISM). The expansion beam feeds into the spectrometer chamber and yields cluster concentrations high enough to permit ultra-high-resolution spectroscopy at near and far IR wavelengths. The first successful demonstration of this apparatus occurred last year, when the laboratory studies enabled the observational detection of C-5 in the stellar outflow surrounding IRC+10216 in the near-IR. Current efforts focus on reducing the temperature of the supersonic expansion beam that transports the C clusters evaporated from a graphite target into the spectrometer, down to temperatures as low as 1 K.

  7. Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features

    PubMed Central

    2011-01-01

    Background Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Methods Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis, and by comparison with previously published gene lists and independent datasets. Results Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of the clustering algorithm and gene filtering used. Comparison of the expression profiles of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer, and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. Conclusion This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts and to identify distinct subtypes of normal breast tissue.
Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer. PMID:22044755

  8. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    PubMed Central

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can affect the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performance of the different analyses varies between the data sets, and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on performance. In particular, gene selection is important, and it is generally necessary to include a relatively large number of genes to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index.
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
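
The adjusted Rand index used as the evaluation criterion above compares a clustering with known classes and corrects the plain Rand index for chance agreement, so identical partitions score 1 and random labelings score about 0. A minimal pure-Python sketch follows the usual contingency-table formula ARI = (Σ C(n_ij, 2) - E) / (½ [Σ C(a_i, 2) + Σ C(b_j, 2)] - E), with E = Σ C(a_i, 2) · Σ C(b_j, 2) / C(n, 2), where n_ij are contingency-table cells and a_i, b_j its row and column sums. The function name is hypothetical, and returning 1.0 for degenerate inputs (fewer than two samples, or both partitions trivial) is an assumed convention.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between a reference partition and a clustering."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs_total = comb(len(labels_true), 2)
    # Pairs placed together in both partitions (contingency-table cells n_ij).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs together in the reference (row sums a_i) and in the clustering (column sums b_j).
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    if pairs_total == 0:
        return 1.0  # assumption: fewer than two samples count as perfect agreement
    expected = together_true * together_pred / pairs_total
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate partitions (e.g. both all-singletons) agree perfectly
    return (together_both - expected) / (max_index - expected)
```

For example, known classes [0, 0, 1, 1] against clusters [0, 0, 1, 2] give 4/7 ≈ 0.571, and cluster labels that are a relabeling of the classes give exactly 1.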

  9. Psi4NumPy: An Interactive Quantum Chemistry Programming Environment for Reference Implementations and Rapid Development.

    PubMed

    Smith, Daniel G A; Burns, Lori A; Sirianni, Dominic A; Nascimento, Daniel R; Kumar, Ashutosh; James, Andrew M; Schriber, Jeffrey B; Zhang, Tianyuan; Zhang, Boyi; Abbott, Adam S; Berquist, Eric J; Lechner, Marvin H; Cunha, Leonardo A; Heide, Alexander G; Waldrop, Jonathan M; Takeshita, Tyler Y; Alenaizan, Asem; Neuhauser, Daniel; King, Rollin A; Simmonett, Andrew C; Turney, Justin M; Schaefer, Henry F; Evangelista, Francesco A; DePrince, A Eugene; Crawford, T Daniel; Patkowski, Konrad; Sherrill, C David

    2018-06-11

    Psi4NumPy demonstrates the use of efficient computational kernels from the open-source Psi4 program through the popular NumPy library for linear algebra in Python to facilitate the rapid development of clear, understandable Python computer code for new quantum chemical methods, while maintaining a relatively low execution time. Using these tools, reference implementations have been created for a number of methods, including self-consistent field (SCF), SCF response, many-body perturbation theory, coupled-cluster theory, configuration interaction, and symmetry-adapted perturbation theory. Furthermore, several reference codes have been integrated into Jupyter notebooks, allowing background, underlying theory, and formula information to be associated with the implementation. Psi4NumPy tools and associated reference implementations can lower the barrier for future development of quantum chemistry methods. These implementations also demonstrate the power of the hybrid C++/Python programming approach employed by the Psi4 program.

  10. A Spectroscopic Analysis of the Galactic Globular Cluster NGC 6273 (M19)

    NASA Astrophysics Data System (ADS)

    Johnson, Christian I.; Rich, R. Michael; Pilachowski, Catherine A.; Caldwell, Nelson; Mateo, Mario; Bailey, John I., III; Crane, Jeffrey D.

    2015-08-01

    A combined effort utilizing spectroscopy and photometry has revealed the existence of a new globular cluster class. These “anomalous” clusters, which we refer to as “iron-complex” clusters, are differentiated from normal clusters by exhibiting large (≳0.10 dex) intrinsic metallicity dispersions, complex sub-giant branches, and correlated [Fe/H] and s-process enhancements. To further investigate this phenomenon, we have measured radial velocities and chemical abundances for red giant branch stars in the massive, but scarcely studied, globular cluster NGC 6273. The velocities and abundances were determined using high-resolution (R ≈ 27,000) spectra obtained with the Michigan/Magellan Fiber System (M2FS) and MSpec spectrograph on the Magellan-Clay 6.5 m telescope at Las Campanas Observatory. We find that NGC 6273 has an average heliocentric radial velocity of +144.49 km s⁻¹ (σ = 9.64 km s⁻¹) and an extended metallicity distribution ([Fe/H] = -1.80 to -1.30) composed of at least two distinct stellar populations. Although the two dominant populations have similar [Na/Fe], [Al/Fe], and [α/Fe] abundance patterns, the more metal-rich stars exhibit significant [La/Fe] enhancements. The [La/Eu] data indicate that the increase in [La/Fe] is due to almost pure s-process enrichment. A third, more metal-rich population with low [X/Fe] ratios may also be present. Therefore, NGC 6273 joins clusters such as ω Centauri, M2, M22, and NGC 5286 as a new class of iron-complex clusters exhibiting complicated star formation histories. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.

  11. Structural basis for a [4Fe-3S] cluster in the oxygen-tolerant membrane-bound [NiFe]-hydrogenase.

    PubMed

    Shomura, Yasuhito; Yoon, Ki-Seok; Nishihara, Hirofumi; Higuchi, Yoshiki

    2011-10-16

    Membrane-bound respiratory [NiFe]-hydrogenase (MBH), an H₂-uptake enzyme found in the periplasmic space of bacteria, catalyses the oxidation of dihydrogen: H₂ → 2H⁺ + 2e⁻ (ref. 1). In contrast to the well-studied O₂-sensitive [NiFe]-hydrogenases (referred to as the standard enzymes), MBH has an O₂-tolerant H₂ oxidation activity; however, the mechanism of O₂ tolerance is unclear. Here we report the crystal structures of Hydrogenovibrio marinus MBH in three different redox conditions at resolutions between 1.18 and 1.32 Å. We find that the proximal iron-sulphur (Fe-S) cluster of MBH has a [4Fe-3S] structure coordinated by six cysteine residues (in contrast to the [4Fe-4S] cubane structure coordinated by four cysteine residues found in the proximal Fe-S cluster of the standard enzymes), and that an amide nitrogen of the polypeptide backbone is deprotonated and additionally coordinates the cluster when chemically oxidized, thus stabilizing the superoxidized state of the cluster. The structure of MBH is very similar to that of the O₂-sensitive standard enzymes except for the proximal Fe-S cluster. Our results give a reasonable explanation of why the O₂ tolerance of MBH is attributable to the unique proximal Fe-S cluster; we propose that the cluster is not only a component of the electron-transfer pathway for the catalytic cycle, but that it also donates two electrons and one proton crucial for the appropriate reduction of O₂, thereby preventing the formation of an unready, inactive state of the enzyme.

  12. Structural Study of Liquid Lithium Niobate by Neutron Diffraction Role of the Li Atom in the Clustering Near Solidification

    NASA Astrophysics Data System (ADS)

    Andonov, P.; Fischer, H. E.; Palleau, P.; Kimura, S.

    2001-05-01

    The structure of liquid LiNbO3 has been investigated by neutron diffraction using samples with different isotopic compositions of lithium. The intensity scattered by these samples has been measured for momentum transfers from 0.4 Å-1 […] at temperatures T > 1500 K, which include the undercooling domain. From an analysis of the correlation functions Gij(r) of the atomic pairs Li-Li, Li-Nb and Li-O and of their structural evolution, given by ΔGij(r) = Gij(r)1500 − Gij(r)1550 and interpreted with reference to the crystalline ferroelectric LiNbO3 structure, it was possible to confirm a local ordering similar to that of the crystal. The presence of clusters (groupings of NbO3 octahedra) is confirmed. Both regular and irregular NbO6 octahedra are observed in the liquid near solidification. Owing to its high mobility in the melt, the Li atom plays an important role in the clustering: the Li-O and Li-Nb bonds make possible the stacking of groups of four octahedra into clusters of eight octahedra or more, and the Li-Li bonds join these groups. The diameter of the clusters is at least 22 Å in the undercooling regime.

  13. Numerical taxonomy of Vibrio cholerae and related species isolated from areas that are endemic and nonendemic for cholera.

    PubMed Central

    McNicol, L A; De, S P; Kaper, J B; West, P A; Colwell, R R

    1983-01-01

    A total of 165 strains of vibrios isolated from clinical and environmental sources in the United States, India, and Bangladesh, 11 reference cultures, and 4 duplicated cultures were compared in a numerical taxonomic study using 83 unit characters. Similarity between strains was computed by using the simple matching coefficient and the Jaccard coefficient. Strains were clustered by unweighted average linkage and single linkage algorithms. All methods gave similar cluster compositions. The estimated probability of error in the study was obtained from a comparison of the results of duplicated strains and was within acceptable limits. A total of 174 of the 180 organisms studied were divided into eight major clusters. Two clusters were identified as Vibrio cholerae, one as Vibrio mimicus, one as Vibrio parahaemolyticus, three as Vibrio species, and one as Aeromonas hydrophila. The V. mimicus cluster could be further divided into two subclusters, and the major V. cholerae group could be split into seven minor subclusters. Phenotypic traits routinely used to identify clinical isolates of V. cholerae can be used to identify environmental V. cholerae isolates. No distinction was found between strains of V. cholerae isolated from regions endemic for cholera and strains from nonendemic regions. PMID:6874901
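    The two similarity coefficients and the average-linkage clustering named in this abstract can be illustrated with a short sketch. It assumes each strain is scored as a binary vector of unit characters (1 = positive, 0 = negative); the strain names and profiles are hypothetical and not data from the study, and the function names are illustrative. `average_linkage` is a naive version of unweighted average linkage, not the software the authors used, and the single-linkage variant they also applied is not shown.

```python
from itertools import combinations

def match_counts(x, y):
    """Return (a, b, c, d): characters positive in both, only in x, only in y, in neither."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d

def simple_matching(x, y):
    """Simple matching coefficient: shared positives and shared negatives both count."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)

def jaccard(x, y):
    """Jaccard coefficient: shared negatives (d) are ignored."""
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 1.0

def average_linkage(profiles, similarity, n_clusters):
    """Naive unweighted average linkage: repeatedly merge the two most similar clusters."""
    clusters = [[name] for name in profiles]

    def link(c1, c2):
        # Mean similarity over all strain pairs drawn from the two clusters.
        scores = [similarity(profiles[p], profiles[q]) for p in c1 for q in c2]
        return sum(scores) / len(scores)

    while len(clusters) > n_clusters:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical 6-character profiles for four strains (not data from the study).
profiles = {
    "s1": [1, 1, 0, 0, 1, 0],
    "s2": [1, 1, 0, 0, 1, 1],
    "s3": [0, 0, 1, 1, 0, 0],
    "s4": [0, 0, 1, 1, 0, 1],
}
print(simple_matching(profiles["s1"], profiles["s2"]))  # 0.8333333333333334
print(jaccard(profiles["s1"], profiles["s2"]))          # 0.75
print(average_linkage(profiles, jaccard, 2))            # [['s1', 's2'], ['s3', 's4']]
```

    Because the simple matching coefficient counts characters negative in both strains as agreement while the Jaccard coefficient ignores them, the two can rank strain pairs differently when many tests are negative for both; the abstract reports that both gave similar cluster compositions for this data set.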

  14. Control of Chemical Effects in the Separation Process of a Differential Mobility / Mass Spectrometer System

    PubMed Central

    Schneider, Bradley B.; Coy, Stephen L.; Krylov, Evgeny V.; Nazarov, Erkinjon G.

    2013-01-01

    Differential mobility spectrometry (DMS) separates ions on the basis of the difference in their migration rates under high versus low electric fields. Several models describing the physical nature of this field dependence of mobility have been proposed, but the clusterization model, sometimes referred to as the dynamic cluster-decluster model, is emerging as the dominant effect. DMS resolution and peak capacity are strongly influenced by the addition of modifiers, which result in the formation and dissociation of clusters. This process increases selectivity due to the unique chemical interactions that occur between an ion and neutral gas-phase molecules. It is thus imperative to bring the parameters influencing these chemical interactions under control and to find ways to exploit them in order to improve the analytical utility of the device. In this paper we describe three important areas that need consideration in order to stabilize and capitalize on the chemical processes that dominate a DMS separation. The first involves means of controlling the dynamic equilibrium of the clustering reactions with high concentrations of specific reagents. The second involves a means of dealing with the unwanted heterogeneous cluster ion populations emitted from the electrospray ionization process that degrade resolution and sensitivity. The third involves fine control of the parameters that affect the fundamental collision processes, namely temperature and pressure. PMID:20065515

  15. Chapter 7. Cloning and analysis of natural product pathways.

    PubMed

    Gust, Bertolt

    2009-01-01

    The identification of gene clusters of natural products has led to an enormous wealth of information about their biosynthesis and its regulation, and about self-resistance mechanisms. Well-established routine techniques are now available for the cloning and sequencing of gene clusters. The subsequent functional analysis of the complex biosynthetic machinery requires efficient genetic tools for manipulation. Until recently, techniques for the introduction of defined changes into Streptomyces chromosomes were very time-consuming. In particular, manipulation of large DNA fragments has been challenging due to the absence of suitable restriction sites for restriction- and ligation-based techniques. The homologous recombination approach called recombineering (referred to as Red/ET-mediated recombination in this chapter) has greatly facilitated targeted genetic modifications of complex biosynthetic pathways from actinomycetes by eliminating many of the time-consuming and labor-intensive steps. This chapter describes techniques for the cloning and identification of biosynthetic gene clusters, for the generation of gene replacements within such clusters, for the construction of integrative library clones and their expression in heterologous hosts, and for the assembly of entire biosynthetic gene clusters from the inserts of individual library clones. A systematic approach toward insertional mutation of a complete Streptomyces genome is shown by the use of an in vitro transposon mutagenesis procedure.

  16. Insight from first principles into the stability and magnetism of alkali-metal superoxide nanoclusters

    NASA Astrophysics Data System (ADS)

    Arcelus, Oier; Suaud, Nicolas; Katcho, Nebil A.; Carrasco, Javier

    2017-05-01

    Alkali-metal superoxides are gaining increasing interest as 2p magnetic materials for information and energy storage. Despite significant research efforts on bulk materials, gaps in our knowledge of the electronic and magnetic properties at the nanoscale still remain. Here, we focused on the role that structural details play in determining stability, electronic structure, and magnetic couplings of (MO2)n (M = Li, Na, and K, with n = 2-8) clusters. Using first-principles density functional theory based on the Perdew-Burke-Ernzerhof and Heyd-Scuseria-Ernzerhof functionals, we examined the effect of atomic structure on the relative stability of different polymorphs within each investigated cluster size. We found that small clusters prefer to form planar-ring structures, whereas non-planar geometries become more stable when increasing the cluster size. However, the crossover point depends on the nature of the alkali metal. Our analysis revealed that electrostatic interactions govern the highly ionic M-O2 bonding and ultimately control the relative stability between 2-D and 3-D geometries. In addition, we analyzed the weak magnetic couplings between superoxide molecules in (NaO2)4 clusters comparing model Hamiltonian methods based on Wannier function projections onto πg states with wave function-based multi-reference calculations.

  17. Wing morphometrics as a possible tool for the diagnosis of the Ceratitis fasciventris, C. anonae, C. rosa complex (Diptera, Tephritidae).

    PubMed

    Van Cann, Joannes; Virgilio, Massimiliano; Jordaens, Kurt; De Meyer, Marc

    2015-01-01

    Previous attempts to resolve the Ceratitis FAR complex (Ceratitis fasciventris, Ceratitis anonae, Ceratitis rosa, Diptera, Tephritidae) showed contrasting results and revealed the occurrence of five microsatellite genotypic clusters (A, F1, F2, R1, R2). In this paper we explore the potential of wing morphometrics for the diagnosis of FAR morphospecies and genotypic clusters. We considered a set of 227 specimens previously morphologically identified and genotyped at 16 microsatellite loci. Seventeen wing landmarks and 6 wing band areas were used for morphometric analyses. Permutational multivariate analysis of variance detected significant differences both across morphospecies and genotypic clusters (for both males and females). Unconstrained and constrained ordinations did not properly resolve groups corresponding to morphospecies or genotypic clusters. However, posterior group membership probabilities (PGMPs) of the Discriminant Analysis of Principal Components (DAPC) allowed the consistent identification of a relevant proportion of specimens (but with performances differing across morphospecies and genotypic clusters). This study suggests that wing morphometrics and PGMPs might represent a possible tool for the diagnosis of species within the FAR complex. Here, we propose a tentative diagnostic method and provide a first reference library of morphometric measures that might be used for the identification of additional and unidentified FAR specimens.

  18. High-Resolution Infrared Spectroscopy of Imidazole Clusters in Helium Droplets Using Quantum Cascade Lasers

    NASA Astrophysics Data System (ADS)

    Mani, Devendra; Can, Cihad; Pal, Nitish; Schwaab, Gerhard; Havenith, Martina

    2017-06-01

    The imidazole ring is part of many biologically important molecules and drugs. The imidazole monomer, dimer and its complexes with water have previously been studied using infrared spectroscopy in helium droplets [1,2] and molecular beams [3]. These studies focussed on the N-H and O-H stretch regions, covering the spectral region of 3200-3800 cm-1. We have extended the studies on imidazole clusters into the ring vibration region. The imidazole clusters were isolated in helium droplets and probed using a combination of infrared spectroscopy and mass spectrometry. The spectra in the regions of 1000-1100 cm-1 and 1300-1460 cm-1 were recorded using quantum cascade lasers. Some of the observed bands could be assigned to the imidazole monomer and higher-order imidazole clusters, using pickup curve analysis and ab initio calculations. Work is still in progress. The results will be discussed in detail in the talk. References: 1) M.Y. Choi and R.E. Miller, J. Phys. Chem. A, 110, 9344 (2006). 2) M.Y. Choi and R.E. Miller, Chem. Phys. Lett., 477, 276 (2009). 3) J. Zischang, J. J. Lee and M. Suhm, J. Chem. Phys., 135, 061102 (2011). Note: This work was supported by the Cluster of Excellence RESOLV (Ruhr-Universität EXC1069) funded by the Deutsche Forschungsgemeinschaft.
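    Pickup curve analysis in helium droplets is commonly based on Poisson statistics: if λ is the mean number of molecules captured per droplet (taken to scale with the dopant pressure in the pickup region), the probability of capturing exactly k molecules is P_k(λ) = λ^k e^(−λ) / k!. The abstract does not state the authors' model, so the sketch below assumes this standard Poisson form; the function names and the λ grid are hypothetical. It shows that each P_k peaks at λ = k, which is why the pressure dependence of a band helps assign it to the monomer, the dimer, or a larger cluster.

```python
import math

def pickup_probability(k, mean_pickups):
    """Poisson probability that a droplet captures exactly k dopant molecules."""
    return mean_pickups ** k * math.exp(-mean_pickups) / math.factorial(k)

def peak_position(k, grid):
    """Grid value of the mean pickup number at which P_k is largest."""
    return max(grid, key=lambda lam: pickup_probability(k, lam))

# Hypothetical grid of mean pickup numbers (0.01 to 6.00), standing in for a pressure scan.
grid = [i / 100 for i in range(1, 601)]
for k in (1, 2, 3):
    print(k, peak_position(k, grid))
```

    At low λ, P_k is approximately λ^k / k!, so under this model a band belonging to a k-mer grows roughly with the k-th power of the dopant pressure before reaching its maximum.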

  19. Real-time dynamics of RNA Polymerase II clustering in live human cells

    NASA Astrophysics Data System (ADS)

    Cisse, Ibrahim

    2014-03-01

    Transcription is the first step in the central dogma of molecular biology, when genetic information encoded in DNA is made into messenger RNA. How this fundamental process occurs within living cells (in vivo) is poorly understood,[1] despite extensive biochemical characterizations with isolated biomolecules (in vitro). In higher organisms such as humans, transcription is reported to be spatially compartmentalized in nuclear foci consisting of clusters of RNA Polymerase II, the enzyme responsible for synthesizing all messenger RNAs. However, little is known about when these foci assemble or about their relative stability. We developed an approach based on photoactivated localization microscopy (PALM) combined with a temporal correlation analysis, which we refer to as tcPALM. The tcPALM method enables the real-time characterization of biomolecular spatiotemporal organization, with single-molecule sensitivity, directly in living cells.[2] Using tcPALM, we observed that RNA Polymerase II clusters form transiently, with an average lifetime of 5.1 (+/- 0.4) seconds. Stimuli affecting transcription regulation yielded order-of-magnitude changes in the dynamics of the polymerase clusters, implying that clustering is regulated and plays a role in the cell's ability to respond rapidly to external signals. Our results suggest that the transient crowding of enzymes may aid in rate-limiting steps of genome regulation.

  20. Methane Production in Dairy Cows Correlates with Rumen Methanogenic and Bacterial Community Structure.

    PubMed

    Danielsson, Rebecca; Dicksved, Johan; Sun, Li; Gonda, Horacio; Müller, Bettina; Schnürer, Anna; Bertilsson, Jan

    2017-01-01

    Methane (CH4) is produced as an end product of feed fermentation in the rumen. The yield of CH4 varies between individuals despite identical feeding conditions. To better understand the factors behind this individual variation, 73 dairy cows given the same feed but differing in CH4 emissions were investigated, with a focus on fiber digestion, fermentation end products, and bacterial and archaeal composition. In total, 21 cows (12 Holstein, 9 Swedish Red) identified as persistent low, medium or high CH4 emitters over a 3-month period were further chosen for analysis of microbial community structure in rumen fluid. This was assessed by sequencing the V4 region of the 16S rRNA gene and by quantitative PCR (qPCR) of targeted Methanobrevibacter groups. The results showed a positive correlation between low CH4 emitters and higher abundance of the Methanobrevibacter ruminantium clade. Principal coordinate analysis (PCoA) at the operational taxonomic unit (OTU) level of bacteria showed two distinct clusters (P < 0.01) that were related to CH4 production. One cluster was associated with low CH4 production (referred to as cluster L), whereas the other was associated with high CH4 production (cluster H), and the medium emitters occurred in both clusters. The differences between clusters were primarily linked to differential abundances of certain OTUs belonging to Prevotella. Moreover, several OTUs belonging to the family Succinivibrionaceae were dominant in samples belonging to cluster L. The fermentation pattern of volatile fatty acids showed that the proportion of propionate was higher in cluster L, while the proportion of butyrate was higher in cluster H. No difference was found in milk production or organic matter digestibility between cows. 
Cows in cluster L had lower CH4 per kg of energy corrected milk (ECM) than cows in cluster H (8.3 compared with 9.7 g CH4/kg ECM), showing that low-CH4 cows utilized the feed more efficiently for milk production, which might indicate a more efficient microbial population or host genetic differences that are reflected in bacterial and archaeal (methanogen) populations.
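    Principal coordinate analysis (PCoA), applied here to the OTU data, places samples in a low-dimensional space from a matrix of pairwise dissimilarities by double-centring the squared distances and keeping the leading eigenvectors (classical, or Gower, scaling). The abstract does not say which dissimilarity measure was used, so the sketch below accepts any symmetric distance matrix; the function name `pcoa` is hypothetical, it requires NumPy, and the demonstration uses Euclidean distances between made-up points only because the result can then be checked exactly.

```python
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical (Gower) principal coordinate analysis of a symmetric distance matrix.

    Returns sample coordinates on the first `n_axes` axes and the fraction of the
    positive eigenvalue sum carried by each of those axes.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J = I - 11'/n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns the eigenvalues of the symmetric matrix B in ascending order.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues (from non-Euclidean dissimilarities) cannot be embedded; clip them.
    kept = np.clip(vals[:n_axes], 0.0, None)
    coords = vecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / vals[vals > 0].sum()
    return coords, explained

# Made-up 2-D points: PCoA on their Euclidean distances should recover those distances.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords, explained = pcoa(dist, n_axes=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(recovered, dist), explained.round(3))
```

    With a non-Euclidean measure such as Bray-Curtis, B can have negative eigenvalues; clipping them as above is one simple convention, and the Lingoes or Cailliez corrections are common alternatives.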

  1. Methane Production in Dairy Cows Correlates with Rumen Methanogenic and Bacterial Community Structure

    PubMed Central

    Danielsson, Rebecca; Dicksved, Johan; Sun, Li; Gonda, Horacio; Müller, Bettina; Schnürer, Anna; Bertilsson, Jan

    2017-01-01

    Methane (CH4) is produced as an end product of feed fermentation in the rumen. The yield of CH4 varies between individuals despite identical feeding conditions. To better understand the factors behind this individual variation, 73 dairy cows given the same feed but differing in CH4 emissions were investigated, with a focus on fiber digestion, fermentation end products, and bacterial and archaeal composition. In total, 21 cows (12 Holstein, 9 Swedish Red) identified as persistent low, medium or high CH4 emitters over a 3-month period were further chosen for analysis of microbial community structure in rumen fluid. This was assessed by sequencing the V4 region of the 16S rRNA gene and by quantitative PCR (qPCR) of targeted Methanobrevibacter groups. The results showed a positive correlation between low CH4 emitters and higher abundance of the Methanobrevibacter ruminantium clade. Principal coordinate analysis (PCoA) at the operational taxonomic unit (OTU) level of bacteria showed two distinct clusters (P < 0.01) that were related to CH4 production. One cluster was associated with low CH4 production (referred to as cluster L), whereas the other was associated with high CH4 production (cluster H), and the medium emitters occurred in both clusters. The differences between clusters were primarily linked to differential abundances of certain OTUs belonging to Prevotella. Moreover, several OTUs belonging to the family Succinivibrionaceae were dominant in samples belonging to cluster L. The fermentation pattern of volatile fatty acids showed that the proportion of propionate was higher in cluster L, while the proportion of butyrate was higher in cluster H. No difference was found in milk production or organic matter digestibility between cows. 
Cows in cluster L had lower CH4 per kg of energy corrected milk (ECM) than cows in cluster H (8.3 compared with 9.7 g CH4/kg ECM), showing that low-CH4 cows utilized the feed more efficiently for milk production, which might indicate a more efficient microbial population or host genetic differences that are reflected in bacterial and archaeal (methanogen) populations. PMID:28261182

  2. Next-generation genotype imputation service and methods.

    PubMed

    Das, Sayantan; Forer, Lukas; Schönherr, Sebastian; Sidore, Carlo; Locke, Adam E; Kwong, Alan; Vrieze, Scott I; Chew, Emily Y; Levy, Shawn; McGue, Matt; Schlessinger, David; Stambolian, Dwight; Loh, Po-Ru; Iacono, William G; Swaroop, Anand; Scott, Laura J; Cucca, Francesco; Kronenberg, Florian; Boehnke, Michael; Abecasis, Gonçalo R; Fuchsberger, Christian

    2016-10-01

    Genotype imputation is a key component of genetic association studies, where it increases power, facilitates meta-analysis, and aids interpretation of signals. Genotype imputation is computationally demanding and, with current tools, typically requires access to a high-performance computing cluster and to a reference panel of sequenced genomes. Here we describe improvements to imputation machinery that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools. We also describe a new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity.

  3. High-dimensional neural network potentials for solvation: The case of protonated water clusters in helium

    NASA Astrophysics Data System (ADS)

    Schran, Christoph; Uhl, Felix; Behler, Jörg; Marx, Dominik

    2018-03-01

    The design of accurate helium-solute interaction potentials for the simulation of chemically complex molecules solvated in superfluid helium has long been a cumbersome task due to the rather weak but strongly anisotropic nature of the interactions. We show that this challenge can be met by using a combination of an effective pair potential for the He-He interactions and a flexible high-dimensional neural network potential (NNP) for describing the complex interaction between helium and the solute in a pairwise additive manner. This approach yields an excellent agreement with a mean absolute deviation as small as 0.04 kJ mol-1 for the interaction energy between helium and both hydronium and Zundel cations compared with coupled cluster reference calculations with an energetically converged basis set. The construction and improvement of the potential can be performed in a highly automated way, which opens the door for applications to a variety of reactive molecules to study the effect of solvation on the solute as well as the solute-induced structuring of the solvent. Furthermore, we show that this NNP approach yields very convincing agreement with the coupled cluster reference for properties like many-body spatial and radial distribution functions. This holds for the microsolvation of the protonated water monomer and dimer by a few helium atoms up to their solvation in bulk helium as obtained from path integral simulations at about 1 K.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eriksen, Janus J., E-mail: janusje@chem.au.dk; Jørgensen, Poul; Matthews, Devin A.

    The accuracy at which total energies of open-shell atoms and organic radicals may be calculated is assessed for selected coupled cluster perturbative triples expansions, all of which augment the coupled cluster singles and doubles (CCSD) energy by a non-iterative correction for the effect of triple excitations. Namely, the second- through sixth-order models of the recently proposed CCSD(T–n) triples series [J. J. Eriksen et al., J. Chem. Phys. 140, 064108 (2014)] are compared to the acclaimed CCSD(T) model for both unrestricted and restricted open-shell Hartree-Fock (UHF/ROHF) reference determinants. By comparing UHF- and ROHF-based statistical results for a test set of 18 modest-sized open-shell species with comparable RHF-based results, no behavioral differences are observed for the higher-order models of the CCSD(T–n) series in their correlated descriptions of closed- and open-shell species. In particular, we find that the convergence rate throughout the series towards the coupled cluster singles, doubles, and triples (CCSDT) solution is identical for the two cases. For the CCSD(T) model, on the other hand, not only its numerical consistency but also its established, yet fortuitous, cancellation of errors breaks down in the transition from closed- to open-shell systems. The higher-order CCSD(T–n) models (orders n > 3) thus offer a consistent and significant improvement in accuracy relative to CCSDT over the CCSD(T) model, equally for RHF, UHF, and ROHF reference determinants, albeit at an increased computational cost.

  5. High-dimensional neural network potentials for solvation: The case of protonated water clusters in helium.

    PubMed

    Schran, Christoph; Uhl, Felix; Behler, Jörg; Marx, Dominik

    2018-03-14

    The design of accurate helium-solute interaction potentials for the simulation of chemically complex molecules solvated in superfluid helium has long been a cumbersome task due to the rather weak but strongly anisotropic nature of the interactions. We show that this challenge can be met by using a combination of an effective pair potential for the He-He interactions and a flexible high-dimensional neural network potential (NNP) for describing the complex interaction between helium and the solute in a pairwise additive manner. This approach yields an excellent agreement with a mean absolute deviation as small as 0.04 kJ mol-1 for the interaction energy between helium and both hydronium and Zundel cations compared with coupled cluster reference calculations with an energetically converged basis set. The construction and improvement of the potential can be performed in a highly automated way, which opens the door for applications to a variety of reactive molecules to study the effect of solvation on the solute as well as the solute-induced structuring of the solvent. Furthermore, we show that this NNP approach yields very convincing agreement with the coupled cluster reference for properties like many-body spatial and radial distribution functions. This holds for the microsolvation of the protonated water monomer and dimer by a few helium atoms up to their solvation in bulk helium as obtained from path integral simulations at about 1 K.

  6. Predictors of primary care referrals to a vascular disease prevention lifestyle program among participants in a cluster randomised trial.

    PubMed

    Passey, Megan E; Laws, Rachel A; Jayasinghe, Upali W; Fanaian, Mahnaz; McKenzie, Suzanne; Powell-Davies, Gawaine; Lyle, David; Harris, Mark F

    2012-08-03

    Cardiovascular disease accounts for a large burden of disease, but is amenable to prevention through lifestyle modification. This paper examines patient and practice predictors of referral to a lifestyle modification program (LMP) offered as part of a cluster randomised controlled trial (RCT) of prevention of vascular disease in primary care. Data from the intervention arm of a cluster RCT that recruited 36 practices through two rural and three urban primary care organisations were used. In each practice, 160 eligible high-risk patients were invited to participate. Practices were randomly allocated to intervention or control groups. Intervention practice staff were trained in screening, motivational interviewing and counselling, and were encouraged to refer high-risk patients to an LMP involving individual and group sessions. Data included patient surveys, clinical audits, a practice survey on capacity for preventive care, and referral records from the LMP. Predictors of referral were examined using multi-level logistic regression modelling after adjustment for confounding factors. Of 301 eligible patients, 190 (63.1%) were referred to the LMP. Independent predictors of referral were baseline BMI ≥ 25 (OR 2.87, 95% CI 1.10–7.47), physical inactivity (OR 2.90, 95% CI 1.36–6.14), contemplation/preparation/action stage of change for physical activity (OR 2.75, 95% CI 1.07–7.03), rural location (OR 12.50, 95% CI 1.43–109.7) and smaller practice size (1-3 GPs) (OR 16.05, 95% CI 2.74–94.24). Providing a well-structured evidence-based lifestyle intervention, free of charge to patients, with coordination and support for referral processes, resulted in over 60% of participating high-risk patients being referred for disease prevention. Contrary to expectations, referrals were more frequent from rural and smaller practices, suggesting that these practices may be more ready to engage with these programs. ACTRN12607000423415.

  7. [Use of multiple locus variable number tandem repeats analysis for the Brucella systematization].

    PubMed

    Kulakov, Iu K; Kovalev, D A; Misetova, E N; Golovneva, S I; Liapustina, L V; Zheludkov, M M

    2012-01-01

    Methods of molecular-genetic differentiation to strain level are becoming increasingly significant in the current system of brucellosis control. MLVA (multiple locus variable number tandem repeats analysis) was selected for molecular-genetic differentiation to strain level and for simultaneous establishment of the genetic relationships among the Brucella strains investigated. The goal of this work was MLVA typing of strains of three pathogenic Brucella species, with analysis of the stability of the chosen loci, their discriminatory power, and their concordance with conventional phenotypic methods of Brucella differentiation, for use in the systematization of the causative agents of brucellosis. Twenty-six Brucella strains were tested, representing reference (n = 15), vaccine (n = 2) and field strains of three pathogenic Brucella species, B. melitensis (n = 3), B. abortus (n = 2) and B. suis (n = 2), as well as isolates (n = 2) of unidentified taxonomic position, using MLVA with 9 primer pairs targeting known variable loci of the Brucella genome. The stability of the chosen loci, the discriminatory power by the Hunter-Gaston discrimination index (HGDI), and the consistency with phenotypic identification methods were analysed. MLVA confirmed the results of phenotypic identification methods and showed the chosen loci to be stable in most reference and vaccine strains, with a high HGDI of 0.9969 for all loci. A dendrogram constructed from the MLVA data distributed the Brucella strains into related clusters according to their species and biovar positions and resolved 25 genotypes. B. melitensis strains formed a cluster related to the reference strain B. melitensis 63/9 biovar 2. The Australian isolates Brucella 83-4 and Brucella 83-6, isolated from rodents, formed a cluster distant from the other Brucella strains. 
MLVA is a promising method for differentiating Brucella strains of known and unresolved taxonomic status, for their systematization, and for creating an MLVA genotype catalogue that will promote qualitative improvement of the brucellosis surveillance system in Russia.
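    The Hunter-Gaston discrimination index is computed from the number of strains N and the number n_j of strains assigned to each type j as D = 1 − Σ n_j(n_j − 1) / [N(N − 1)]. A minimal sketch follows; the function name and genotype labels are hypothetical. Assuming every one of the 26 strains was assigned a genotype, 25 genotypes among 26 strains means exactly one genotype is shared by two strains, giving D = 1 − 2/650 ≈ 0.9969, consistent with the value reported above for all loci.

```python
from collections import Counter

def hunter_gaston_index(types):
    """Hunter-Gaston discrimination index for a list of type labels, one per strain."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(types).values()
    # Sum of n_j * (n_j - 1) over types, normalised by N * (N - 1).
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

# Hypothetical labels: 25 distinct genotypes, with G1 shared by two of the 26 strains.
genotypes = [f"G{i}" for i in range(1, 26)] + ["G1"]
print(round(hunter_gaston_index(genotypes), 4))  # 0.9969
```

    An index of 1.0 means every strain has a unique type; each shared type lowers D in proportion to the number of strain pairs it fails to separate.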

  8. Methodology to assess and map the potential development of forest ecosystems exposed to climate change and atmospheric nitrogen deposition: A pilot study in Germany.

    PubMed

    Schröder, Winfried; Nickel, Stefan; Jenssen, Martin; Riediger, Jan

    2015-07-15

    A methodology for mapping ecosystems and their potential development under climate change and atmospheric nitrogen deposition was developed using examples from Germany. The methodology integrated data on vegetation, soil, climate change and atmospheric nitrogen deposition. These data were used to classify ecosystem types with regard to six ecological functions and interrelated structures. Corresponding data covering 1961-1990 were used as the reference. The assessment of functional and structural integrity relies on comparing a current or future state with an ecosystem type-specific reference. While current functions and structures of ecosystems were quantified by measurements, potential future developments were projected by geochemical soil modelling and data from a regional climate change model. The ecosystem types were defined with reference to the potential natural vegetation and were mapped using data on current tree species coverage and land use. In this manner, current ecosystem types were derived, which were related to data on elevation, soil texture, and climate for the years 1961-1990. These relations were quantified by Classification and Regression Trees, which were used to map the spatial patterns of ecosystem type clusters for 1961-1990. The climate data for these years were subsequently replaced by the results of a regional climate model for 1991-2010, 2011-2040, and 2041-2070. For each of these periods, one map of ecosystem type clusters was produced and evaluated with regard to the development of the areal coverage of ecosystem type clusters over time. This evaluation of the structural aspects of ecological integrity at the national level was complemented by projecting potential future values of indicators for ecological functions at the site level, using the Very Simple Dynamic soil modelling technique with climate data and two scenarios of nitrogen deposition as input. 
The results were compared to the reference and enabled an evaluation of site-specific ecosystem changes over time, which proved to be both positive and negative. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Oldest Known Objects May Be Surprisingly Immature

    NASA Astrophysics Data System (ADS)

    2008-04-01

    Some of the oldest objects in the Universe may still have a long way to go, according to a new study using NASA’s Chandra X-ray Observatory. These new results indicate that globular clusters might be surprisingly less mature in their development than previously thought. Globular clusters, dense bunches of up to millions of stars found in all galaxies, are among the oldest known objects in the Universe, with most estimates of their ages ranging from 9 to 13 billion years. As such, they contain some of the first stars to form in a galaxy, and understanding their evolution is critical to understanding the evolution of galaxies. "For many years, globular clusters have been used as wonderful natural laboratories to study the evolution and interaction of stars," said John Fregeau of Northwestern University, who conducted the study. "So, it’s exciting to discover something that may be new and fundamental about the way they evolve." Conventional wisdom is that globular clusters pass through three phases in the evolution of their structure, corresponding to adolescence, middle age, and old age. These "ages" refer to the evolutionary state of the cluster, not the physical ages of the individual stars. In the adolescent phase, the stars near the center of the cluster collapse inward. Middle age refers to a phase when the interactions of double stars near the center of the cluster prevent it from collapsing further. Finally, old age describes when binaries in the center of the cluster are disrupted or ejected, and the center of the cluster collapses inwards. 
For years, it has been thought that most globular clusters are middle-aged, with a few being toward the end of their evolution. However, Chandra data along with theoretical work suggest this may not be the case. When single and double stars interact in the crowded centers of globular clusters, double stars can form that transfer mass and give off X-rays. Since such double stars are expected to form mostly in the middle of a globular cluster’s evolution and then be lost in old age, the relative number of X-ray sources gives clues about the stage of evolution the cluster is in. A new study by Fregeau of 13 globular clusters in the Milky Way shows that three of them have an unusually large number of X-ray sources, or X-ray binaries, suggesting the clusters are middle-aged. Previously, these globular clusters had been classified as being in old age because they had very tight concentrations of stars in their centers, another litmus test of age used by astronomers. The implication is that most globular clusters, including the other ten studied by Fregeau, are not in the middle age of their evolution, as previously thought, but are actually in adolescence. "It’s remarkable that these objects, which are thought to be some of the oldest in the Universe, may really be very immature in their development," said Fregeau, whose paper appears in The Astrophysical Journal. "This would represent a major change in thinking about the current evolutionary status of globular clusters." If confirmed, this result would help reconcile other observations with recent theoretical work suggesting that the tightness of the central concentration of stars in the most evolved globular clusters is consistent with them being in a middle, rather than an advanced, phase of evolution. Other theoretical studies have suggested it can take longer than the current age of the Universe for globular clusters to reach old age. 
Besides improving the understanding of the basic evolution of globular clusters, this result has implications for understanding stellar interactions in dense environments. It also removes the need for exotic mechanisms, some involving black holes, that were thought to be needed to prevent the many middle-aged clusters from collapsing. "Some exotic scenarios, including some of my own, have been invoked to try to make sense of the observations and save the old theory," said Fregeau. "If this result holds up, we don't have to worry about the exotic scenarios any more." NASA's Marshall Space Flight Center, Huntsville, Ala., manages the Chandra program for the agency’s Science Mission Directorate. The Smithsonian Astrophysical Observatory controls science and flight operations from the Chandra X-ray Center in Cambridge, Mass.

  10. Spatiotemporal multistage consensus clustering in molecular dynamics studies of large proteins.

    PubMed

    Kenn, Michael; Ribarics, Reiner; Ilieva, Nevena; Cibena, Michael; Karch, Rudolf; Schreiner, Wolfgang

    2016-04-26

    The aim of this work is to find semi-rigid domains within large proteins to serve as reference structures for fitting molecular dynamics trajectories. We propose an algorithm, multistage consensus clustering (MCC), that uses the minimum variation of distances between pairs of Cα-atoms as its target function. The whole dataset (trajectory) is split into sub-segments. For a given sub-segment, spatial clustering is repeatedly started from different random seeds, and the spatial clustering with the minimum target function is adopted; this constitutes stage 1 of MCC. Then, in stage 2, the results of spatial clustering are consolidated to arrive at domains that are stable over the whole dataset. We found that MCC is robust with respect to the choice of parameters and yields relevant information on functional domains of the major histocompatibility complex (MHC) studied in this paper: the α-helices and β-floor of the MHC proved to be the most flexible parts and did not contribute to clusters of significant size. Three alleles of the MHC, each in complex with the ABCD3 peptide and the LC13 T-cell receptor (TCR), yielded different patterns of motion. Those alleles causing immunological allo-reactions showed distinct correlations of motion between parts of the peptide, the binding cleft and the complementarity-determining region (CDR) loops of the TCR. Multistage consensus clustering reflected functional differences between MHC alleles and provides a methodological basis for increasing the sensitivity of functional analyses of biomolecules. Owing to the generality of the approach, MCC may also lend itself as a potent tool for the analysis of other kinds of big data.
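A minimal Python sketch of stage 1 as described above, under loudly stated assumptions: the target function is taken to be the sum, over all within-cluster atom pairs, of the variance of their Cα-Cα distance across the frames of one sub-segment, and a simple k-medoids-style partitioning on that variance matrix stands in for the paper's spatial clustering, which the abstract does not specify. All function names and the parameter `k` (number of clusters) are hypothetical, and stage 2 (consolidation across sub-segments) is not shown.

```python
import math
import random
from statistics import pvariance

def distance_variance(traj):
    """traj: list of frames, each a list of (x, y, z) C-alpha positions.
    Returns an n x n matrix whose (i, j) entry is the variance, over frames,
    of the distance between atoms i and j (0 means the pair moves rigidly)."""
    n = len(traj[0])
    var = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [math.dist(frame[i], frame[j]) for frame in traj]
            var[i][j] = var[j][i] = pvariance(d)
    return var

def target(labels, var):
    """Assumed target function: summed distance variance over within-cluster pairs."""
    n = len(labels)
    return sum(var[i][j] for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])

def assign(var, medoids):
    """Label each atom with the medoid it co-moves with most rigidly."""
    return [min(range(len(medoids)), key=lambda c: var[i][medoids[c]])
            for i in range(len(var))]

def cluster_once(var, k, rng, n_iter=20):
    """One k-medoids-style partitioning started from randomly chosen medoids."""
    medoids = rng.sample(range(len(var)), k)
    for _ in range(n_iter):
        labels = assign(var, medoids)
        new = list(medoids)
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                # New medoid: the member with the smallest summed variance to the others.
                new[c] = min(members, key=lambda m: sum(var[m][o] for o in members))
        if new == medoids:
            break
        medoids = new
    return assign(var, medoids)

def mcc_stage1(traj, k, n_seeds=20, seed=0):
    """Restart clustering from several random seeds; keep the lowest-target result."""
    var = distance_variance(traj)
    rng = random.Random(seed)
    best_labels, best_score = None, math.inf
    for _ in range(n_seeds):
        labels = cluster_once(var, k, rng)
        score = target(labels, var)
        if score < best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score
```

For example, on a synthetic trajectory in which atoms 0-2 and atoms 3-5 form two rigid groups that translate relative to each other, `mcc_stage1(traj, k=2)` should return labels separating the two groups, since only that partition has (numerically) zero within-cluster distance variance.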

  11. Structural parameters and blue stragglers in Sagittarius dwarf spheroidal galaxy globular clusters

    NASA Astrophysics Data System (ADS)

    Salinas, Ricardo; Jílková, Lucie; Carraro, Giovanni; Catelan, Márcio; Amigo, Pía.

    2012-04-01

    We present BV photometry of four Sagittarius dwarf spheroidal galaxy globular clusters: Arp 2, NGC 5634, Palomar 12 and Terzan 8, obtained with the Danish Telescope at ESO La Silla. We measure the structural parameters of the clusters using King profile fitting, obtaining the first reliable measurements of the tidal radius of Arp 2 and Terzan 8. These two clusters are remarkably extended and have low concentrations; with a concentration of only c = 0.41 ± 0.02, Terzan 8 is less concentrated than any cluster in our Galaxy. Blue stragglers are identified in all four clusters, and their spatial distribution is compared to those of horizontal branch and red giant branch stars. The blue straggler properties do not provide evidence of mass segregation in Terzan 8, and Arp 2 probably shares the same status, although with less confidence. In NGC 5634 and Palomar 12, blue stragglers are significantly less populous, and their analysis suggests that these two clusters have probably undergone mass segregation.

  12. Role of higher-multipole deformations in exotic ¹⁴C cluster radioactivity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sawhney, Gudveen; Sharma, Manoj K.; Gupta, Raj K.

    2011-06-15

    We have studied nine cases of spontaneous emission of ¹⁴C clusters in the ground-state decays of the same number of parent nuclei from the trans-lead region, specifically from ²²¹Fr to ²²⁶Th, using the preformed cluster model (PCM) of Gupta and collaborators, with choices of spherical shapes, quadrupole deformation (β₂) alone, and higher-multipole deformations (β₂, β₃, β₄) with cold "compact" orientations θᶜ of the decay products. The calculated ¹⁴C cluster-decay half-lives agree well with experimental data only for the case of higher-multipole deformations (β₂-β₄) and θᶜ orientations of cold elongated configurations. In other words, compared with our earlier study of clusters heavier than ¹⁴C, where the inclusion of β₂ alone, with "optimum" orientations, was found to be enough to give the best comparison with data, here for ¹⁴C cluster decay the inclusion of higher-multipole deformations (up to hexadecapole), together with θᶜ orientations, is found to be essential within the PCM. Interestingly, whereas both the penetration probability and the assault frequency act simply as scaling factors, the preformation probability is strongly influenced by the order of the multipole deformations and the orientations of the nuclei. The possible roles of the Q value and of angular-momentum effects in ¹⁴C cluster radioactivity are also considered.

  13. Tidally Induced Bars of Galaxies in Clusters

    NASA Astrophysics Data System (ADS)

    Łokas, Ewa L.; Ebrová, Ivana; del Pino, Andrés; Sybilska, Agnieszka; Athanassoula, E.; Semczuk, Marcin; Gajda, Grzegorz; Fouquet, Sylvain

    2016-08-01

    Using N-body simulations, we study the formation and evolution of tidally induced bars in disky galaxies in clusters. Our progenitor is a massive, late-type galaxy similar to the Milky Way, composed of an exponential disk and a Navarro-Frenk-White dark matter halo. We place the galaxy on four different orbits in a Virgo-like cluster and evolve it for 10 Gyr. As a reference case, we also evolve the same model in isolation. Tidally induced bars form on all orbits soon after the first pericenter passage and survive until the end of the evolution. They appear earlier, are stronger and longer, and have lower pattern speeds for tighter orbits. Only for the tightest orbit are the properties of the bar controlled by the orientation of the tidal torque from the cluster at pericenter. The mechanism behind the formation of the bars is the angular momentum transfer from the galaxy stellar component to its halo. All of the bars undergo extended periods of buckling instability that occur earlier and lead to more pronounced boxy/peanut shapes when the tidal forces are stronger. Using all simulation outputs of galaxies at different evolutionary stages, we construct a toy model of the galaxy population in the cluster and measure the average bar strength and bar fraction as a function of clustercentric radius. Both are found to be mildly decreasing functions of radius. We conclude that tidal forces can trigger bar formation in cluster cores, but not in the outskirts, and thus can cause larger concentrations of barred galaxies toward the cluster center.

  14. Know thy eHealth user: Development of biopsychosocial personas from a study of older adults with heart failure.

    PubMed

    Holden, Richard J; Kulanthaivel, Anand; Purkayastha, Saptarshi; Goggins, Kathryn M; Kripalani, Sunil

    2017-12-01

    Personas are a canonical user-centered design method increasingly used in health informatics research. Personas, empirically derived user archetypes, can be used by eHealth designers to gain a robust understanding of their target end users, such as patients. The objective of this study was to develop biopsychosocial personas of older patients with heart failure using quantitative analysis of survey data. Data were collected using standardized surveys and medical record abstraction from 32 older adults with heart failure recently hospitalized for acute heart failure exacerbation. Hierarchical cluster analysis was performed on a final dataset of n=30. Nonparametric analyses were used to identify differences between clusters on 30 clustering variables and seven outcome variables. Six clusters were produced, ranging in size from two to eight patients per cluster. Clusters differed significantly on these biopsychosocial domains and subdomains: demographics (age, sex); medical status (comorbid diabetes); functional status (exhaustion, household work ability, hygiene care ability, physical ability); psychological status (depression, health literacy, numeracy); technology (Internet availability); healthcare system (visit by home healthcare, trust in providers); social context (informal caregiver support, cohabitation, marital status); and economic context (employment status). Tabular and narrative persona descriptions provide an easy reference guide for informatics designers. Persona development using approaches such as clustering of structured survey data is an important tool for health informatics professionals. We describe insights from our study of patients with heart failure, then recommend a generic ten-step persona development process. Methodological strengths and limitations of the study and of persona development generally are discussed. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Clustering approaches to improve the performance of low cost air pollution sensors.

    PubMed

    Smith, Katie R; Edwards, Peter M; Evans, Mathew J; Lee, James D; Shaw, Marvin D; Squires, Freya; Wilde, Shona; Lewis, Alastair C

    2017-08-24

    Low cost air pollution sensors have substantial potential for atmospheric research and for the applied control of pollution in the urban environment, including more localized warnings to the public. The current generation of single-chemical gas sensors experiences varying degrees of interference from other co-pollutants and is sensitive to environmental factors such as temperature, wind speed and supply voltage. Sensor-to-sensor response variability introduces further uncertainty, although this is less well reported. The sensitivity of Metal Oxide Sensors (MOS) to volatile organic compounds (VOCs) changed with relative humidity (RH) by up to a factor of five over the range of 19-90% RH, with an uncertainty in the correction of a factor of two at any given RH. The short-term (second to minute) stabilities of MOS and electrochemical CO sensor responses were reasonable. During more extended use, inter-sensor quantitative comparability was degraded by unpredictable variability in individual sensor responses (to the measurand, the interference, or both) drifting over timescales of several hours to days. Over timescales longer than a week, identical sensors showed slow, often downward, drifts in their responses, which diverged across six CO sensors by up to 30% after two weeks. The measurement derived from the median sensor within clusters of 6, 8 and up to 21 sensors was evaluated against individual sensor performance and external reference values. The clustered approach maintained the cost competitiveness of a sensor device, while the median concentration from the ensemble of sensor signals largely eliminated the randomised hour-to-day response drift seen in individual sensors and excluded the effects of small numbers of poorly performing sensors that drifted significantly over longer time periods. 
The results demonstrate that for individual sensors to be optimally comparable to one another, and to reference instruments, they would likely require frequent calibration. The use of a cluster median value eliminates unpredictable medium term response changes, and other longer term outlier behaviours, extending the likely period needed between calibration and making a linear interpolation between calibrations more appropriate. Through the use of sensor clusters rather than individual sensors, existing low cost technologies could deliver significantly improved quality of observations.
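The cluster-median idea can be illustrated with a short Python sketch. The readings are synthetic placeholders, not data from the study: five sensors report a constant 10.0 and a sixth drifts upward by 5 units per time step. The per-time-step median across the six sensors ignores the drifting one, whereas a plain mean is pulled along by it. The function names are hypothetical.

```python
from statistics import mean, median

def cluster_median(series):
    """series[s][t] is the reading of sensor s at time step t.
    Returns the median across sensors at each time step."""
    return [median(step) for step in zip(*series)]

def cluster_mean(series):
    """Same layout as cluster_median, but averages the sensors instead."""
    return [mean(step) for step in zip(*series)]

# Synthetic example: 5 well-behaved sensors plus 1 sensor that drifts upward.
steady = [[10.0] * 5 for _ in range(5)]
drifting = [[10.0 + 5.0 * t for t in range(5)]]
series = steady + drifting

print(cluster_median(series))  # [10.0, 10.0, 10.0, 10.0, 10.0]
print(cluster_mean(series))    # creeps upward with the drifting sensor
```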

  16. Methylobacterium Genome Sequences: A Reference Blueprint to Investigate Microbial Metabolism of C1 Compounds from Natural and Industrial Sources

    PubMed Central

    Lee, Ming-Chun; Bringel, Françoise; Lajus, Aurélie; Zhou, Yang; Gourion, Benjamin; Barbe, Valérie; Chang, Jean; Cruveiller, Stéphane; Dossat, Carole; Gillett, Will; Gruffaz, Christelle; Haugen, Eric; Hourcade, Edith; Levy, Ruth; Mangenot, Sophie; Muller, Emilie; Nadalig, Thierry; Pagni, Marco; Penny, Christian; Peyraud, Rémi; Robinson, David G.; Roche, David; Rouy, Zoé; Saenampechek, Channakhone; Salvignol, Grégory; Vallenet, David; Wu, Zaining; Marx, Christopher J.; Vorholt, Julia A.; Olson, Maynard V.; Kaul, Rajinder; Weissenbach, Jean; Médigue, Claudine; Lidstrom, Mary E.

    2009-01-01

    Background Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared. Methodology/Principal Findings The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm) gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau) gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17) that were defined. The mau cluster is flanked by mobile elements, but the dcm cluster disrupts a gene annotated as chelatase and for which we propose the name “island integration determinant” (iid). 
Conclusion/Significance These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire methylotrophic lifestyles. PMID:19440302

  17. Is It Time to Change Our Reference Curve for Femur Length? Using the Z-Score to Select the Best Chart in a Chinese Population

    PubMed Central

    Yang, Huixia; Wei, Yumei; Su, Rina; Wang, Chen; Meng, Wenying; Wang, Yongqing; Shang, Lixin; Cai, Zhenyu; Ji, Liping; Wang, Yunfeng; Sun, Ying; Liu, Jiaxiu; Wei, Li; Sun, Yufeng; Zhang, Xueying; Luo, Tianxia; Chen, Haixia; Yu, Lijun

    2016-01-01

    Objective To use Z-scores to compare different charts of femur length (FL) applied to our population, with the aim of identifying the most appropriate chart. Methods A retrospective study was conducted in Beijing. Fifteen hospitals in Beijing, in which 15,194 pregnant women delivered from June 20th to November 30th, 2013, were chosen as clusters using a systematic cluster sampling method. The FL measurements in the second and third trimesters were recorded, as well as the last measurement obtained before delivery. Based on the inclusion and exclusion criteria, we identified FL measurements from 19,996 ultrasound examinations of 7,194 patients between 11 and 42 weeks of gestation. The FL data were then transformed into Z-scores calculated using three series of reference equations obtained from three reports: Leung TN, Pang MW et al (2008); Chitty LS, Altman DG et al (1994); and Papageorghiou AT et al (2014). Each Z-score distribution was presented as the mean and standard deviation (SD). Skewness and kurtosis were calculated, and each distribution was compared with the standard normal distribution using the Kolmogorov-Smirnov test. The histogram of each distribution was superimposed on the non-skewed standard normal curve (mean = 0, SD = 1) to provide a direct visual impression. Finally, the sensitivity and specificity of each reference chart for identifying fetuses <5th or >95th percentile (based on the observed distribution of Z-scores) were calculated, and the Youden index was also reported. A scatter diagram with the 5th, 50th, and 95th percentile curves calculated from, and superimposed on, each reference chart was presented to provide a visual impression. Results The three Z-score distribution curves appeared to be normal, but none of them matched the expected standard normal distribution. 
In our study, the Papageorghiou reference curve provided the best results, with a sensitivity of 100% for identifying fetuses with measurements < 5th and > 95th percentile, and specificities of 99.9% and 81.5%, respectively. Conclusions It is important to choose an appropriate reference curve when defining what is normal. The Papageorghiou reference curve for FL seems to be the best fit for our population. Perhaps it is time to change our reference curve for femur length. PMID:27458922
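The quantities these methods rely on fit in a few lines of Python. A Z-score expresses an observed femur length relative to a chart's expected mean and SD at that gestational age, sensitivity and specificity come from a 2x2 comparison of flagged versus truly abnormal cases, and the Youden index is sensitivity + specificity - 1. This sketch assumes the chart's mean and SD for the relevant gestational age are already known; the published reference equations are not reproduced, the Z-score inputs are placeholders, and the function names are hypothetical. The Youden calls use the sensitivities and specificities reported above for the Papageorghiou chart.

```python
def z_score(observed_mm, ref_mean_mm, ref_sd_mm):
    """Standardize one measurement against a reference chart's mean and SD
    at the same gestational age (values must come from the chosen chart)."""
    return (observed_mm - ref_mean_mm) / ref_sd_mm

def sensitivity_specificity(truth, flagged):
    """truth, flagged: parallel lists of booleans (True = abnormal / flagged)."""
    pairs = list(zip(truth, flagged))
    tp = sum(t and f for t, f in pairs)
    fn = sum(t and not f for t, f in pairs)
    tn = sum(not t and not f for t, f in pairs)
    fp = sum(not t and f for t, f in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1 (proportions in [0, 1])."""
    return sensitivity + specificity - 1.0

# Placeholder chart values, for illustration only.
print(z_score(observed_mm=60.0, ref_mean_mm=55.0, ref_sd_mm=2.5))  # 2.0

# Reported figures for the Papageorghiou chart (sensitivity 100% in both cases).
print(youden_index(1.0, 0.999))  # J for identifying fetuses < 5th percentile
print(youden_index(1.0, 0.815))  # J for identifying fetuses > 95th percentile
```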

  18. Data Intensive Computing on Amazon Web Services

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Magana-Zook, S. A.

    The Geophysical Monitoring Program (GMP) has spent the past few years building up the capability to perform data intensive computing using what have been referred to as “big data” tools. These big data tools are intended for use against massive archives of seismic signals (>300 TB) to conduct research not previously possible. Examples of such tools include Hadoop (HDFS, MapReduce), HBase, Hive, Storm, Spark, Solr, and many more appearing by the day. These tools are useful for performing data analytics on datasets that exceed the resources of traditional analytic approaches. To this end, a research big data cluster (“Cluster A”) was set up as a collaboration between GMP and Livermore Computing (LC).

  19. Biomineralization of Schlumbergerella floresiana, a significant carbonate-producing benthic foraminifer

    NASA Astrophysics Data System (ADS)

    Sabbatini, Anna; Bédouet, Laurent; Marie, Arul; Bartolini, Annachiara; Landemarre, Ludovic; Weber, Michele; Ngurah Kade Mahardika, Gusti; Berland, Sophie; Zito, Francesca; Vénec-Peyré, Marie-Thérèse

    2016-04-01

    Most foraminifera that produce a shell are efficient biomineralizers. They contribute to the global carbon cycle, and thus influence ocean-climate regulation. Calcification in foraminifera is likely biologically controlled and is potentially similar to shell formation in metazoan taxa (e.g. mollusks, corals, sea urchins). However, foraminiferal biomineralization processes and the molecules involved are still poorly understood. We analyzed the calcitic shell of the large tropical benthic foraminifer Schlumbergerella floresiana. We found a suite of macromolecules containing many charged and polar amino acids and glycine that are also abundant in biomineralization proteins of other phyla. As neither genomic nor transcriptomic data are yet available for foraminiferal biomineralization, de novo-generated sequences, obtained from organic matrices submitted to MS BLAST database search, led to the characterization of 156 peptides. Very few homologous proteins were matched in the proteomic database, implying that the peptides are derived from unknown proteins present in the foraminiferal organic matrices. The amino acid distribution of these peptides was queried against the UNIPROT database and the mollusk UNIPROT database for comparison. The mollusks compose a well-studied phylum that yields a large variety of biomineralization proteins. These results showed that proteins extracted from S. floresiana shells contained sequences enriched with glycine, alanine, and proline, making a set of residues that provided a signature unique to foraminifera. Three of the de novo peptides exhibited sequence similarities to peptides found in proteins such as pre-collagen-P and a group of P-type ATPases including a calcium-transporting ATPase. Surprisingly, the peptide that was most similar to the collagen-like protein was a glycine-rich peptide reported from the test and spine proteome of sea urchin. 
The molecules, identified by matrix-assisted laser desorption ionization-time of flight mass spectrometry analyses, included acid-soluble N-glycoproteins whose sugar moieties are represented by high-mannose-type glycans and carbohydrates. Describing the nature of the proteins and associated molecules in the skeletal structure of living foraminifera can elucidate the biomineralization mechanisms of these major carbonate producers in marine ecosystems. Foraminifera constitute an important tool used for paleo-environmental reconstructions because of their nearly continuous fossil record and abundance. Many studies focus on their biomineralization process using a geochemical perspective to record environmental and climate changes from shell isotopic and trace element compositions. Our results are a first step toward understanding the functioning mechanism behind biomineralization and the molecules involved. Coupling geochemical and biological perspectives will enhance interpretation of the proxies used for climatic reconstructions and improve future modeling efforts.

  20. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    NASA Astrophysics Data System (ADS)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), and provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on the key postulate "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), which substantially reduces the computational complexity of the online predictive stage. The clear mathematical setup allows, for the first time, a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and numerical investigations show it to be of second order. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. These boundary terms were not obtained in the original SCA paper, and we find that they may well be responsible for the numerical dependence on the choice of reference material property. Since VCA improves accuracy by overcoming this modeling error, and reduces the numerical cost by avoiding the outer-loop iteration that SCA needs to attain material-property consistency, its efficiency is expected to be even higher than that of SCA.

  1. GLASS: The Grism Lens-Amplified Survey From Space. HST Grism Spectroscopy of the Frontier Fields

    NASA Astrophysics Data System (ADS)

    Schmidt, Kasper B.; Schmidt

    The Grism Lens-Amplified Survey From Space (GLASS) is a 140-orbit spectroscopic survey of 10 massive galaxy clusters, including the six Hubble Frontier Fields. GLASS has observed the cluster cores with the HST-WFC3 G102 and G141 grisms, providing wide wavelength coverage in the near-infrared from roughly 0.8 to 1.7 μm. The parallel fields were observed through the optical ACS G800L grism. Taking advantage of the lensing magnification of the clusters, GLASS reaches intrinsic spectroscopic 1σ flux limits of roughly 10⁻¹⁸ erg/s/cm² and improved spatial resolution for lensed sources behind the clusters. These features are particularly useful for the three main science drivers of GLASS, which are I) exploring the universe at the epoch of reionization, II) describing how metals cycle in and out of galaxies, and III) assessing the environmental dependence of galaxy evolution. The first two benefit greatly from the improved depth and increased resolution provided by the cluster lensing. Apart from the main science drivers, a wide range of ancillary science has been enabled by the survey, including improved cluster lens modeling and searches for supernovae. Here we present the survey and the GLASS data releases, which are continuously being made available to the community through https://archive.stsci.edu/prepds/glass/. For further information we refer to Schmidt et al. (2014), Treu et al. (2015), and http://glass.physics.ucsb.edu.

  2. Fatigue minimising power reference control of a de-rated wind farm

    NASA Astrophysics Data System (ADS)

    Jensen, T. N.; Knudsen, T.; Bak, T.

    2016-09-01

    Modern wind farms (clusters of wind turbines) can be required to control their total power output to meet a set-point, and would then profit from minimising the structural loads and thereby the cost of energy. In this paper, we propose a new control strategy for a de-rated wind farm with the objective of maintaining a desired reference power production for the wind farm while minimising the total fatigue of the wind turbines in steady state. The controller outputs a vector of power references for the individual turbines. It exploits the positive correlation between fatigue and added turbulence to minimise fatigue indirectly by minimising the added turbulence. Simulated results for a wind farm with three turbines demonstrate the efficacy of the proposed solution by assessing the damage equivalent loads.

  3. Family Functioning, Identity Formation, and the Ability of Conflict Resolution among Adolescents

    ERIC Educational Resources Information Center

    Kiani, Behnaz; Hojatkhah, Seyed Mohsen; Torabi-Nami, Mohammad

    2016-01-01

    Family is perhaps the most influential system in individuals' lives, one in which various behaviors are learnt. Family functioning refers to the ability of a family to meet its responsibilities. The present correlational study used a multi-stage cluster sampling method to recruit 686 subjects, including 338 males and 348 females, from all high school students…

  4. Spontaneous Buckling of Lipid Bilayer and Vesicle Budding Induced by Antimicrobial Peptide Magainin 2: A Coarse-Grained Simulation Study

    DTIC Science & Technology

    2011-05-30

    were added to neutralize each system. The GROMACS software package [39] was used for simulations. The molecules in this paper refer to the CGMARTINI...accelerated [19]. Most of the peptides on the surfaces ended up in clusters containing transmembrane pores, which appeared to perturb the bilayers significantly

  5. The Meaning of English Words across Cultures, with a Focus on Cameroon and Hong Kong

    ERIC Educational Resources Information Center

    Bobda, Augustin Simo

    2009-01-01

    A word, even when considered monosemic, generally has a cluster of meanings, depending on the mental representation of the referent by the speaker/writer or listener/reader. The variation is even more noticeable across cultures. This paper investigates the different ways in which cultural knowledge helps in the interpretation of English lexical…

  6. Nanoscience

    DTIC Science & Technology

    2011-07-22

    L., Upgrading of Existing X-Ray Photoelectron Spectrometer Capabilities for Development and Analysis of Novel Energetic NanoCluster materials (DURIP...References From the Technical Reports database Allara, David L., Pennsylvania State University, Upgrading of Existing X-Ray Photoelectron...Scanning probe, X-ray. Of these techniques, the most popularly used is the scanning probe, also known as the Dip-Pen Nanolithography (DPN) technique

  7. Diagnosing Cervical Neoplasia in Rural Brazil Using a Mobile Van Equipped with In Vivo Microscopy: A Cluster-Randomized Community Trial.

    PubMed

    Hunt, Brady; Fregnani, José Humberto Tavares Guerreiro; Schwarz, Richard A; Pantano, Naitielle; Tesoni, Suelen; Possati-Resende, Júlio César; Antoniazzi, Marcio; de Oliveira Fonseca, Bruno; de Macêdo Matsushita, Graziela; Scapulatempo-Neto, Cristovam; Kerr, Ligia; Castle, Philip E; Schmeler, Kathleen; Richards-Kortum, Rebecca

    2018-06-01

    Cervical cancer is a leading cause of death in underserved areas of Brazil. This prospective randomized trial involved 200 women in southern/central Brazil with abnormal Papanicolaou tests. Participants were randomized by geographic cluster and referred for diagnostic evaluation either at a mobile van upon its scheduled visit to their local community, or at a central hospital. Participants in both arms underwent colposcopy, in vivo microscopy, and cervical biopsies. We compared rates of diagnostic follow-up completion between study arms, and also evaluated the diagnostic performance of in vivo microscopy compared with colposcopy. There was a 23% absolute and 37% relative increase in diagnostic follow-up completion rates for patients referred to the mobile van (102/117, 87%) compared with the central hospital (53/83, 64%; P = 0.0001; risk ratio = 1.37, 95% CI, 1.14-1.63). In 229 cervical sites in 144 patients, colposcopic examination identified sites diagnosed as cervical intraepithelial neoplasia grade 2 or more severe (CIN2+; 85 sites) with a sensitivity of 94% (95% CI, 87%-98%) and specificity of 50% (95% CI, 42%-58%). In vivo microscopy with real-time automated image analysis identified CIN2+ with a sensitivity of 92% (95% CI, 84%-97%) and specificity of 48% (95% CI, 40%-56%). Women referred to the mobile van were more likely to complete their diagnostic follow-up compared with those referred to a central hospital, without compromise in clinical care. In vivo microscopy in a mobile van provides automated diagnostic imaging with sensitivity and specificity similar to colposcopy. Cancer Prev Res; 11(6); 359-70. ©2018 AACR.
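The headline follow-up figures can be re-derived from the reported counts (102/117 in the mobile-van arm and 53/83 in the central-hospital arm). The Python sketch below computes the completion proportions, the absolute difference, the risk ratio and a 95% confidence interval. The abstract does not state how its interval was obtained; the log-scale (Wald) interval used here is an assumption, although with these counts it rounds to the reported 1.14-1.63. The function name is hypothetical.

```python
import math

def risk_ratio_ci(a, n1, c, n2, z=1.96):
    """Risk ratio of arm 1 (a events out of n1) vs arm 2 (c out of n2),
    with a Wald confidence interval computed on the log scale."""
    p1, p2 = a / n1, c / n2
    rr = p1 / p2
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)  # standard error of log(RR)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return p1, p2, rr, lo, hi

p1, p2, rr, lo, hi = risk_ratio_ci(102, 117, 53, 83)
print(f"van {p1:.0%}, hospital {p2:.0%}")                 # van 87%, hospital 64%
print(f"absolute increase {p1 - p2:.0%}")                 # absolute increase 23%
print(f"relative increase {rr - 1:.0%}")                  # relative increase 37%
print(f"risk ratio {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # risk ratio 1.37 (95% CI 1.14-1.63)
```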

  8. Reporting non-adherence in cluster randomised trials: A systematic review.

    PubMed

    Agbla, Schadrac C; DiazOrdaz, Karla

    2018-06-01

    Treatment non-adherence in randomised trials refers to situations where some participants do not receive their allocated treatment as intended. For cluster randomised trials, where the unit of randomisation is a group of participants, non-adherence may occur at the cluster or individual level. When non-adherence occurs, randomisation no longer guarantees that the relationship between treatment receipt and outcome is unconfounded, and the power to detect treatment effects in an intention-to-treat analysis may be reduced. Recording adherence and adequately estimating the causal treatment effect are therefore of interest in clinical trials. This review aimed to assess the extent to which non-adherence is reported in published cluster trials, to establish which methods, if any, are currently used to address non-adherence, and to determine whether these methods account for clustering. We systematically reviewed 132 cluster trials published in English in 2011, previously identified through a PubMed search. One hundred and twenty-three cluster trials were included in this systematic review. Non-adherence was reported in 56 cluster trials. Among these, 19 reported a treatment efficacy estimate: per protocol in 15 and as treated in 4. No study discussed the assumptions made by these methods, their plausibility, or the sensitivity of the results to deviations from these assumptions. The publication year of the included cluster trials (2011) could be considered a limitation of this study; however, no new guidelines on the reporting and handling of non-adherence in cluster trials have been published since. In addition, a single reviewer undertook the data extraction. To mitigate this, a second reviewer validated the extraction process on 15 randomly selected reports, with satisfactory agreement (93%). 
Despite the recommendations of the Consolidated Standards of Reporting Trials statement extension to cluster randomised trials, treatment adherence is under-reported. Among the trials providing adherence information, there was substantial variation in how adherence was defined, handled and reported. Researchers should discuss the assumptions required for the results to be interpreted causally and whether these are scientifically plausible in their studies. Sensitivity analyses to study the robustness of the results to departures from these assumptions should be performed.

  9. Can cluster environment modify the dynamical evolution of spiral galaxies?

    NASA Technical Reports Server (NTRS)

    Amram, P.; Balkowski, C.; Cayatte, V.; Marcelin, M.; Sullivan, W. T., III

    1993-01-01

    Over the past decade many effects of the cluster environment on member galaxies have been established. These effects are manifest in the amount and distribution of gas in cluster spirals, the luminosity and light distributions within galaxies, and the segregation of morphological types. All these effects could indicate a specific dynamical evolution for galaxies in clusters. Nevertheless, more direct evidence, such as a different mass distribution for spiral galaxies in clusters and in the field, has not yet been clearly established. Indeed, Rubin, Whitmore, and Ford (1988) and Whitmore, Forbes, and Rubin (1988) (referred to as RWF) presented evidence that inner cluster spirals have falling rotation curves, unlike those of outer cluster spirals or the great majority of field spirals. If falling rotation curves exist in the centers of clusters, as argued by RWF, it would suggest that dark matter halos are absent from cluster spirals, either because the halos were stripped by interactions with other galaxies or with an intracluster medium, or because they never formed in the first place. Without disputing RWF, other researchers pointed out that the behaviour of the slope of the rotation curves of spiral galaxies (in Virgo) is not so clear. Amram, using a different sample of spiral galaxies in clusters, found only 10% declining rotation curves (2 declining vs 17 flat or rising), whereas RWF found about 40% in their sample (6 declining vs 10 flat or rising). Here we briefly discuss the Amram data and compare them with the results of RWF. We have measured the rotation curves for a sample of 21 spiral galaxies in 5 nearby clusters. These rotation curves have been constructed from detailed two-dimensional maps of each galaxy's velocity field as traced by emission from the Hα line. This complete mapping, combined with the sensitivity of our CFHT 3.6 m + Perot-Fabry + CCD observations, allows the construction of high-quality rotation curves. Details concerning the acquisition and reduction procedures of the data are given in Amram. We present and discuss our preliminary analysis and compare it with RWF's results.

  10. Benchmark of Dynamic Electron Correlation Models for Seniority-Zero Wave Functions and Their Application to Thermochemistry.

    PubMed

    Boguslawski, Katharina; Tecmer, Paweł

    2017-12-12

    Wave functions restricted to electron-pair states are promising models to describe static/nondynamic electron correlation effects encountered, for instance, in bond-dissociation processes and transition-metal and actinide chemistry. To reach spectroscopic accuracy, however, the missing dynamic electron correlation effects that cannot be described by electron-pair states need to be included a posteriori. In this Article, we extend the previously presented perturbation theory models with an Antisymmetric Product of 1-reference orbital Geminal (AP1roG) reference function that allows us to describe both static/nondynamic and dynamic electron correlation effects. Specifically, our perturbation theory models combine a diagonal and off-diagonal zero-order Hamiltonian, a single-reference and multireference dual state, and different excitation operators used to construct the projection manifold. We benchmark all proposed models as well as an a posteriori Linearized Coupled Cluster correction on top of AP1roG against CR-CC(2,3) reference data for reaction energies of several closed-shell molecules that are extrapolated to the basis set limit. Moreover, we test the performance of our new methods for multiple-bond breaking processes in the homonuclear N₂, C₂, and F₂ dimers as well as the heteronuclear BN, CO, and CN⁺ dimers against MRCI-SD, MRCI-SD+Q, and CR-CC(2,3) reference data. Our numerical results indicate that the best performance is obtained from a Linearized Coupled Cluster correction as well as second-order perturbation theory corrections employing a diagonal and off-diagonal zero-order Hamiltonian and a single-determinant dual state. These dynamic corrections on top of AP1roG provide substantial improvements for binding energies and spectroscopic properties obtained with the AP1roG approach, while allowing us to approach chemical accuracy for reaction energies involving closed-shell species.

  11. Is gender policy related to the gender gap in external cause and circulatory disease mortality? A mixed effects model of 22 OECD countries 1973–2008

    PubMed Central

    2012-01-01

    Background: Gender differences in mortality vary widely between countries and over time, but few studies have examined predictors of these variations, apart from smoking. The aim of this study is to investigate the link between gender policy and the gender gap in cause-specific mortality, adjusted for economic factors and health behaviours. Methods: 22 OECD countries were followed from 1973 to 2008, and the outcomes were gender gaps in external cause and circulatory disease mortality. A previously found country cluster solution was used, which includes indicators on taxes, parental leave, pensions, social insurances and social services in kind. Male breadwinner countries were used as the reference group and compared with earner-carer, compensatory breadwinner, and universal citizen countries. Specific policies were also analysed. Mixed-effects models were used, with years as level-1 units and countries as level-2 units. Results: Both the earner-carer cluster (not significant after adjustment for GDP) and policies characteristic of that cluster are associated with smaller gender differences in external causes, particularly due to an association with increased female mortality. Cluster differences in the gender gap in circulatory disease mortality result from a larger relative decrease of male mortality in the compensatory breadwinner cluster and the earner-carer cluster. However, policies characteristic of those clusters were generally related to increased mortality. Conclusion: Results for external cause mortality are in concordance with the hypothesis that women become more exposed to risks of accident and violence when they are economically more active. For circulatory disease mortality, results differ depending on the approach (cluster or indicator). Whether cluster differences not explained by specific policies reflect other welfare policies or unrelated societal trends is an open question. Recommendations for further studies are made. PMID:23145477

  12. The Splashback Feature around DES Galaxy Clusters: Galaxy Density and Weak Lensing Profiles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Chihway; et al.

    Splashback refers to the process in which matter accreting onto a dark matter halo reaches its first orbital apocenter and turns around in its orbit. The cluster-centric radius at which this process occurs, r_sp, defines a halo boundary that is connected to the dynamics of the cluster, in contrast with other common halo boundary definitions such as R_200. A rapid decline in the matter density profile of the halo is expected near r_sp. We measure the galaxy number density and weak lensing mass profiles around RedMapper galaxy clusters in the first-year Dark Energy Survey (DES) data. For a cluster sample with mean mass ~2.5 x 10^14 solar masses, we find strong evidence of a splashback-like steepening of the galaxy density profile and measure r_sp = 1.16 +/- 0.08 Mpc/h, consistent with earlier SDSS measurements by More et al. (2016) and Baxter et al. (2017). Moreover, our weak lensing measurement demonstrates for the first time the existence of a splashback-like steepening of the matter profile of galaxy clusters. We measure r_sp = 1.28 +/- 0.18 Mpc/h from the weak lensing data, in good agreement with our galaxy density measurements. Applying our analysis to different cluster and galaxy samples, we find that, consistent with ΛCDM simulations, r_sp scales with R_200m and does not evolve with redshift over the redshift range 0.3–0.6. We also find that potential systematic effects associated with the RedMapper algorithm may impact the location of r_sp, in particular the choice of scale used to estimate cluster richness. We discuss the progress needed to understand the systematic uncertainties and fully exploit forthcoming data from DES and future surveys, emphasizing the importance of more realistic mock catalogs and independent cluster samples.

  13. BOOK REVIEW: The Gravitational Million-Body Problem: A Multidisciplinary Approach to Star Cluster Dynamics

    NASA Astrophysics Data System (ADS)

    Heggie, D.; Hut, P.

    2003-10-01

    The gravitational N-body problem is to describe the evolution of an isolated system of N point masses interacting only through Newtonian gravitational forces. For N = 2 the solution is due to Newton. For N = 3 there is no general analytic solution, but the problem has occupied generations of illustrious physicists and mathematicians including Laplace, Lagrange, Gauss and Poincaré, and inspired the modern subjects of nonlinear dynamics and chaos theory. The general gravitational N-body problem remains one of the oldest unsolved problems in physics. Many-body problems can be simpler than few-body problems, and many physicists have attempted to apply the methods of classical equilibrium statistical mechanics to the gravitational N-body problem for N ≫ 1. These applications have had only limited success, partly because the gravitational force is too strong at both small scales (the interparticle potential energy diverges) and large scales (energy is not extensive). Nevertheless, we now understand a rich variety of behaviour in large-N gravitating systems. These include the negative heat capacity of isolated, gravitationally bound systems, which is the basic reason why nuclear burning in the Sun is stable; Antonov's discovery that an isothermal, self-gravitating gas in a container is located at a saddle point, rather than a maximum, of the entropy when the gas is sufficiently dense and hence is unstable (the 'gravothermal catastrophe'); the process of core collapse, in which relaxation induces a self-similar evolution of the central core of the system towards (formally) infinite density in a finite time; and the remarkable phenomenon of gravothermal oscillations, in which the central density undergoes periodic oscillations by factors of a thousand or more on the relaxation timescale, but only if N ≳ 10⁴. The Gravitational Million-Body Problem is a monograph that describes our current understanding of the gravitational N-body problem. 
The authors have chosen to focus on N = 10⁶ for two main reasons: first, direct numerical integrations of N-body systems are beginning to approach this threshold, and second, globular star clusters provide remarkably accurate physical instantiations of the idealized N-body problem with N = 10⁵–10⁶. The authors are distinguished contributors to the study of star-cluster dynamics and the gravitational N-body problem. The book contains lucid and concise descriptions of most of the important tools in the subject, with only a modest bias towards the authors' own interests. These tools include the two-body relaxation approximation, the Vlasov and Fokker-Planck equations, regularization of close encounters, conducting fluid models, Hill's approximation, Heggie's law for binary star evolution, symplectic integration algorithms, Liapunov exponents, and so on. The book also provides an up-to-date description of the principal processes that drive the evolution of idealized N-body systems (two-body relaxation, mass segregation, escape, core collapse and core bounce, binary star hardening, gravothermal oscillations), as well as additional processes such as stellar collisions and tidal shocks that affect real star clusters but not idealized N-body systems. In a relatively short (300 pages plus appendices) book such as this, many topics have to be omitted. The reader who is hoping to learn about the phenomenology of star clusters will be disappointed, as the description of their properties is limited to only a page of text; there is also almost no discussion of other, equally interesting N-body systems such as galaxies (N ≈ 10⁶–10¹²), open clusters (N ≃ 10²–10⁴), planetary systems, or the star clusters surrounding black holes that are found in the centres of most galaxies. All of these omissions are defensible decisions. 
Less defensible is the uneven set of references in the text; for example, nowhere is the reader informed that the classic predecessor to this work was Spitzer's 1987 monograph, Dynamical Evolution of Globular Clusters, or that the standard reference on the observational properties of stellar systems is Binney and Merrifield's Galactic Astronomy. A minor irritation is that many concepts are discussed several times before they are defined, and the index provides no pointer to the primary discussion; thus, for example, there are ten index entries for 'phase mixing' and no indication that the fourth of these refers to the actual definition. The book is intended as a graduate textbook but more likely it will be used mainly in other contexts: by theoretical researchers, as an indispensable reference on the dynamics of gravitational N-body systems; by observational astronomers, as a readable summary of the theory of star cluster evolution; and by physicists seeking a well-written and accessible introduction to a simple problem that remains fascinating and incompletely understood after three centuries. Scott Tremaine

  14. A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments.

    PubMed

    Rajan, Vaibhav

    2013-03-01

    Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking in which each cluster contains similar columns, similarity being defined on the basis of compatible subsplits; our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software available upon request from the author.

  15. Acoustic Disturbances in Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Zweibel, Ellen G.; Mirnov, Vladimir V.; Ruszkowski, Mateusz; Reynolds, Christopher S.; Yang, H.-Y. Karen; Fabian, Andrew C.

    2018-05-01

    Galaxy cluster cores are pervaded by hot gas which radiates at far too high a rate to maintain any semblance of a steady state; this is referred to as the cooling flow problem. Of the many heating mechanisms that have been proposed to balance radiative cooling, one of the most attractive is the dissipation of acoustic waves generated by active galactic nuclei. Fabian et al. showed that if the waves are nearly adiabatic, wave damping due to heat conduction and viscosity must be well below standard Coulomb rates in order to allow the waves to propagate throughout the core. Because of the importance of this result, we have revisited wave dissipation under galaxy cluster conditions in a way that accounts for the self-limiting nature of dissipation by electron thermal conduction, allows the electron and ion temperature perturbations in the waves to evolve separately, and estimates kinetic effects by comparing to a semicollisionless theory. While these effects considerably enlarge the toolkit for analyzing observations of wavelike structures and developing a quantitative theory for wave heating, the drastic reduction of transport coefficients proposed in Fabian et al. remains the most viable path to acoustic wave heating of galaxy cluster cores.

  16. Bone Pose Estimation in the Presence of Soft Tissue Artifact Using Triangular Cosserat Point Elements.

    PubMed

    Solav, Dana; Rubin, M B; Cereatti, Andrea; Camomilla, Valentina; Wolf, Alon

    2016-04-01

    Accurate estimation of the position and orientation (pose) of a bone from a cluster of skin markers is limited mostly by the relative motion between the bone and the markers, known as the soft tissue artifact (STA). This work presents a method, based on continuum mechanics, to describe the kinematics of a cluster affected by STA. The cluster is characterized by triangular Cosserat point elements (TCPEs) defined by all combinations of three markers. The effects of the STA on the TCPEs are quantified using three parameters describing the strain in each TCPE and the relative rotation and translation between TCPEs. The method was evaluated using previously collected ex vivo kinematic data. Femur pose was estimated from 12 skin markers on the thigh, while its reference pose was measured using bone pins. Analysis revealed that, at each instant, there exist subsets of TCPEs that estimate bone position and orientation more accurately than the Procrustes Superimposition applied to the cluster of all markers. Some of these parameters were also found to correlate well with femur pose errors, which suggests that they can be used to select, at each instant, subsets of TCPEs leading to an improved estimation of the underlying bone pose.
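    The baseline in this abstract is Procrustes superimposition of the full marker cluster. As a rough illustration of that baseline only (not of the TCPE method), here is a minimal sketch of a least-squares rigid-body fit via SVD, assuming NumPy, known marker correspondences between a reference frame and the current frame, and no scaling. The function name and the synthetic marker data are hypothetical.

```python
import numpy as np


def rigid_procrustes(reference, observed):
    """Least-squares rigid fit so that observed ~ reference @ R.T + t.

    reference, observed: (n_markers, 3) arrays of corresponding marker positions.
    Returns a proper rotation matrix R (det = +1) and a translation vector t.
    """
    ref_centroid = reference.mean(axis=0)
    obs_centroid = observed.mean(axis=0)
    a = reference - ref_centroid
    b = observed - obs_centroid
    # SVD of the 3x3 cross-covariance matrix (Kabsch / Soderkvist-Wedin solution)
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = obs_centroid - rotation @ ref_centroid
    return rotation, translation


# Hypothetical example: 12 thigh markers moved rigidly, plus small random
# displacements standing in for soft tissue artifact
rng = np.random.default_rng(0)
markers_ref = rng.normal(scale=0.05, size=(12, 3))  # metres
angle = np.deg2rad(20.0)
true_rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
true_translation = np.array([0.10, -0.02, 0.30])
markers_obs = markers_ref @ true_rotation.T + true_translation
markers_obs += rng.normal(scale=0.002, size=markers_obs.shape)

rotation, translation = rigid_procrustes(markers_ref, markers_obs)
residual = markers_obs - (markers_ref @ rotation.T + translation)
print("RMS marker residual (m):", np.sqrt((residual ** 2).sum(axis=1).mean()))
```

    Because this fit treats all markers as one rigid body, marker-specific skin motion is spread over the whole cluster; the TCPE approach described above instead looks for marker subsets that deform least at each instant.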

  17. Genetic differentiation of Chinese indigenous meat goats ascertained using microsatellite information.

    PubMed

    Ling, Y H; Zhang, X D; Yao, N; Ding, J P; Chen, H Q; Zhang, Z J; Zhang, Y H; Ren, C H; Ma, Y H; Zhang, X R

    2012-02-01

    To investigate the genetic diversity of seven Chinese indigenous meat goat breeds (Tibet goat, Guizhou white goat, Shannan white goat, Yichang white goat, Matou goat, Changjiangsanjiaozhou white goat and Anhui white goat), explain their genetic relationships and assess their integrity and degree of admixture, 302 individuals from these breeds and 42 Boer goats introduced from Africa as reference samples were genotyped for 11 microsatellite markers. Results indicated that the genetic diversity of Chinese indigenous meat goats was high. The mean heterozygosity and the mean allelic richness (AR) for the 8 goat breeds varied from 0.697 to 0.738 and from 6.21 to 7.35, respectively. Structure analysis showed that the Tibet goat breed was genetically distinct and was the first to separate; the other Chinese goats were then divided into two sub-clusters, with Shannan white goat and Yichang white goat in one cluster, and Guizhou white goat, Matou goat, Changjiangsanjiaozhou white goat and Anhui white goat in the other. This grouping pattern was further supported by clustering analysis and principal component analysis. These results may provide a scientific basis for the characterization, conservation and utilization of Chinese meat goats.
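    The abstract does not say which heterozygosity estimator was used. Assuming the common choice of Nei's unbiased expected heterozygosity, with observed heterozygosity for comparison, each averaged over loci, a minimal sketch follows. The function names and toy genotypes are hypothetical, and allelic richness (usually computed by rarefaction) is not shown.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """Nei's (1978) unbiased expected heterozygosity at one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)  # number of sampled gene copies (2 x individuals)
    freqs = [count / n for count in Counter(alleles).values()]
    # 1 - sum(p_i^2), scaled by n / (n - 1) to correct for sample size
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def mean_over_loci(stat, loci):
    """Average a per-locus statistic over loci (e.g. 11 microsatellites)."""
    return sum(stat(g) for g in loci) / len(loci)


# Hypothetical toy data: two loci, four goats, alleles coded by fragment length
locus_1 = [(150, 152), (150, 150), (152, 154), (154, 154)]
locus_2 = [(98, 102), (100, 102), (98, 100), (98, 98)]
loci = [locus_1, locus_2]
print("mean He:", round(mean_over_loci(expected_heterozygosity, loci), 3))
print("mean Ho:", round(mean_over_loci(observed_heterozygosity, loci), 3))
```

    Per-breed values like those in the abstract would come from applying such a statistic to each breed's genotypes separately and averaging across the 11 markers.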

  18. Density-functional theory study of ionic inhomogeneity in metal clusters using SC-ISJM

    NASA Astrophysics Data System (ADS)

    Payami, Mahmoud; Mahmoodi, Tahereh

    2017-12-01

    In this work we have applied the recently formulated self-compressed inhomogeneous stabilized jellium model [51] to describe the equilibrium electronic and geometric properties of atomic-closed-shell simple metal clusters Al_N (N = 13, 19, 43, 55, 79, 87, 135, 141), Na_N, and Cs_N (N = 9, 15, 27, 51, 59, 65, 89, 113). To validate the results, we have also performed first-principles pseudopotential calculations and used them as our reference. In the model, we consider two regions, "surface" and "inner", separated by a sharp border. This generalization makes it possible to decouple the relaxations of different parts of the system. The results show that the present model correctly predicts the size reductions seen in most of the clusters. It also predicts an increase in size for some clusters, as observed in the first-principles results. Moreover, the changes in inter-layer distances, whether contractions or expansions, are in good agreement with the atomic simulation results. For a more realistic description of the properties, it is possible to improve the method of choosing the surface thicknesses or to generalize the model to include more than two regions.

  19. Beyond Aztec Castles: Toric Cascades in the dP3 Quiver

    NASA Astrophysics Data System (ADS)

    Lai, Tri; Musiker, Gregg

    2017-12-01

    Given one of an infinite class of supersymmetric quiver gauge theories, string theorists can associate a corresponding toric variety (which is a Calabi-Yau 3-fold) as well as an associated combinatorial model known as a brane tiling. In combinatorial language, a brane tiling is a bipartite graph on a torus, and its perfect matchings are of interest to combinatorialists and physicists alike. A cluster algebra may also be associated with such quivers, and in this paper we study the generators of this algebra, known as cluster variables, for the quiver associated with the cone over the del Pezzo surface dP3. In particular, mutation sequences involving mutations exclusively at vertices with two incoming and two outgoing arrows are referred to as toric cascades in the string theory literature. Such toric cascades give rise to interesting discrete integrable systems at the level of cluster variable dynamics. We provide an explicit algebraic formula for all cluster variables that are reachable by toric cascades, as well as, in most cases, a combinatorial interpretation of these formulas involving perfect matchings of subgraphs of the dP3 brane tiling.

  20. Visualization of the IMIA Yearbook of Medical Informatics Publications over the Last 25 Years

    PubMed Central

    Tam-Tham, H.; Minty, E. P.

    2016-01-01

    Background: The last 25 years have been a period of innovation in the area of medical informatics. The International Medical Informatics Association (IMIA) has published, every year for the last quarter century, the Yearbook of Medical Informatics, collating selected papers from various journals in an attempt to provide a summary of the academic medical informatics literature. The objective of this paper is to visualize the evolution of the medical informatics field over the last 25 years according to the frequency of word occurrences in the papers published in the IMIA Yearbook of Medical Informatics. Methods: A literature review was conducted examining the IMIA Yearbook of Medical Informatics between 1992 and 2015. These references were collated into a reference manager application to examine the literature using keyword searches, word clouds, and topic clustering. The data were considered in their entirety, as well as segregated into 3 time periods to examine the evolution of main trends over time. Several methods were used, including word clouds, cluster maps, and custom-developed web-based information dashboards. Results: The literature search yielded a total of 1210 references published in the Yearbook, of which 213 were excluded, leaving 997 references for visualization. Overall, we found that publications were more technical and methods-oriented between 1992 and 1999 and more clinically and patient-oriented between 2000 and 2009, and we noted the emergence of “big data”, decision support, and global health in the most recent period, between 2010 and 2015. Dashboards were additionally created to show individual reference data as well as aggregated information. Conclusion: Medical informatics is a vast and expanding area with new methods and technologies being researched, implemented, and evaluated. 
Determining visualization approaches that enhance our understanding of literature is an active area of research, and like medical informatics, is constantly evolving as new software and algorithms are developed. This paper examined several approaches for visualizing the medical informatics literature to show historical trends, associations, and aggregated summarized information to illustrate the state and changes in the IMIA Yearbook publications over the last quarter century. PMID:27362591
