Sample records for document clustering based

  1. Study of parameters of the nearest neighbour shared algorithm on clustering documents

    NASA Astrophysics Data System (ADS)

    Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni

    2018-03-01

    Document clustering is one way to manage documents automatically, extract document topics, and filter information quickly. Before clustering, documents are preprocessed by text mining. Keywords are extracted with Rapid Automatic Keyword Extraction (RAKE), and each document is represented as a concept vector using Latent Semantic Analysis (LSA). The clustering process then places documents with similar topics in the same cluster, based on this text-mining preprocessing. The Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" that documents share. Its parameters are k, the number of nearest-neighbor documents; ε, the number of shared nearest-neighbor documents; and MinT, the minimum number of similar documents that can form a cluster. The SNN algorithm is characterized by this shared-neighbor property: each cluster is formed by keywords that its documents share, and a cluster can be built from more than one keyword if those keywords appear frequently in the documents. The parameter values of the SNN algorithm affect the clustering results. A higher k increases the number of neighbor documents of each document, which lowers the similarity among neighboring documents and also lowers the accuracy of each cluster. A higher ε means each document captures only neighbor documents with high similarity when building a cluster, which also leaves more documents unclassified (noise). A higher MinT decreases the number of clusters, since a group of similar documents smaller than MinT cannot form a cluster. The SNN parameters therefore determine both the clustering performance and the amount of noise (unclustered documents).
The Silhouette coefficient is almost the same across many experiments, above 0.9, which indicates that the SNN algorithm works well with different parameter values.
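A minimal Python sketch of how the three SNN parameters interact, assuming a precomputed symmetric document-similarity matrix (for example, cosine similarity between LSA concept vectors). It follows a common Jarvis-Patrick/Ertöz-style reading: two documents are strongly linked when each is in the other's k-nearest-neighbor list and they share at least ε neighbors, and a document with at least MinT strong links is a core document. The function names and this exact core/noise rule are assumptions, not the authors' implementation.

```python
from collections import defaultdict

def knn_lists(sim, k):
    """For each document, return the set of its k most similar other documents."""
    n = len(sim)
    neighbors = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        neighbors.append(set(others[:k]))
    return neighbors

def snn_cluster(sim, k, eps, min_t):
    """Simplified SNN clustering (hypothetical helper).

    sim   : symmetric document-document similarity matrix (list of lists)
    k     : size of each document's nearest-neighbor list
    eps   : minimum number of shared neighbors for a strong link
    min_t : minimum number of strong links for a document to be a core document
    Returns one cluster id per document; -1 marks noise (unclustered documents).
    """
    n = len(sim)
    nn = knn_lists(sim, k)
    # Strong link: mutual kNN membership and at least eps shared neighbors.
    strong = defaultdict(set)
    for i in range(n):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= eps:
                strong[i].add(j)
                strong[j].add(i)
    core = {i for i in range(n) if len(strong[i]) >= min_t}
    labels = [-1] * n
    cluster = 0
    for start in sorted(core):
        if labels[start] != -1:
            continue
        # Flood-fill over strong links between core documents.
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in strong[i]:
                if j in core and labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    # Attach each non-core document to the cluster of a strongly linked core document.
    for i in range(n):
        if labels[i] == -1:
            for j in strong[i]:
                if j in core:
                    labels[i] = labels[j]
                    break
    return labels
```

Raising `eps` or `min_t` can only remove strong links or core documents, so in this sketch the set of noise documents never shrinks, which matches the trend the abstract reports.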

  2. Agent-based method for distributed clustering of textual information

    DOEpatents

    Potok, Thomas E [Oak Ridge, TN]; Reed, Joel W [Knoxville, TN]; Elmore, Mark T [Oak Ridge, TN]; Treadwell, Jim N [Louisville, TN]

    2010-09-28

    A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term, identifies the documents found in the search, and displays them in a clustering display (80) that indicates the similarity of the documents to each other.
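The routing loop described in the abstract can be sketched as below. This is a hypothetical, single-process simplification: the master cluster agents (22) and cluster agents (23) are collapsed into one `ClusterAgent` class, similarity is assumed to be cosine similarity, and the `threshold` for creating a new cluster agent is an assumed parameter that the abstract does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class ClusterAgent:
    """Hypothetical cluster agent holding the document vectors it manages."""
    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc_id -> document vector

    def evaluate(self, vec):
        # Value returned upstream: best similarity to any stored document.
        return max((cosine(vec, v) for v in self.docs.values()), default=0.0)

    def add(self, doc_id, vec):
        self.docs[doc_id] = vec

class MultiplexingAgent:
    """Hypothetical multiplexer: polls cluster agents and routes each new document."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed cut-off for "similar enough"
        self.agents = []

    def add_document(self, doc_id, vec):
        scores = [(agent.evaluate(vec), agent) for agent in self.agents]
        best_score, best_agent = max(scores, key=lambda s: s[0], default=(0.0, None))
        if best_agent is None or best_score < self.threshold:
            # No agent is similar enough: create a new cluster agent.
            best_agent = ClusterAgent(f"cluster-{len(self.agents)}")
            self.agents.append(best_agent)
        best_agent.add(doc_id, vec)
        return best_agent.name
```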

  3. A knowledge-driven approach to biomedical document conceptualization.

    PubMed

    Zheng, Hai-Tao; Borchert, Charles; Jiang, Yong

    2010-06-01

    Biomedical document conceptualization is the process of clustering biomedical documents based on ontology-represented domain knowledge. The result of this process is the representation of the biomedical documents by a set of key concepts and their relationships. Most clustering methods cluster documents based on invariant domain knowledge. The objective of this work is to develop an effective method to cluster biomedical documents based on various user-specified ontologies, so that users can exploit the concept structures of documents more effectively. We develop a flexible framework that allows users to specify the knowledge bases in the form of ontologies. Based on the user-specified ontologies, we develop a key concept induction algorithm, which uses latent semantic analysis to identify key concepts and cluster documents. A corpus-related ontology generation algorithm is developed to generate the concept structures of documents. Using two biomedical datasets, we evaluate the proposed method and five other clustering algorithms. The clustering results of the proposed method outperform those of the five other algorithms in terms of key concept identification. On the first biomedical dataset, our method achieves F-measure values of 0.7294 and 0.5294 based on the MeSH ontology and gene ontology (GO), respectively. On the second biomedical dataset, our method achieves F-measure values of 0.6751 and 0.6746 based on the MeSH ontology and GO, respectively. Both results outperform the five other algorithms in terms of F-measure. Based on the MeSH ontology and GO, the generated corpus-related ontologies show informative conceptual structures. The proposed method enables users to specify the domain knowledge to exploit the conceptual structures of biomedical document collections. In addition, the proposed method is able to extract the key concepts and cluster the documents with relatively high precision. Copyright 2010 Elsevier B.V. All rights reserved.
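The abstract reports F-measure values without defining them. One widely used clustering F-measure, shown here as an assumption about the variant rather than the authors' exact definition, scores each gold class by its best-matching cluster's F1 and weights the classes by size:

```python
from collections import Counter

def clustering_f_measure(classes, clusters):
    """Class-size-weighted clustering F-measure (one common variant).

    classes[d]  : gold label of document d
    clusters[d] : cluster assigned to document d
    """
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))  # documents shared by (class, cluster)
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best  # weight each class by its share of documents
    return total
```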

  4. Utilizing the Structure and Content Information for XML Document Clustering

    NASA Astrophysics Data System (ADS)

    Tran, Tien; Kutty, Sangeetha; Nayak, Richi

    This paper reports on the experiments and results of a clustering approach used in the INEX 2008 document mining challenge. The clustering approach utilizes both the structure and content information of the Wikipedia XML document collection. A latent semantic kernel (LSK) is used to measure the semantic similarity between XML documents based on their content features. Constructing a latent semantic kernel involves computing a singular value decomposition (SVD). On a large feature-space matrix, computing the SVD is very expensive in time and memory. Thus, in this clustering approach, the dimension of the document space of a term-document matrix is reduced before performing the SVD. The document space reduction is based on the common structural information of the Wikipedia XML document collection. The proposed clustering approach has been shown to be effective on the Wikipedia collection in the INEX 2008 document mining challenge.
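A latent semantic kernel can be sketched with NumPy as follows, assuming documents are raw term-frequency vectors and that the caller has already reduced the document space (for instance, to structurally representative documents) before the SVD. Projecting through the top-k left singular vectors lets documents with related but non-identical terms score as similar. The function name and the cosine normalization are assumptions.

```python
import numpy as np

def latent_semantic_kernel(A, k):
    """Return a function computing LSK similarity between two term vectors.

    A : term-document matrix (terms x documents), assumed already reduced to a
        subset of documents so that the SVD stays cheap.
    k : number of singular vectors to keep.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]  # term-to-concept projection

    def kernel(d1, d2):
        p1, p2 = Uk.T @ d1, Uk.T @ d2  # project into the k-dim latent space
        denom = np.linalg.norm(p1) * np.linalg.norm(p2)
        return float(p1 @ p2 / denom) if denom else 0.0

    return kernel
```

On a toy matrix where "car" and "automobile" each co-occur with "engine", the kernel with k = 1 gives those two single-term vectors a similarity of about 1, even though their raw term vectors are orthogonal.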

  5. Ontology-based structured cosine similarity in document summarization: with applications to mobile audio-based knowledge management.

    PubMed

    Yuan, Soe-Tsyr; Sun, Jerry

    2005-10-01

    Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most text-clustering methods are grounded in term-based measurement of distance or similarity and ignore the structure of the documents. In this paper, we present a novel method named structured cosine similarity (SCS) that gives document clustering a new way of modeling document summarization, taking the structure of the documents into account so as to improve the quality, stability, and efficiency of document clustering. This study was motivated by the problem of clustering speech documents (which lack rich document features) obtained from the wireless, oral experience sharing conducted by an enterprise's mobile workforce, in support of audio-based knowledge management. In other words, this problem aims to facilitate knowledge acquisition and sharing through speech. The evaluations also show fairly promising results for our structured cosine similarity method.

  6. Clustering XML Documents Using Frequent Subtrees

    NASA Astrophysics Data System (ADS)

    Kutty, Sangeetha; Tran, Tien; Nayak, Richi; Li, Yuefeng

    This paper presents an experimental study conducted over the INEX 2008 Document Mining Challenge corpus using both the structure and the content of XML documents for clustering them. The concise common substructures known as the closed frequent subtrees are generated using the structural information of the XML documents. The closed frequent subtrees are then used to extract the constrained content from the documents. A matrix containing the term distribution of the documents in the dataset is developed using the extracted constrained content. The k-way clustering algorithm is applied to the matrix to obtain the required clusters. In spite of the large number of documents in the INEX 2008 Wikipedia dataset, the proposed frequent subtree-based clustering approach was successful in clustering the documents. This approach significantly reduces the dimensionality of the terms used for clustering without much loss in accuracy.
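A rough sketch of the "constrained content" step, with one deliberate simplification: instead of mining closed frequent subtrees, it treats root-to-node tag paths that occur in at least `min_support` documents as the frequent structure and keeps only the text found under those paths. The helper names are hypothetical; the resulting strings would then feed a term-document matrix for k-way clustering.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def tag_paths(node, prefix=""):
    """Yield (path, text) for every element, where path is the root-to-node tag sequence."""
    path = f"{prefix}/{node.tag}" if prefix else node.tag
    yield path, (node.text or "").strip()
    for child in node:
        yield from tag_paths(child, path)

def constrained_content(xml_docs, min_support):
    """Keep only the text under paths that occur in at least min_support documents."""
    trees = [ET.fromstring(doc) for doc in xml_docs]
    support = Counter()
    for root in trees:
        support.update({path for path, _ in tag_paths(root)})  # count each path once per doc
    frequent = {p for p, c in support.items() if c >= min_support}
    return [
        " ".join(text for path, text in tag_paths(root) if path in frequent and text)
        for root in trees
    ]
```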

  7. Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    PubMed Central

    Boyack, Kevin W.; Newman, David; Duhon, Russell J.; Klavans, Richard; Patek, Michael; Biberstine, Joseph R.; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy

    2011-01-01

    Background We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. Methodology We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches combined five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE.
Conclusions PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts. PMID:21437291
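The tf-idf cosine approach and the top-n filtering step can be sketched in plain Python. The weighting scheme (raw term frequency times log inverse document frequency) and the function names are assumptions; the study's exact weighting and its value of n are not given in the abstract, and the graph-layout and average-link clustering stage is not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists -> list of sparse tf-idf dicts (raw tf x log idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def top_n_similarities(docs, n_keep):
    """For each document, keep only its n_keep most similar other documents."""
    vecs = tfidf_vectors(docs)
    result = []
    for i, u in enumerate(vecs):
        sims = sorted(((cosine(u, v), j) for j, v in enumerate(vecs) if j != i), reverse=True)
        result.append([(j, s) for s, j in sims[:n_keep]])
    return result
```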

  8. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method.

    PubMed

    Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol

    2007-11-27

    A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. To make biomedical information in free text easy to use, document clustering and text summarization are used together as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides concise but rich text summaries in terms of key concepts and sentences. Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries.

  9. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method

    PubMed Central

    Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol

    2007-01-01

    Background A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. To make biomedical information in free text easy to use, document clustering and text summarization are used together as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Results Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides concise but rich text summaries in terms of key concepts and sentences. Conclusion Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries. PMID:18047705

  10. Semantic Clustering of Search Engine Results

    PubMed Central

    Soliman, Sara Saad; El-Sayed, Maged F.; Hassan, Yasser F.

    2015-01-01

    This paper presents a novel approach to clustering search engine results that relies on the semantics of the retrieved documents rather than the terms in those documents. The proposed approach takes into consideration both lexical and semantic similarities among documents and applies an activation spreading technique to generate semantically meaningful clusters. This approach allows documents that are semantically similar to be clustered together, rather than clustering documents based on similar terms. A prototype is implemented and several experiments are conducted to test the proposed solution. The experimental results confirm that the proposed solution achieves remarkable results in terms of precision. PMID:26933673
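Activation spreading can be illustrated with a generic sketch over a weighted concept graph. The graph, the `decay` factor, the number of iterations and the pruning `threshold` are all assumptions (the abstract does not say which lexical resource or parameters were used). The idea is that documents whose activated concept sets overlap can be grouped even when they share few literal terms.

```python
def spread_activation(graph, seeds, decay=0.5, iterations=2, threshold=0.01):
    """Propagate activation from seed concepts over a weighted concept graph.

    graph : dict concept -> dict neighbor -> edge weight in [0, 1] (hypothetical)
    seeds : dict concept -> initial activation (e.g. concepts found in a document)
    Returns the accumulated activation of every concept that was reached.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(iterations):
        next_frontier = {}
        for node, value in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                pulse = value * weight * decay  # activation weakens with each hop
                if pulse >= threshold:          # prune negligible pulses
                    next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + pulse
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return activation
```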

  11. Thematic clustering of text documents using an EM-based approach

    PubMed Central

    2012-01-01

    Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space and to partition documents into groups, where each group contains documents that are similar to each other. However, this strategy does not give humans a comprehensible view, since it cannot explain the main subject of each cluster. Utilizing semantic information can solve this problem, but it needs a well-defined ontology or a pre-labeled gold standard set. In this paper, we present a thematic clustering algorithm for text documents. Given text, subject terms are extracted and used for clustering documents in a probabilistic framework. An EM approach is used to assign documents to subjects, converging to a locally optimal solution. The proposed method is distinctive because its results are sufficiently explanatory for human understanding as well as efficient for clustering performance. The experimental results show that the proposed method provides competitive performance compared with other state-of-the-art approaches. We also show that the themes extracted from the MEDLINE® dataset represent the subjects of clusters reasonably well. PMID:23046528
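As an illustration of EM-based document clustering, here is a standard mixture-of-multinomials model fitted with EM. This is a generic stand-in, not the paper's subject-term model: documents are bags of term ids, and the number of topics, the Laplace smoothing and the random initialization are assumptions. Like any EM procedure it only reaches a local optimum, so results can depend on `seed`.

```python
import math
import random

def em_multinomial(docs, vocab_size, n_topics, iterations=50, seed=0):
    """Fit a mixture of multinomials to bag-of-words documents with EM.

    docs: list of dicts mapping term id -> count.
    Returns (resp, theta): resp[d][k] is P(topic k | document d) and
    theta[k][w] is P(term w | topic k).
    """
    rng = random.Random(seed)
    # Random soft assignments break the symmetry between topics.
    resp = []
    for _ in docs:
        r = [rng.random() for _ in range(n_topics)]
        total = sum(r)
        resp.append([x / total for x in r])
    for _ in range(iterations):
        # M-step: mixing weights and Laplace-smoothed topic-term distributions.
        pi = [sum(r[k] for r in resp) / len(docs) for k in range(n_topics)]
        theta = []
        for k in range(n_topics):
            counts = [1.0] * vocab_size
            for r, doc in zip(resp, docs):
                for w, c in doc.items():
                    counts[w] += r[k] * c
            total = sum(counts)
            theta.append([c / total for c in counts])
        # E-step: posterior topic probabilities, computed in log space for stability.
        resp = []
        for doc in docs:
            logp = [math.log(max(pi[k], 1e-300))
                    + sum(c * math.log(theta[k][w]) for w, c in doc.items())
                    for k in range(n_topics)]
            top = max(logp)
            p = [math.exp(x - top) for x in logp]
            total = sum(p)
            resp.append([x / total for x in p])
    return resp, theta
```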

  12. BioTextQuest: a web-based biomedical text mining suite for concept discovery.

    PubMed

    Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J

    2011-12-01

    BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive, user-friendly visualization. The terms labeling each document cluster are illustrated in a tag cloud and semantically annotated according to their biological entity type, and a list of document titles enables users to compare the terms and documents of each cluster simultaneously, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms and exclusion of documents/significant terms, to better match the biological question addressed. Availability: http://biotextquest.biol.ucy.ac.cy. Contact: vprobon@ucy.ac.cy; iliopj@med.uoc.gr. Supplementary data are available at Bioinformatics online.

  13. Subject and Citation Indexing. Part I: The Clustering Structure of Composite Representations in the Cystic Fibrosis Document Collection. Part II: The Optimal, Cluster-Based Retrieval Performance of Composite Representations.

    ERIC Educational Resources Information Center

    Shaw, W. M., Jr.

    1991-01-01

    Two articles discuss the clustering of composite representations in the Cystic Fibrosis Document Collection from the National Library of Medicine's MEDLINE file. Clustering is evaluated as a function of the exhaustivity of composite representations based on Medical Subject Headings (MeSH) and citation indexes, and evaluation of retrieval…

  14. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    PubMed

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  15. Towards Semantically Sensitive Text Clustering: A Feature Space Modeling Technology Based on Dimension Extension

    PubMed Central

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172

  16. PuReD-MCL: a graph-based PubMed document clustering methodology.

    PubMed

    Theodosiou, T; Darzentas, N; Angelis, L; Ouzounis, C A

    2008-09-01

    Biomedical literature is the principal repository of biomedical knowledge, with PubMed being the most complete database collecting, organizing and analyzing such textual knowledge. There are numerous efforts that attempt to exploit this information by using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (Pubmed Related Documents-MCL), which is based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation. This process allows users to analyse the results by highlighting important clues, and finally to visualize the clusters and all relevant information using an interactive graph layout algorithm, for instance BioLayout Express 3D. The methodology was applied to two different datasets, previously used for the validation of the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second is related to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained from TextQuest, while also providing additional insights into the clusters and the corresponding documents. Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/
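The MCL step can be sketched in plain Python. In PuReD-MCL the adjacency matrix would come from PubMed's related-documents links; here it is simply an input, and the inflation value, iteration count and cluster-extraction tolerance are assumed defaults rather than the tool's settings.

```python
def normalize_columns(m):
    """Make every column sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s else 0.0
    return out

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcl(adjacency, inflation=2.0, iterations=50):
    """Markov Cluster algorithm on a symmetric adjacency matrix (list of lists).

    Alternates expansion (matrix squaring spreads flow) and inflation
    (element-wise power plus renormalization strengthens strong flows).
    """
    n = len(adjacency)
    # Add self-loops, a standard MCL preprocessing step.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                                     # expansion
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

For two disconnected triangles, this sketch returns the node sets {0, 1, 2} and {3, 4, 5}.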

  17. Text Summarization Model based on Facility Location Problem

    NASA Astrophysics Data System (ADS)

    Takamura, Hiroya; Okumura, Manabu

    We propose a novel multi-document generic summarization model based on the budgeted median problem, which is a facility location problem. The summarization method based on our model is an extractive method, which selects sentences from the given document cluster and generates a summary. Each sentence in the document cluster is assigned to one of the selected sentences, where the former sentence is supposed to be represented by the latter. Our method selects sentences to generate a summary that yields a good sentence assignment and hence covers the whole content of the document cluster. An advantage of this method is that it can incorporate asymmetric relations between sentences, such as textual entailment. Through experiments, we showed that the proposed method yields good summaries on the DUC'04 dataset.
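The budgeted median model can be approximated greedily, as sketched below. The paper formulates summarization as an optimization problem; this sketch does not solve it exactly but repeatedly adds the sentence that most reduces the total assignment cost per word until the word budget is reached. The dissimilarity matrix (which may be asymmetric, as with entailment-based scores), the worst-case baseline and the function name are assumptions.

```python
def greedy_budgeted_median(sentences, dissim, budget):
    """Greedy approximation to budgeted-median extractive summarization.

    sentences : list of sentence strings (length measured in words)
    dissim    : dissim[i][j] = cost of letting selected sentence j represent sentence i
    budget    : maximum summary length in words
    Returns the indices of the selected sentences in document order.
    """
    n = len(sentences)
    lengths = [max(1, len(s.split())) for s in sentences]
    selected = []

    def total_cost(sel):
        # Each sentence is assigned to its cheapest representative among the selected ones.
        return sum(min(dissim[i][j] for j in sel) for i in range(n))

    current = sum(max(row) for row in dissim)  # baseline: worst-case representation
    used = 0
    while True:
        best, best_ratio = None, 0.0
        for j in range(n):
            if j in selected or used + lengths[j] > budget:
                continue
            ratio = (current - total_cost(selected + [j])) / lengths[j]  # gain per word
            if ratio > best_ratio:
                best, best_ratio = j, ratio
        if best is None:
            break
        selected.append(best)
        used += lengths[best]
        current = total_cost(selected)
    return sorted(selected)
```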

  18. Script identification from images using cluster-based templates

    DOEpatents

    Hochberg, J.G.; Kelly, P.M.; Thomas, T.R.

    1998-12-01

    A computer-implemented method identifies a script used to create a document. A set of training documents for each script to be identified is scanned into the computer to store a series of exemplary images representing each script. Pixels forming the exemplary images are electronically processed to define a set of textual symbols corresponding to the exemplary images. Each textual symbol is assigned to a cluster of textual symbols that most closely represents the textual symbol. The cluster of textual symbols is processed to form a representative electronic template for each cluster. A document having a script to be identified is scanned into the computer to form one or more document images representing the script to be identified. Pixels forming the document images are electronically processed to define a set of document textual symbols corresponding to the document images. The set of document textual symbols is compared to the electronic templates to identify the script. 17 figs.

  19. Script identification from images using cluster-based templates

    DOEpatents

    Hochberg, Judith G.; Kelly, Patrick M.; Thomas, Timothy R.

    1998-01-01

    A computer-implemented method identifies a script used to create a document. A set of training documents for each script to be identified is scanned into the computer to store a series of exemplary images representing each script. Pixels forming the exemplary images are electronically processed to define a set of textual symbols corresponding to the exemplary images. Each textual symbol is assigned to a cluster of textual symbols that most closely represents the textual symbol. The cluster of textual symbols is processed to form a representative electronic template for each cluster. A document having a script to be identified is scanned into the computer to form one or more document images representing the script to be identified. Pixels forming the document images are electronically processed to define a set of document textual symbols corresponding to the document images. The set of document textual symbols is compared to the electronic templates to identify the script.

  20. Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

    DOEpatents

    Sanfilippo, Antonio [Richland, WA]; Calapristi, Augustin J [West Richland, WA]; Crow, Vernon L [Richland, WA]; Hetzler, Elizabeth G [Kennewick, WA]; Turner, Alan E [Kennewick, WA]

    2009-12-22

    Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.

  1. Model-based document categorization employing semantic pattern analysis and local structure clustering

    NASA Astrophysics Data System (ADS)

    Fume, Kosei; Ishitani, Yasuto

    2008-01-01

    We propose a document categorization method based on a document model that can be defined externally for each task and that categorizes Web content or business documents into a target category in accordance with the similarity of the model. The main feature of the proposed method consists of two aspects of semantics extraction from an input document. The semantics of terms are extracted by the semantic pattern analysis and implicit meanings of document substructure are specified by a bottom-up text clustering technique focusing on the similarity of text line attributes. We have constructed a system based on the proposed method for trial purposes. The experimental results show that the system achieves more than 80% classification accuracy in categorizing Web content and business documents into 15 or 70 categories.

  2. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure.

    PubMed

    Zhang, Wen; Xiao, Fan; Li, Bin; Zhang, Siguang

    2016-01-01

    LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is usually criticized for its low discriminative power in representing documents, although it has been validated as having good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. Firstly, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Secondly, we propose SVD on clusters for LSI and theoretically explain that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure compared with other SVD-based LSI methods.
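One possible reading of "SVD on clusters" is sketched below with NumPy: compute a truncated SVD on each cluster's sub-matrix, then represent a document by concatenating its projections onto every cluster's latent space (the dimension expansion followed by dimension projection that the abstract mentions). The clustering is assumed to be given, and the function name and per-cluster rank `k` are hypothetical; folding in new documents is not shown.

```python
import numpy as np

def svd_on_clusters(A, labels, k):
    """Build a document representation from per-cluster truncated SVDs.

    A      : term-document matrix (terms x documents)
    labels : cluster id of each document (column of A), assumed precomputed
    k      : number of left singular vectors kept per cluster
    Returns a function mapping a term vector to the concatenation of its
    projections onto every cluster's k-dimensional latent space.
    """
    bases = []
    for c in sorted(set(labels)):
        cols = [j for j, lab in enumerate(labels) if lab == c]
        U, _, _ = np.linalg.svd(A[:, cols], full_matrices=False)
        bases.append(U[:, :k])          # latent basis learned from this cluster only
    projection = np.hstack(bases)       # terms x (clusters * k): the expanded space

    def represent(doc_vec):
        return projection.T @ doc_vec

    return represent
```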

  3. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

    PubMed Central

    Xiao, Fan; Li, Bin; Zhang, Siguang

    2016-01-01

    LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is usually criticized for its low discriminative power in representing documents, although it has been validated as having good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. Firstly, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Secondly, we propose SVD on clusters for LSI and theoretically explain that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure compared with other SVD-based LSI methods. PMID:27579031

  4. Handwritten text line segmentation by spectral clustering

    NASA Astrophysics Data System (ADS)

    Han, Xuecheng; Yao, Hui; Zhong, Guoqiang

    2017-02-01

    Since handwritten text lines are generally skewed and not obviously separated, text line segmentation of handwritten document images is still a challenging problem. In this paper, we propose a novel text line segmentation algorithm based on spectral clustering. Given a handwritten document image, we first convert it to a binary image and then compute the adjacency matrix of the pixel points. We apply spectral clustering to this similarity matrix and use the orthogonal k-means clustering algorithm to group the text lines. Experiments on the Chinese handwritten document database HIT-MW demonstrate the effectiveness of the proposed method.
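A compact NumPy sketch of the idea, under stated assumptions: black-pixel coordinates from the binarized image are the input, the adjacency matrix uses a Gaussian kernel with an assumed bandwidth `sigma`, and the embedding follows the normalized (Ng-Jordan-Weiss) spectral clustering recipe. Plain k-means is substituted for the orthogonal k-means variant named in the abstract.

```python
import numpy as np

def spectral_line_clusters(points, n_lines, sigma=1.0, iters=20):
    """Group foreground pixel coordinates into n_lines clusters.

    points : sequence of (row, col) coordinates of black pixels in a binary image
    """
    pts = np.asarray(points, dtype=float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian affinity (adjacency) matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    X = vecs[:, -n_lines:]                  # eigenvectors of the largest eigenvalues
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Plain k-means on the embedded rows (substituted for orthogonal k-means).
    centers = X[np.linspace(0, len(X) - 1, n_lines).astype(int)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == c].mean(0) if np.any(labels == c) else centers[c]
                            for c in range(n_lines)])
    return labels
```

On two synthetic horizontal strokes ten rows apart, the pixels of each stroke end up sharing a label.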

  5. Information Clustering Based on Fuzzy Multisets.

    ERIC Educational Resources Information Center

    Miyamoto, Sadaaki

    2003-01-01

    Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.…

  6. Autocorrelation and Regularization of Query-Based Information Retrieval Scores

    DTIC Science & Technology

    2008-02-01

    of the most general information retrieval models [Salton, 1968]. By treating a query as a very short document, documents and queries can be rep... Salton, 1971]. In the context of single link hierarchical clustering, Jardine and van Rijsbergen showed that ranking all k clusters and retrieving a... a document about “dogs”, then the system will always miss this document when a user queries “dog”. Salton recognized that a document’s representation

  7. Automatic script identification from images using cluster-based templates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hochberg, J.; Kerns, L.; Kelly, P.

    We have developed a technique for automatically identifying the script used to generate a document that is stored electronically in bit image form. Our approach differs from previous work in that the distinctions among scripts are discovered by an automatic learning procedure, without any hands-on analysis. We first develop a set of representative symbols (templates) for each script in our database (Cyrillic, Roman, etc.). We do this by identifying all textual symbols in a set of training documents, scaling each symbol to a fixed size, clustering similar symbols, pruning minor clusters, and finding each cluster's centroid. To identify a new document's script, we identify and scale a subset of symbols from the document and compare them to the templates for each script. We choose the script whose templates provide the best match. Our current system distinguishes among the Armenian, Burmese, Chinese, Cyrillic, Ethiopic, Greek, Hebrew, Japanese, Korean, Roman, and Thai scripts with over 90% accuracy.
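    The final matching step ("choose the script whose templates provide the best match") can be sketched as follows. The sketch assumes each scaled symbol and each template centroid is already a fixed-length feature vector, and it scores a script by the summed squared distance from each symbol to its nearest template; the scoring rule and the name `identify_script` are assumptions, not the authors' exact criterion.

```python
def identify_script(symbols, templates):
    """Pick the script whose templates best match the document's symbols.

    symbols:   list of feature vectors (one per scaled symbol)
    templates: dict mapping script name -> list of centroid vectors
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Score = sum over symbols of the distance to the closest template of that script.
    scores = {
        script: sum(min(sqdist(s, t) for t in temps) for s in symbols)
        for script, temps in templates.items()
    }
    return min(scores, key=scores.get)
```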

  8. A Linear Algebra Measure of Cluster Quality.

    ERIC Educational Resources Information Center

    Mather, Laura A.

    2000-01-01

    Discussion of models for information retrieval focuses on an application of linear algebra to text clustering, namely, a metric for measuring cluster quality based on the theory that cluster quality is proportional to the number of terms that are disjoint across the clusters. Explains term-document matrices and clustering algorithms. (Author/LRW)
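    As a rough illustration of the stated intuition (cluster quality grows with the number of terms that are disjoint across clusters), the sketch below computes the fraction of vocabulary terms that occur in exactly one cluster. This is a simplified stand-in, not the linear algebra metric from the paper; the function name `disjoint_term_fraction` and the normalization by vocabulary size are assumptions.

```python
def disjoint_term_fraction(clusters):
    """Fraction of terms that appear in exactly one cluster.

    clusters: list of clusters; each cluster is a list of documents,
    and each document is an iterable of terms.
    """
    term_clusters = {}
    for ci, docs in enumerate(clusters):
        for doc in docs:
            for term in doc:
                term_clusters.setdefault(term, set()).add(ci)
    if not term_clusters:
        return 0.0
    disjoint = sum(1 for cs in term_clusters.values() if len(cs) == 1)
    return disjoint / len(term_clusters)
```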

  9. Architecture and Channel-Belt Clustering in the Fluvial lower Wasatch Formation, Uinta Basin, Utah

    NASA Astrophysics Data System (ADS)

    Pisel, J. R.; Pyles, D. R.; Bracken, B.; Rosenbaum, C. D.

    2013-12-01

    The Eocene lower Wasatch Formation of the Uinta Basin contains exceptional outcrops of low net-sand content (27% sand) fluvial strata. This study quantitatively documents the stratigraphy of a 7 km wide by 300 meter thick strike-oriented outcrop in order to develop a quantitative database that can be used to improve our knowledge of how some fluvial systems evolve over geologic time scales. Data used to document the outcrop are: (1) 550 meters of decimeter to half-meter scale resolution stratigraphic columns that document grain size and physical sedimentary structures; (2) detailed photopanels used to document architectural style and lithofacies types in the outcrop; (3) thickness, width, and spatial position for all channel belts in the outcrop; and (4) directional measurements of paleocurrent indicators. Two channel-belt styles are recognized: lateral and downstream accreting channel belts, both of which occur as either single-story or multi-story. Floodplain strata are well exposed and consist of overbank fines and sand-rich crevasse splay deposits. The key upward and lateral characteristics of the outcrop documented herein are the following. First, the shapes of 243 channels are documented. The average width, thickness and aspect ratio of the channel belts are 110 m, 7 m, and 16:1, respectively. Importantly, the size and shape of channel belts do not change upward through the 300 meter transect. Second, channels are documented to spatially cluster. Nine clusters are documented using a spatial statistic. Key upward patterns in channel-belt clustering are a marked change from non-amalgamated isolated channel-belt clusters to amalgamated channel-belt clusters. Critically, stratal surfaces can be correlated from mudstone units within the clusters to time-equivalent floodplain strata adjacent to the cluster, demonstrating that clusters are not confined within fluvial valleys.
Finally, proportions of floodplain and channel belt elements underlying clusters and channel belts vary with the style of clusters and channel belts laterally and vertically within the outcrop.

  10. Clustering document fragments using background color and texture information

    NASA Astrophysics Data System (ADS)

    Chanda, Sukalpa; Franke, Katrin; Pal, Umapada

    2012-01-01

    Forensic analysis of questioned documents can sometimes be highly data intensive. A forensic expert might need to analyze a heap of document fragments, and in such cases, to ensure reliability, he or she should focus only on the relevant evidence hidden in those fragments. Retrieving relevant documents requires finding similar document fragments. One way to obtain such similar documents is to use the fragments' physical characteristics, such as color and texture. In this article we propose an automatic scheme to retrieve similar document fragments based on the visual appearance of the document paper and its texture. Multispectral color characteristics are captured using biologically inspired color differentiation techniques, by projecting the document color characteristics into the Lab color space. Gabor filter-based texture analysis is used to identify document texture. The underlying premise is that document fragments from the same source will have similar color and texture. To cluster similar document fragments in our test dataset, we use a Self-Organizing Map (SOM) of dimension 5×5, with the document color and texture information as features. We obtained an encouraging accuracy of 97.17% on 1063 test images.
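    A 5×5 Self-Organizing Map of the kind mentioned above can be sketched in a few lines of NumPy. The learning-rate and neighborhood schedules, the random initialization, and the names `train_som` and `best_unit` are assumptions for illustration; the feature vectors stand in for the color and texture descriptors, which are not reproduced here.

```python
import numpy as np

def train_som(data, rows=5, cols=5, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood (sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighborhood radius
            bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (rows, cols))
            dist2 = ((grid - np.array(bmu)) ** 2).sum(-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def best_unit(weights, x):
    """Grid position of the best-matching unit for feature vector x."""
    d = ((weights - np.asarray(x, dtype=float)) ** 2).sum(-1)
    return np.unravel_index(np.argmin(d), d.shape)
```

    Fragments are then grouped by the map unit they fall on, so fragments with similar features share, or sit near, the same unit.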

  11. Space station ECLSS integration analysis: Simplified General Cluster Systems Model, ECLS System Assessment Program enhancements

    NASA Technical Reports Server (NTRS)

    Ferguson, R. E.

    1985-01-01

    The database verification of the ECLS Systems Assessment Program (ESAP) was documented, and changes made to enhance the flexibility of the water recovery subsystem simulations are given. All changes made to the database values are described, along with the software enhancements performed. The refined model documented herein constitutes the submittal of the General Cluster Systems Model. A source listing of the current version of ESAP is provided in Appendix A.

  12. Significance of clustering and classification applications in digital and physical libraries

    NASA Astrophysics Data System (ADS)

    Triantafyllou, Ioannis; Koulouris, Alexandros; Zervos, Spiros; Dendrinos, Markos; Giannakopoulos, Georgios

    2015-02-01

    Applications of clustering and classification techniques can prove highly significant in both digital and physical (paper-based) libraries. The most essential application, document classification and clustering, is crucial for the content produced and maintained in digital libraries, repositories, databases, social media, blogs, etc., based on various tags and ontology elements, transcending traditional library-oriented classification schemes. Other applications with a useful and beneficial role in the new digital library environment include document routing, summarization, and query expansion. Paper-based libraries can benefit as well, since classification combined with advanced material characterization techniques such as FTIR (Fourier Transform InfraRed spectroscopy) can be vital for the study and prevention of material deterioration. An improved two-level self-organizing clustering architecture is proposed to enhance the discrimination capacity of the learning space prior to classification, yielding promising results when applied to the above-mentioned library tasks.

  13. A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.

    ERIC Educational Resources Information Center

    Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald

    2002-01-01

    Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)

  14. Automatic document classification of biological literature

    PubMed Central

    Chen, David; Müller, Hans-Michael; Sternberg, Paul W

    2006-01-01

    Background Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test set (Reuters 21578) than previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusion We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. PMID:16893465

  15. V-TECS Career Cluster Frameworks.

    ERIC Educational Resources Information Center

    Vocational Technical Education Consortium of States, Decatur, GA.

    This document includes 16 vocational-technical crosswalk wheels relating the 14 Vocational Technical Education Consortium of States (V-TECS) Career Families to the 16 Career Clusters developed by the U.S. Department of Education. The career clusters are based on the common academic, workplace, and technical knowledge and skills that cut across all…

  16. Technical structure of the global nanoscience and nanotechnology literature

    NASA Astrophysics Data System (ADS)

    Kostoff, Ronald N.; Koytcheff, Raymond G.; Lau, Clifford G. Y.

    2007-10-01

    Text mining was used to extract technical intelligence from the open source global nanotechnology and nanoscience research literature. An extensive nanotechnology/nanoscience-focused query was applied to the Science Citation Index/Social Science Citation Index (SCI/SSCI) databases. The nanotechnology/nanoscience research literature technical structure (taxonomy) was obtained using computational linguistics/document clustering and factor analysis. The infrastructure (prolific authors, key journals/institutions/countries, most cited authors/journals/documents) for each of the clusters generated by the document clustering algorithm was obtained using bibliometrics. Another novel addition was the use of phrase auto-correlation maps to show technical thrust areas based on phrase co-occurrence in Abstracts, and the use of phrase-phrase cross-correlation maps to show technical thrust areas based on phrase relations due to the sharing of common co-occurring phrases. The ~400 most cited nanotechnology papers since 1991 were grouped, and their characteristics generated. Whereas the main analysis provided technical thrusts of all nanotechnology papers retrieved, analysis of the most cited papers allowed their characteristics to be displayed. Finally, most cited papers from selected time periods were extracted, along with all publications from those time periods, and the institutions and countries were compared based on their representation in the most cited documents list relative to their representation in the most publications list.

  17. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

    PubMed

    He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

    2011-12-01

    Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications, including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms that directly use level 3 basic linear algebra subprograms are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Building on it, we propose two further fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
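    For readers unfamiliar with SNMF, the sketch below factorizes a nonnegative similarity matrix as A ≈ H Hᵀ with a commonly used damped multiplicative update, H ← H ∘ ((1 − β) + β (A H) / (H Hᵀ H)), and reads cluster labels from the row-wise argmax of H. This is a generic variant chosen for illustration, not necessarily any of the three algorithms proposed in the paper; `beta=0.5`, the random initialization, and the function names are assumptions.

```python
import numpy as np

def symmetric_nmf(A, k, iters=1000, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative A by H @ H.T with H >= 0 (sketch)."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    H = rng.random((A.shape[0], k))
    for _ in range(iters):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps guards against division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

def snmf_labels(A, k, **kwargs):
    """Assign each item to the column of H with the largest membership."""
    return np.argmax(symmetric_nmf(A, k, **kwargs), axis=1)
```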

  18. Text grouping in patent analysis using adaptive K-means clustering algorithm

    NASA Astrophysics Data System (ADS)

    Shanie, Tiara; Suprijadi, Jadi; Zulhanif

    2017-03-01

    Patents are one form of intellectual property. Analyzing patents is one requirement for understanding the development of technology in each country and worldwide. This study uses patent documents on green tea retrieved from the Espacenet server. Patent documents related to tea technology are numerous and widely scattered, which makes information retrieval (IR) difficult for users. It is therefore necessary to group the documents according to the related terms they contain. Using the titles of the green tea patents as text data, the proposed Statistical Text Mining method consists of two phases: a data preparation stage and a data analysis stage. The data preparation phase uses Text Mining methods, and the data analysis stage is done by statistics. The statistical analysis in this study uses a cluster analysis algorithm, the Adaptive K-Means Clustering Algorithm. Based on the maximum silhouette value, the results yield 87 clusters associated with fifteen terms, which can be used to support information retrieval.
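    The idea of choosing the number of clusters by the maximum silhouette value can be sketched as follows: run k-means for several candidate k and keep the k with the highest mean silhouette. The plain Lloyd iterations with deterministic farthest-first initialization, the zero silhouette for singleton clusters, and the names `kmeans`, `silhouette` and `choose_k` are assumptions; the paper's Adaptive K-Means algorithm itself is not reproduced here.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient (singleton clusters contribute 0)."""
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def choose_k(X, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    X = np.asarray(X, dtype=float)
    scores = {k: silhouette(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```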

  19. Graph-based biomedical text summarization: An itemset mining and sentence clustering approach.

    PubMed

    Nasr Azadani, Mozhgan; Ghadiri, Nasser; Davoodijam, Ensieh

    2018-06-12

    Automatic text summarization offers an efficient way to access the ever-growing body of both scientific and clinical literature in the biomedical domain by summarizing the source documents while preserving their most informative content. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document by mapping the document to concepts. It then discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to define a similarity function, from which a graph representation is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover the various subthemes of the document. Finally, it generates the summary by selecting the most informative and relevant sentences from all subthemes within the text. We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. The research suggests that incorporating domain-specific knowledge and frequent itemset mining better equips the summarization system to measure the informativeness of sentences. Moreover, clustering the graph nodes (sentences) enables the summarizer to target the different main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain. Copyright © 2018. Published by Elsevier Inc.
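    Minimum spanning tree based clustering, as used above to find subthemes, can be sketched as: build an MST over the sentence graph (here with Prim's algorithm on a distance matrix such as 1 minus similarity), delete the k − 1 heaviest MST edges, and take the connected components as clusters. The distance matrix, the choice of k, and the name `mst_clusters` are assumptions; the paper's itemset-based similarity function is not reproduced.

```python
def mst_clusters(dist, k):
    """Cluster n items into k groups by cutting the k-1 heaviest MST edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    # Prim's algorithm on a dense distance matrix.
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((dist[parent[u]][u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    edges.sort()
    kept = edges[:len(edges) - (k - 1)]  # drop the k-1 heaviest edges
    # Union-find over the kept edges gives the connected components.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, a, b in kept:
        root[find(a)] = find(b)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```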

  20. Sub-word image clustering in Farsi printed books

    NASA Astrophysics Data System (ADS)

    Soheili, Mohammad Reza; Kabir, Ehsanollah; Stricker, Didier

    2015-02-01

    Most OCR systems are designed to recognize a single page. With unfamiliar font faces, low-quality paper and degraded prints, the performance of these products drops sharply. However, an OCR system can exploit the redundancy of word occurrences in large documents to improve recognition results. In this paper, we propose a sub-word image clustering method for applications dealing with large printed documents. We assume that the whole document is printed in a single unknown font with low print quality. Our proposed method finds clusters of equivalent sub-word images with an incremental algorithm. Because of the low print quality, we propose an image matching algorithm for measuring the distance between two sub-word images, based on the Hamming distance and the ratio of the area to the perimeter of the connected components. We built a ground-truth dataset of more than 111,000 sub-word images to evaluate our method. All of these images were extracted from an old Farsi book. We cluster all of these sub-words, including isolated letters and even punctuation marks. Then all centers of the resulting clusters are labeled manually. We show that all sub-words of the book can be recognized with more than 99.7% accuracy by assigning the label of each cluster center to all of its members.
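    An incremental clustering loop with a Hamming-distance matcher can be sketched as below. The sketch assumes the sub-word images are already binarized and scaled to the same size (represented as strings of "0"/"1"), uses each cluster's first member as its center, and uses a fixed distance threshold; it omits the area-to-perimeter term of the authors' matcher, and `hamming` and `incremental_cluster` are hypothetical names.

```python
def hamming(a, b):
    """Number of differing pixels between two equal-size binary images."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def incremental_cluster(images, threshold):
    """Assign each image to the first cluster center within threshold, else start a new cluster."""
    centers, labels = [], []
    for img in images:
        for ci, center in enumerate(centers):
            if hamming(img, center) <= threshold:
                labels.append(ci)
                break
        else:  # no existing center is close enough
            centers.append(img)
            labels.append(len(centers) - 1)
    return labels, centers
```

    Labeling each center manually and propagating the label to all members then recognizes every sub-word in the cluster.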

  1. Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks.

    PubMed

    Wang, Chenguang; Song, Yangqiu; El-Kishky, Ahmed; Roth, Dan; Zhang, Ming; Han, Jiawei

    2015-08-01

    One of the key obstacles to making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge that is not designed for any specific domain. The key challenges are then how to adapt world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specialize the world knowledge to domains by resolving the ambiguity of entities and their types, and we represent the data with world knowledge as a heterogeneous information network. We then propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base automatically extracted from Wikipedia that maps knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.

  2. Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks

    PubMed Central

    Wang, Chenguang; Song, Yangqiu; El-Kishky, Ahmed; Roth, Dan; Zhang, Ming; Han, Jiawei

    2015-01-01

    One of the key obstacles to making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge that is not designed for any specific domain. The key challenges are then how to adapt world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specialize the world knowledge to domains by resolving the ambiguity of entities and their types, and we represent the data with world knowledge as a heterogeneous information network. We then propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base automatically extracted from Wikipedia that maps knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features. PMID:26705504

  3. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases.

    PubMed

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-03-08

    High-throughput molecular biology provides new data at an incredible rate, so the size of biological databanks is growing enormously and very rapidly. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that browsing and "understanding" them becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines, such as Vivisimo, is being developed. These tools organize the results of a user query on the fly into a hierarchy of labeled folders that eases browsing and knowledge extraction. We investigate this approach on biological data and propose the BioPrompt-box software system, which deploys ontology-driven clustering strategies to make the searching process of biologists more efficient and effective. The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underlying databank, such as references to ontologies or to external databanks, and plain texts such as researchers' comments and (the title, abstract or even body of) papers. Bpb offers several tools to customize the search and clustering process over its indexed documents. The user can search for a set of keywords within a specific field of the document schema, or can execute Blast to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) that constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits.
    Bpb computes these views by exploiting the meta-data present within the retrieved documents, such as references to Gene Ontology, the taxonomy lineage, the organism and the keywords. The approach is also flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (possibly numerous) query results and to show possible hidden correlations among them, thus improving their browsing and understanding. Bpb is a powerful search engine that makes it easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, such as GenBank or EMBL.

  4. The BioPrompt-box: an ontology-based clustering tool for searching in biological databases

    PubMed Central

    Corsi, Claudio; Ferragina, Paolo; Marangoni, Roberto

    2007-01-01

    Background High-throughput molecular biology provides new data at an incredible rate, so the size of biological databanks is growing enormously and very rapidly. This scenario generates severe problems not only at indexing time, where suitable algorithmic techniques for data indexing and retrieval are required, but also at query time, since a user query may produce such a large set of results that browsing and "understanding" them becomes humanly impractical. This problem is well known to the Web community, where a new generation of Web search engines, such as Vivisimo, is being developed. These tools organize the results of a user query on the fly into a hierarchy of labeled folders that eases browsing and knowledge extraction. We investigate this approach on biological data and propose the BioPrompt-box software system, which deploys ontology-driven clustering strategies to make the searching process of biologists more efficient and effective. Results The BioPrompt-box (Bpb) defines a document as a biological sequence plus its associated meta-data taken from the underlying databank, such as references to ontologies or to external databanks, and plain texts such as researchers' comments and (the title, abstract or even body of) papers. Bpb offers several tools to customize the search and clustering process over its indexed documents. The user can search for a set of keywords within a specific field of the document schema, or can execute Blast to find documents related to homologous sequences. In both cases the search task returns a set of documents (hits) that constitute the answer to the user query. Since the number of hits may be large, Bpb clusters them into groups of homogeneous content, organized as a hierarchy of labeled clusters. The user can choose among several ontology-based hierarchical clustering strategies, each offering a different "view" of the returned hits.
    Bpb computes these views by exploiting the meta-data present within the retrieved documents, such as references to Gene Ontology, the taxonomy lineage, the organism and the keywords. The approach is also flexible enough to leave room for future additions of other meta-information. The ultimate goal of the clustering process is to provide the user with several different readings of the (possibly numerous) query results and to show possible hidden correlations among them, thus improving their browsing and understanding. Conclusion Bpb is a powerful search engine that makes it easy to perform complex queries over the indexed databanks (currently only UNIPROT is considered). The ontology-based clustering approach is efficient and effective, and could thus be applied successfully to larger databanks, such as GenBank or EMBL. PMID:17430575

  5. Old document image segmentation using the autocorrelation function and multiresolution analysis

    NASA Astrophysics Data System (ADS)

    Mehri, Maroua; Gomez-Krämer, Petra; Héroux, Pierre; Mullot, Rémy

    2013-01-01

    Recent progress in the digitization of heterogeneous collections of ancient documents has raised new challenges in information retrieval in digital libraries and in document layout analysis. Therefore, in order to control the quality of historical document image digitization and to meet the need to characterize their content using intermediate-level metadata (between image and document structure), we propose a fast automatic layout segmentation of old document images based on five descriptors. These descriptors, based on the autocorrelation function, are obtained by multiresolution analysis and subsequently used in a specific clustering method. The proposed method has the advantage of making no hypothesis about the document structure, either about the document model (physical structure) or the typographical parameters (logical structure). It is also parameter-free, since it automatically adapts to the image content. In this paper, we first detail our proposal to characterize the content of old documents by extracting autocorrelation features in different areas of a page and at several resolutions. We then show that it is possible to automatically find the homogeneous regions defined by similar autocorrelation indices, without knowing the number of clusters, using adapted hierarchical ascendant classification and consensus clustering approaches. To assess our method, we apply our algorithm to 316 old document images spanning six centuries (1200-1900) of French history, in order to demonstrate the performance of our proposal in terms of segmentation and characterization of heterogeneous corpus content. Moreover, we define a new evaluation metric, the homogeneity measure, which evaluates the segmentation and characterization accuracy of our methodology. We obtain a mean homogeneity accuracy of 85%.
Those results help to represent a document by a hierarchy of layout structure and content, and to define one or more signatures for each page, on the basis of a hierarchical representation of homogeneous blocks and their topology.
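    The autocorrelation function underlying these descriptors can be illustrated in one dimension: for a pixel-intensity profile, peaks at nonzero lags reveal periodic structure such as regularly spaced text lines. The sketch below computes a normalized 1-D autocorrelation with NumPy; the authors work on 2-D image regions at several resolutions, so this is only an assumption-level illustration, and `autocorrelation` is a hypothetical name. A constant profile has zero variance and would cause a division by zero.

```python
import numpy as np

def autocorrelation(signal):
    """Normalized autocorrelation of a 1-D profile for lags 0..n-1 (ac[0] == 1)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")  # lags -(n-1)..(n-1)
    ac = full[len(x) - 1:]                  # keep non-negative lags
    return ac / ac[0]
```

    For an alternating profile such as [1, 0, 1, 0, ...], the value at lag 1 is negative and the value at lag 2 is positive, reflecting a period of two samples.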

  6. A Scalable Monitoring for the CMS Filter Farm Based on Elasticsearch

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andre, J.M.; et al.

    2015-12-23

    A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.

  7. The Clusters - Collaborative Models of Sustainable Regional Development

    NASA Astrophysics Data System (ADS)

    Mănescu, Gabriel; Kifor, Claudiu

    2014-12-01

    Clusters are the subject of actions and of a whole series of documents issued by national and international organizations, and, based on experience, many authorities promote the idea that clusters increase competitiveness, specialize the workforce, and grow regional businesses and economies. The present paper offers an insight into the initiatives for forming clusters in Romania. Starting from a comprehensive analysis of the development potential offered by each economic development region, we present the main types of clusters, grouped according to fields of activity, and their overall objectives.

  8. An Investigation of Document Partitions.

    ERIC Educational Resources Information Center

    Shaw, W. M., Jr.

    1986-01-01

    The empirical significance of document partitions is investigated as a function of index term weight and similarity thresholds. Results show that the same empirically preferred partitions can be detected by two independent strategies: a cluster-based retrieval analysis and an analysis of regularities in the underlying structure of the document…

  9. Basic firefly algorithm for document clustering

    NASA Astrophysics Data System (ADS)

    Mohammed, Athraa Jasim; Yusof, Yuhanis; Husni, Husniza

    2015-12-01

    Document clustering plays a significant role in Information Retrieval (IR), where it organizes documents prior to the retrieval process. To date, various clustering algorithms have been proposed, including K-means and Particle Swarm Optimization (PSO). Although these algorithms have been widely applied in many disciplines because of their simplicity, they tend to become trapped in a local minimum during the search for an optimal solution. To address this shortcoming, this paper proposes a Basic Firefly (Basic FA) algorithm to cluster text documents. The algorithm uses the Average Distance to Document Centroid (ADDC) as the objective function of the search. Experiments with the proposed algorithm were conducted on the 20Newsgroups benchmark dataset. Results demonstrate that Basic FA generates more robust and compact clusters than those produced by K-means and PSO.
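    A minimal sketch of the ADDC objective, assuming documents are already dense vectors and using Euclidean distance (the abstract does not state the distance measure). Each document is assigned to its nearest candidate centroid, distances are averaged within each cluster, and those averages are averaged across clusters; in a firefly search, a lower ADDC would correspond to a "brighter" firefly. The helper names `euclidean` and `addc` are hypothetical, and empty clusters are simply skipped here, which is also an assumption.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def addc(docs: List[Vector], centroids: List[Vector]) -> float:
    """Average Distance to Document Centroid for one candidate solution.

    Each document joins its nearest centroid; the mean distance inside every
    non-empty cluster is computed, then those means are averaged.
    Lower values indicate more compact clusters.
    """
    per_cluster: List[List[float]] = [[] for _ in centroids]
    for d in docs:
        dists = [euclidean(d, c) for c in centroids]
        best = min(range(len(centroids)), key=lambda i: dists[i])
        per_cluster[best].append(dists[best])
    means = [sum(ds) / len(ds) for ds in per_cluster if ds]
    return sum(means) / len(means)
```

    For example, on four documents forming two tight groups, two centroids placed inside the groups score lower than a single centroid placed between them.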

  10. Toward the 21st Century: Preparing Proactive Visionary Transformational Leaders for Building Learning Communities. Human Resource Development. Orange County Cluster.

    ERIC Educational Resources Information Center

    Groff, Warren H.

    This document describes the Orange County Cluster human resources development (HRD) seminar that was conducted as part of Nova University's nontraditional practitioner-oriented, problem-solving, field-based distance education program in higher education. Discussed first are HRD in the agricultural and business industrial eras and changing HRD…

  11. Documentation for the machine-readable version of a table of Redshifts for Abell clusters (Sarazin, Rood and Struble 1982)

    NASA Technical Reports Server (NTRS)

    Warren, W. H., Jr.

    1983-01-01

    The machine-readable catalog is described. The machine version contains the same data as the published table and includes a second file with the notes. The computerized data files are prepared at the Astronomical Data Center. Detected discrepancies and cluster identifications based on photometric estimators are included.

  12. A super resolution framework for low resolution document image OCR

    NASA Astrophysics Data System (ADS)

    Ma, Di; Agam, Gady

    2013-01-01

    Optical character recognition is widely used for converting document images into digital media. Existing OCR algorithms and tools produce good results from high-resolution, good-quality document images. In this paper, we propose a machine learning based super resolution framework for low-resolution document image OCR. Two main techniques are used in our proposed approach: a document page segmentation algorithm and a modified K-means clustering algorithm. Using this approach, by exploiting coherence in the document, we reconstruct a higher-resolution image from a low-resolution document image and improve OCR results. Experimental results show substantial gains for low-resolution documents such as those captured from video.

  13. Pipelining Architecture of Indexing Using Agglomerative Clustering

    NASA Astrophysics Data System (ADS)

    Goyal, Deepika; Goyal, Deepti; Gupta, Parul

    2010-11-01

    The World Wide Web is an interlinked collection of billions of documents. Ironically, the huge size of this collection has become an obstacle to information retrieval. Information on the Internet is accessed through search engines, which retrieve pages from an indexer. This paper introduces a novel pipelining technique for structuring the core index-building system that substantially reduces index construction time, together with a clustering algorithm that partitions the set of documents into ordered clusters so that documents within the same cluster are similar and are assigned nearby document identifiers. After documents are assigned to clusters, the algorithm builds a hierarchy of the index so that searching is efficient: clusters are combined into super clusters, and super clusters into mega clusters. The pipeline architecture creates the index in a space- and time-efficient manner. Search is directed from the higher levels of the index (higher-level clusters) to the lower levels, so the user obtains matching results quickly. Because each cluster is formed from only two clusters, the search at each lower level of the index is limited to two clusters, which saves time.
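    The abstract does not give the merging rule, so the following is only a sketch under stated assumptions: documents are vectors, the two clusters whose centroids have the highest cosine similarity are merged at each step (centroid linkage), and document identifiers are reassigned in the leaf order of the resulting binary tree so that similar documents receive nearby IDs. The lookup descends from the root and, at each level, compares the query against only the two child clusters. `Node`, `build_tree`, `assign_ids` and `search` are hypothetical names. The naive pair scan makes the build O(n³), which is fine for illustration but not for a web-scale index.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    centroid: List[float]
    members: List[int]                 # original document indices, in leaf order
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(vectors: Sequence[Sequence[float]]) -> Node:
    """Merge the two most similar clusters until one remains,
    so every internal node has exactly two children."""
    nodes = [Node(list(v), [i]) for i, v in enumerate(vectors)]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                s = cosine(nodes[i].centroid, nodes[j].centroid)
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        a, b = nodes[i], nodes[j]
        na, nb = len(a.members), len(b.members)
        # size-weighted mean of the two centroids = mean of all member vectors
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a.centroid, b.centroid)]
        merged = Node(centroid, a.members + b.members, a, b)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def assign_ids(root: Node) -> Dict[int, int]:
    """Map each original document index to a new ID that follows leaf order,
    so documents in the same cluster receive consecutive IDs."""
    return {doc: new_id for new_id, doc in enumerate(root.members)}


def search(root: Node, query: Sequence[float]) -> List[int]:
    """Walk down the tree, comparing the query with only two children per level."""
    node = root
    while node.left is not None and node.right is not None:
        node = max((node.left, node.right), key=lambda c: cosine(query, c.centroid))
    return node.members
```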

  14. Swarm Intelligence in Text Document Clustering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cui, Xiaohui; Potok, Thomas E

    2008-01-01

    Social animals and insects in nature often exhibit a form of emergent collective behavior. The research field that attempts to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies is called Swarm Intelligence. Compared to traditional algorithms, swarm algorithms are usually flexible, robust, decentralized and self-organized. These characteristics make swarm algorithms suitable for solving complex problems such as document collection clustering. A major challenge of today's information society is that users are overwhelmed with information on any topic they search for. Fast, high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming volume of information. In this chapter, we introduce three nature-inspired swarm intelligence approaches for document clustering analysis. These clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools and ant foraging.
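    The abstract does not detail the three approaches. As one illustration of the ant-inspired family, the classic picking and dropping rules of ant-based clustering (the Deneubourg / Lumer-Faieta model) are sketched below: an ant is likely to pick up a document whose neighbourhood is dissimilar to it and to drop a carried document where similar documents are dense. This is a generic sketch, not necessarily one of the chapter's approaches; the constants `k1`, `k2`, `alpha` and the function names are illustrative assumptions.

```python
from typing import Sequence


def local_density(dists: Sequence[float], alpha: float, area: int) -> float:
    """Lumer-Faieta style similarity density f of a document's neighbourhood.

    dists are dissimilarities (0 = identical) between the document and the
    other documents in the ant's neighbourhood of `area` grid cells;
    alpha scales the dissimilarities. Negative totals are clipped to 0."""
    total = sum(1.0 - d / alpha for d in dists)
    return max(0.0, total / area)


def pick_probability(f: float, k1: float = 0.1) -> float:
    """An unladen ant tends to pick up a document that is isolated (small f)."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f: float, k2: float = 0.15) -> float:
    """A laden ant tends to drop its document among similar ones (large f)."""
    return (f / (k2 + f)) ** 2
```

    In a full simulation, ants wander over a 2-D grid and apply these probabilities at every step, so similar documents gradually accumulate into piles that become the clusters.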

  15. Fuzzy Document Clustering Approach using WordNet Lexical Categories

    NASA Astrophysics Data System (ADS)

    Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.

    Text mining generally refers to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly, mainly because of the strong need to analyse the huge amount of textual data that resides on internal file systems and the Web. Text document clustering provides an effective navigation mechanism for organizing this large amount of data by grouping documents into a small number of meaningful classes. In this paper we propose a fuzzy text document clustering approach using WordNet lexical categories and the Fuzzy c-Means algorithm. Experiments were performed to compare the efficiency of the proposed approach with recently reported approaches. Experimental results show that fuzzy clustering performs well: the Fuzzy c-Means algorithm outperforms classical clustering algorithms such as k-means and bisecting k-means in both clustering quality and running-time efficiency.
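    For reference, the standard fuzzy c-means iteration alternates two updates: centres are membership-weighted means, c_j = Σ_i u_ij^m x_i / Σ_i u_ij^m, and memberships are u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)). The sketch below assumes the documents are already encoded as a numeric feature matrix (for example, one column per WordNet lexical category, which is our reading of the abstract rather than a confirmed detail), uses the common fuzzifier m = 2, and starts from random memberships; `fuzzy_c_means` is a hypothetical name.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on X of shape (n_docs, n_features).

    Returns (centers, U) where U[i, j] is the membership of document i in
    cluster j; every row of U sums to 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # random memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # distance from every document to every centre, shape (n, c)
        D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)                   # avoid division by zero
        # ratio[i, j, k] = (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

    A hard partition, when needed, can be read off as `U.argmax(axis=1)`, while the full membership matrix keeps the overlap between topics that motivates the fuzzy approach.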

  16. On the map: Nature and Science editorials.

    PubMed

    Waaijer, Cathelijn J F; van Bochove, Cornelis A; van Eck, Nees Jan

    2011-01-01

    Bibliometric mapping of scientific articles based on keywords and technical terms in abstracts is now frequently used to chart scientific fields. In contrast, no significant mapping has been applied to the full texts of non-specialist documents. Editorials in Nature and Science are such non-specialist documents, reflecting the views of the two most widely read scientific journals on science, technology and policy issues. We use the VOSviewer mapping software to chart the topics of these editorials. A term map and a document map are constructed and clusters are distinguished in both of them. The validity of the document clustering is verified by a manual analysis of a sample of the editorials. This analysis confirms the homogeneity of the clusters obtained by mapping and augments the latter with further detail. As a result, the analysis provides reliable information on the distribution of the editorials over topics, and on differences between the journals. The most striking difference is that Nature devotes more attention to internal science policy issues and Science more to the political influence of scientists. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11192-010-0205-9) contains supplementary material, which is available to authorized users.

  17. Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres.

    PubMed

    Banerjee, Arindam; Ghosh, Joydeep

    2004-05-01

    Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality, well-balanced clusters for high-dimensional data. Like kmeans, each iteration of all three algorithms is linear in the number of data points and in the number of clusters. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms: balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
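    The spherical kmeans baseline described above can be sketched in a few lines of NumPy: inputs are normalized to unit length, each document is assigned to the centroid with the highest cosine similarity, and each centroid is re-normalized after averaging. This is only the plain spkmeans baseline, not the frequency-sensitive variants the paper proposes; the random initialisation from data rows and the function name are assumptions.

```python
import numpy as np


def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Batch spherical k-means on the rows of X.

    Returns (centers, labels); centers are unit vectors."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)     # project inputs onto the unit sphere
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # on unit vectors, the dot product is the cosine similarity
        new_labels = np.argmax(X @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):                             # keep the old centre if a cluster empties
                s = members.sum(axis=0)
                centers[j] = s / np.linalg.norm(s)       # normalized mean direction
    return centers, labels
```

    Because only directions matter, a long and a short document about the same topic end up in the same cluster, which is why this variant suits normalized text vectors.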

  18. Recurrent-neural-network-based Boolean factor analysis and its application to word clustering.

    PubMed

    Frolov, Alexander A; Husek, Dusan; Polyakov, Pavel Yu

    2009-07-01

    The objective of this paper is to introduce a neural-network-based algorithm for word clustering as an extension of the neural-network-based Boolean factor analysis algorithm (Frolov, 2007). It is shown that this extended algorithm supports an even more complex model of signals that are assumed to be related to textual documents. It is hypothesized that every topic in textual data is characterized by a set of words that coherently appear in documents dedicated to that topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, hence revealing them as attractors of the network dynamics. The found factors are eliminated from the network memory by the Hebbian unlearning rule, facilitating the search for other factors. Topics related to the found sets of words can be identified based on the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for the following purposes: first, to provide a complete description of factors in terms of component probability, and second, to enhance the accuracy of classifying signals according to whether they contain the factor. Since it is assumed that every word may contribute to several topics, the proposed method might be related to the method of fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this approach, the method is applied to two types of textual data on neural networks in two different languages. The obtained topics and corresponding words agree well, despite the fact that identical topics in Russian and English conferences contain different sets of keywords.

  19. Spatial and temporal variability of microgeographic genetic structure in white-tailed deer

    USGS Publications Warehouse

    Scribner, Kim T.; Smith, Michael H.; Chesser, Ronald K.

    1997-01-01

    Techniques are described that define contiguous genetic subpopulations of white-tailed deer (Odocoileus virginianus) based on the spatial dispersion of 4,749 individuals that possessed discrete character values (alleles or genotypes) during each of 6 years (1974-1979). White-tailed deer were not uniformly distributed in space, but exhibited considerable spatial genetic structuring. Significant non-random clusters of individuals were documented during each year based on specific alleles and genotypes at the Sdh locus. Considerable temporal variation was observed in the position and genetic composition of specific clusters, which reflected changes in allele frequency in small geographic areas. The position of clusters did not consistently correspond with traditional management boundaries based on major discontinuities in habitat (swamp versus upland) and hunt compartments that were defined by roads and streams. Spatio-temporal stability of observed genetic contiguous clusters was interpreted relative to method and intensity of harvest, movements, and breeding ecology.

  20. Measuring Clinical Decision Support Influence on Evidence-Based Nursing Practice.

    PubMed

    Cortez, Susan; Dietrich, Mary S; Wells, Nancy

    2016-07-01

    To measure the effect of clinical decision support (CDS) on oncology nurse evidence-based practice (EBP).
    Longitudinal cluster-randomized design.
    Four distinctly separate oncology clinics associated with an academic medical center.
    The study sample consisted of randomly selected data elements from the nursing documentation software. The data elements were patient-reported symptoms and the associated nurse interventions. The total sample comprised 600 observations, derived from a baseline, posteducation, and postintervention sample of 200 each (100 in the intervention group and 100 in the control group for each sample).
    The cluster design was used to randomize the study intervention at the clinic level rather than the individual participant level, to reduce possible diffusion of the study intervention. An extended data collection cycle (11 weeks) controlled for temporary increases in nurse EBP related to the education or CDS intervention.
    The dependent variable was the nurse evidence-based documentation rate, calculated from the nurse-documented interventions. The independent variable was the CDS added to the nursing documentation software.
    The average EBP rate at baseline for the control and intervention groups was 27%. After education, the average EBP rate increased to 37%, and then decreased to 26% in the postintervention sample. Mixed-model linear statistical analysis revealed no significant interaction of group by sample. The CDS intervention did not result in an increase in nurse EBP.
    EBP education increased nurse EBP documentation rates significantly but only temporarily. Nurses may have used evidence in practice but may not have documented their interventions.
    More research is needed to understand the complex relationship between CDS, nursing practice, and nursing EBP intervention documentation. CDS may have a different effect on nurse EBP, physician EBP, and other medical professional EBP.

  1. Assessing the Amazon Cloud Suitability for CLARREO's Computational Needs

    NASA Technical Reports Server (NTRS)

    Goldin, Daniel; Vakhnin, Andrei A.; Currey, Jon C.

    2015-01-01

    In this document we compare the performance of Amazon Web Services (AWS), also known as Amazon Cloud, with the CLARREO (Climate Absolute Radiance and Refractivity Observatory) cluster and assess its suitability for the computational needs of the CLARREO mission. A benchmark executable that processes one month and one year of PARASOL (Polarization and Anisotropy of Reflectances for Atmospheric Sciences coupled with Observations from a Lidar) data was used. With the optimal AWS configuration, adequate data-processing times, comparable to those of the CLARREO cluster, were found. The assessment of alternatives to the CLARREO cluster continues, and several options, such as a NASA-based cluster, are being considered.

  2. GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts.

    PubMed

    Zheng, Hai-Tao; Borchert, Charles; Kim, Hong-Gee

    2010-02-01

    Concurrent with progress in the biomedical sciences, an overwhelming amount of textual knowledge is accumulating in the biomedical literature. PubMed is the most comprehensive database collecting and managing biomedical literature. To help researchers easily understand collections of PubMed abstracts, numerous clustering methods have been proposed to group similar abstracts based on their shared features. However, most of these methods do not explore the semantic relationships among groupings of documents, which could help better illuminate the groupings of PubMed abstracts. To address this issue, we propose an ontological clustering method called GOClonto for conceptualizing PubMed abstracts. GOClonto uses latent semantic analysis (LSA) and gene ontology (GO) to identify key gene-related concepts and their relationships, and to allocate PubMed abstracts based on these key gene-related concepts. On two PubMed abstract collections, the experimental results show that GOClonto is able to identify key gene-related concepts and outperforms the STC (suffix tree clustering) algorithm, the Lingo algorithm, the Fuzzy Ants algorithm, and the clustering based TRS (tolerance rough set) algorithm. Moreover, the two ontologies generated by GOClonto show significant informative conceptual structures.
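    The LSA step that GOClonto relies on amounts to a truncated singular value decomposition of a term-document matrix: documents are re-expressed in a small number of latent dimensions, where abstracts that share co-occurring terms end up close together. The sketch covers only that step, not the gene ontology mapping; the toy matrix, the choice of `k` and the helper names are assumptions.

```python
import numpy as np


def lsa(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Project documents into a k-dimensional latent semantic space.

    term_doc has shape (n_terms, n_docs). With the SVD A = U S V^T, the
    rank-k document coordinates are the columns of S_k V_k^T, returned as rows."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

    In a toy matrix where two abstracts share the terms "gene" and "protein" and a third uses unrelated terms, the first two are nearly parallel in the latent space and the third is nearly orthogonal to them.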

  3. [Ti8Zr2O12(COO)16] Cluster: An Ideal Inorganic Building Unit for Photoactive Metal–Organic Frameworks

    PubMed Central

    2017-01-01

    Metal–organic frameworks (MOFs) based on Ti-oxo clusters (Ti-MOFs) represent a naturally self-assembled superlattice of TiO2 nanoparticles separated by designable organic linkers as antenna chromophores, epitomizing a promising platform for solar energy conversion. However, despite the vast, diverse, and well-developed Ti-cluster chemistry, only a small number of Ti-MOFs have been documented. The synthetic conditions of most Ti-based clusters are incompatible with those required for MOF crystallization, which has severely limited the development of Ti-MOFs. This challenge has been met herein by the discovery of the [Ti8Zr2O12(COO)16] cluster as a nearly ideal building unit for photoactive MOFs. A family of isoreticular photoactive MOFs was assembled, and their orbital alignments were fine-tuned by rational functionalization of organic linkers under computational guidance. These MOFs demonstrate high porosity, excellent chemical stability, tunable photoresponse, and good activity toward photocatalytic hydrogen evolution reactions. The discovery of the [Ti8Zr2O12(COO)16] cluster and the facile construction of photoactive MOFs from this cluster should pave the way for the development of future Ti-MOF-based photocatalysts. PMID:29392182

  4. Multi-documents summarization based on clustering of learning object using hierarchical clustering

    NASA Astrophysics Data System (ADS)

    Mustamiin, M.; Budi, I.; Santoso, H. B.

    2018-03-01

    Open Educational Resources (OER) is a portal of teaching, learning and research resources that are in the public domain and freely accessible. Learning contents, or Learning Objects (LO), are granular and can be reused to construct new learning materials. LO ontology-based search techniques can be used to find LO in the Indonesia OER. In this research, LO from search results are used as ingredients to create new learning materials according to the topic searched by users. Summarization-based grouping of LO using Hierarchical Agglomerative Clustering (HAC), with dependency context on the user's query, achieves an average F-measure of 0.487, while summarization with K-Means achieves an average F-measure of only 0.336.
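    The abstract compares HAC and K-Means by average F-measure without defining it. A common definition for clustering is: for each reference class, take the best F1 score over all clusters, then weight by class size. It can be sketched as follows; whether the authors used this exact variant is an assumption, and `clustering_f_measure` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Sequence


def clustering_f_measure(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """Class-size-weighted clustering F-measure.

    classes[i] is the reference label of item i, clusters[i] its cluster id.
    For every class, keep the best F1 over all clusters, then weight by class size."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(classes, clusters))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap.get((c, k), 0)
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total
```

    A perfect clustering scores 1.0; putting every item into one cluster is penalized through low precision.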

  5. Job Briefs. Career Education Guide.

    ERIC Educational Resources Information Center

    Dependents Schools (DOD), Washington, DC. European Area.

    The document contains 288 one-page job descriptions based on 1973 information for the following 11 career clusters: automotive technology, business/clerical/sales, computer technology, electricity/electronics, graphic communications, health/cosmetology, agriculture/conservation, artistic/literary/music, mechanical/transportation/construction,…

  6. Multi-Cultural Competency-Based Vocational Curricula. Clerical Clusters. Multi-Cultural Competency-Based Vocational/Technical Curricula Series.

    ERIC Educational Resources Information Center

    Hepburn, Larry; Shin, Masako

    This document, one of eight in a multi-cultural competency-based vocational/technical curricula series, is on clerical occupations. This program is designed to run 36 weeks and cover 10 instructional areas: beginning typing, typing I, typing II, duplicating, receptionist activities, general office procedures, operation of electronic calculator,…

  7. Topic detection using paragraph vectors to support active learning in systematic reviews.

    PubMed

    Hashimoto, Kazuma; Kontonatsios, Georgios; Miwa, Makoto; Ananiadou, Sophia

    2016-08-01

    Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all articles relevant to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies, to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent documents within the vector space, and cluster the documents into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). Results obtained demonstrate that our method is able to achieve a high sensitivity of eligible studies and a significantly reduced manual annotation cost when compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
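    The abstract describes clustering document vectors into a fixed number of clusters, treating the centroids as topics, and representing each document as a topic mixture, but it does not say how the mixture weights are computed. The sketch assumes the paragraph vectors are already available as rows of an array, uses plain k-means for the clustering step, and, as a hypothetical choice, turns cosine similarities to the centroids into weights with a temperature-scaled softmax. `topic_mixtures` and `temperature` are illustrative names, not part of the authors' tool.

```python
import numpy as np


def topic_mixtures(doc_vecs, n_topics, n_iter=50, temperature=0.1, seed=0):
    """Cluster document vectors with k-means, use the centroids as latent
    topics, and return (centroids, mixtures) where each row of mixtures
    is a distribution over topics."""
    rng = np.random.default_rng(seed)
    centers = doc_vecs[rng.choice(len(doc_vecs), n_topics, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(doc_vecs[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(n_topics):
            if np.any(labels == j):
                centers[j] = doc_vecs[labels == j].mean(axis=0)
    # ASSUMPTION: mixture weights = softmax of cosine similarity to each centroid
    unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    cunit = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    logits = (unit @ cunit.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    w = np.exp(logits)
    return centers, w / w.sum(axis=1, keepdims=True)
```

    A lower `temperature` makes each mixture concentrate on the nearest topic; a higher one spreads weight across topics.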

  8. Ranked centroid projection: a data visualization approach with self-organizing maps.

    PubMed

    Yen, G G; Wu, Z

    2008-02-01

    The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.
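    The GHSOM and the ranked centroid projection both build on the basic SOM training loop: find the best-matching unit (BMU) for an input, then pull it and its grid neighbours toward the input with a shrinking Gaussian neighbourhood. The sketch shows only that basic loop; it is not GHSOM or RCP, and the linear decay schedules, random initialisation and function names are assumptions.

```python
import numpy as np


def train_som(X, rows, cols, n_epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a basic rectangular self-organizing map on the rows of X.

    Returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    W = rng.random((rows, cols, dim))
    # grid[i, j] == [i, j]: map coordinates of every node
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = sigma0 or max(rows, cols) / 2.0
    total, t = n_epochs * n, 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            frac = t / total
            lr = lr0 * (1.0 - frac)                       # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3          # shrinking neighbourhood radius
            bmu = bmu_of(W, X[i])
            # Gaussian neighbourhood on the map grid around the BMU
            g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * sigma ** 2))
            W += lr * g[..., None] * (X[i] - W)
            t += 1
    return W


def bmu_of(W, x):
    """Grid position of the node whose weight vector is closest to x."""
    d = np.linalg.norm(W - x, axis=2)
    return tuple(int(v) for v in np.unravel_index(d.argmin(), d.shape))
```

    After training, documents from different topics should map to different nodes, which is the property the hierarchical maps and projections then exploit.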

  9. Hierarchic Agglomerative Clustering Methods for Automatic Document Classification.

    ERIC Educational Resources Information Center

    Griffiths, Alan; And Others

    1984-01-01

    Considers classifications produced by application of single linkage, complete linkage, group average, and Ward clustering methods to the Keen and Cranfield document test collections, and studies the structure of the hierarchies produced, the extent to which the methods distort input similarity matrices during classification generation, and retrieval effectiveness…
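    The hierarchic agglomerative methods compared in such studies differ mainly in how they score two clusters before merging. A minimal sketch of the four usual criteria follows, assuming Euclidean distance between document vectors (retrieval studies of this period often used similarity coefficients instead). Ward's minimum-variance criterion is included because it is the usual fourth method in these comparisons; the function names are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence

Vec = Sequence[float]
Cluster = List[Vec]


def single_link(A: Cluster, B: Cluster) -> float:
    """Distance between the closest pair of documents across the two clusters."""
    return min(math.dist(a, b) for a, b in product(A, B))


def complete_link(A: Cluster, B: Cluster) -> float:
    """Distance between the farthest pair of documents across the two clusters."""
    return max(math.dist(a, b) for a, b in product(A, B))


def group_average(A: Cluster, B: Cluster) -> float:
    """Mean of all pairwise distances between the two clusters."""
    return sum(math.dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))


def centroid(C: Cluster) -> List[float]:
    return [sum(xs) / len(C) for xs in zip(*C)]


def ward_cost(A: Cluster, B: Cluster) -> float:
    """Increase in total within-cluster sum of squares if A and B are merged."""
    na, nb = len(A), len(B)
    return na * nb / (na + nb) * math.dist(centroid(A), centroid(B)) ** 2
```

    At each step an agglomerative algorithm merges the pair of clusters with the smallest score, so the choice of criterion alone determines the shape of the resulting hierarchy (single linkage tends to chain, complete linkage favours compact groups). `math.dist` requires Python 3.8 or later.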

  10. Application of diffusion maps to identify human factors of self-reported anomalies in aviation.

    PubMed

    Andrzejczak, Chris; Karwowski, Waldemar; Mikusinski, Piotr

    2012-01-01

    A study was conducted to investigate which factors lead pilots to submit voluntary anomaly reports about their flight performance. Diffusion Maps (DM) were selected as the method for performing dimensionality reduction on text records in this study. Diffusion Maps have been used successfully in other domains such as image classification and pattern recognition. High-dimensional data in the form of narrative text reports from the NASA Aviation Safety Reporting System (ASRS) were clustered and categorized by way of dimensionality reduction. Supervised analyses were performed to create a baseline document clustering system. Dimensionality reduction techniques identified concepts or keywords within records and allowed the creation of a framework for an unsupervised document classification system. Results from the unsupervised clustering algorithm were similar to those of the supervised methods outlined in the study. The dimensionality reduction was performed on the 100 most commonly occurring words within 126,000 text records describing commercial aviation incidents. This study demonstrates that unsupervised machine clustering and organization of incident reports is possible based on unbiased inputs. Findings from this study reinforced traditional views on which factors contribute to civil aviation anomalies; however, new associations between previously unrelated factors and conditions were also found.
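    A basic diffusion map, as a sketch: build a Gaussian affinity kernel between records, normalize it into a Markov transition matrix P = D⁻¹K, and use its leading non-trivial eigenvectors, scaled by their eigenvalues, as low-dimensional coordinates. The kernel width `epsilon`, the diffusion time `t` and the toy inputs are assumptions; the study itself worked with the 100 most frequent words over 126,000 reports, which this example does not reproduce.

```python
import numpy as np


def diffusion_map(X: np.ndarray, n_components: int = 2, epsilon: float = 1.0, t: int = 1) -> np.ndarray:
    """Diffusion-map embedding of the rows of X (e.g. word-frequency vectors)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / epsilon)                 # Gaussian affinity between records
    d = K.sum(axis=1)                         # degree of each record
    # A = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as P = D^-1 K
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]            # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / vecs[:, [0]]                 # right eigenvectors of P; psi[:, 0] == 1
    # skip the trivial constant eigenvector and scale by eigenvalue ** t
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

    On two well-separated groups of records, the first diffusion coordinate already splits the groups by sign; a clustering algorithm can then run on these few coordinates instead of the full word space.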

  11. Illinois Occupational Skill Standards: Telecommunications Technician Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for workforce preparation program providers, details the Illinois Occupational Skill Standards for programs preparing students for employment in the telecommunications technician occupational cluster. The document begins with a brief overview of the Illinois perspective on occupational skills standards…

  12. Illinois Occupational Skill Standards: Automotive Technician Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the automotive technician cluster. The document begins with overviews of the Illinois perspective on occupational skill standards and…

  13. Illinois Occupational Skill Standards. Beef Production Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for workforce preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the beef production cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards and…

  14. Cluster-lensing: A Python Package for Galaxy Clusters and Miscentering

    NASA Astrophysics Data System (ADS)

    Ford, Jes; VanderPlas, Jake

    2016-12-01

    We describe a new open source package for calculating properties of galaxy clusters, including Navarro, Frenk, and White halo profiles with and without the effects of cluster miscentering. This pure-Python package, cluster-lensing, provides well-documented and easy-to-use classes and functions for calculating cluster scaling relations, including mass-richness and mass-concentration relations from the literature, as well as the surface mass density Σ(R) and differential surface mass density ΔΣ(R) profiles, probed by weak lensing magnification and shear. Galaxy cluster miscentering is especially a concern for stacked weak lensing shear studies of galaxy clusters, where offsets between the assumed and the true underlying matter distribution can lead to a significant bias in the mass estimates if not accounted for. This software has been developed and released in a public GitHub repository, and is licensed under the permissive MIT license. The cluster-lensing package is archived on Zenodo. Full documentation, source code, and installation instructions are available at http://jesford.github.io/cluster-lensing/.
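    cluster-lensing provides its own classes for these profiles; the sketch below does not use that package's API. It only writes out the standard Navarro-Frenk-White density ρ(r) = ρ_s / [(r/r_s)(1 + r/r_s)²] and its closed-form enclosed mass, the 3-D profile from which Σ(R) and ΔΣ(R) are obtained by projection along the line of sight. The function names are hypothetical and units are left to the caller.

```python
import math


def nfw_density(r: float, rho_s: float, r_s: float) -> float:
    """NFW density rho(r) = rho_s / [x (1 + x)^2] with x = r / r_s."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)


def nfw_enclosed_mass(r: float, rho_s: float, r_s: float) -> float:
    """Mass inside radius r: 4 pi rho_s r_s^3 [ln(1 + x) - x / (1 + x)], x = r / r_s."""
    x = r / r_s
    return 4.0 * math.pi * rho_s * r_s ** 3 * (math.log1p(x) - x / (1.0 + x))
```

    The enclosed-mass formula can be checked by numerically integrating 4πr²ρ(r) from 0 to r, which is what the accompanying check does.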

  15. Illinois Occupational Skill Standards: Mechanical Drafting Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the mechanical drafting cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards and…

  16. Illinois Occupational Skill Standards: Architectural Drafting Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the architectural drafting cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards and…

  17. Illinois Occupational Skill Standards: In-Store Retailing Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended to serve as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the in-store retailing cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards…

  18. Illinois Occupational Skill Standards: Finishing and Distribution Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the finishing and distribution cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards…

  19. Illinois Occupational Skill Standards: Imaging/Pre-Press Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the imaging/pre-press cluster. The document begins with a brief overview of the Illinois perspective on occupational skill standards and…

  20. Cluster-based MOFs with accelerated chemical conversion of CO2 through C-C bond formation.

    PubMed

    Xiong, Gang; Yu, Bing; Dong, Jie; Shi, Ying; Zhao, Bin; He, Liang-Nian

    2017-05-30

    Investigations on metal-organic frameworks (MOFs) as direct catalysts have been well documented, but direct catalysis of the chemical conversion of terminal alkynes and CO₂ as chemical feedstock by MOFs into valuable chemical products has never been reported. We report here two cluster-based MOFs I and II assembled from a multinuclear Gd-cluster and Cu-cluster, displaying high thermal and solvent stabilities. I and II as heterogeneous catalysts possess active catalytic centers [Cu₁₂I₁₂] and [Cu₃I₂], respectively, exhibiting excellent catalytic performance in the carboxylation reactions of CO₂ with 14 kinds of terminal alkynes under 1 atm and mild conditions. For the first time, catalysis of the carboxylation reaction of terminal alkynes with CO₂ by MOF materials without any cocatalyst/additive is reported. This work not only reduces greenhouse gas emission but also provides highly valuable materials, opening a wide space in seeking recoverable catalysts to accelerate the chemical conversion of CO₂.

  1. Tech-Prep Competency Profiles within the Engineering Technologies Cluster.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document contains 12 competency profiles for tech prep courses within the engineering technologies cluster. The document consists of the following sections: (1) systemic curriculum reform philosophy--Ohio's vision of tech prep and its six critical components; (2) an explanation of the process of developing the tech prep competencies; (3) a…

  2. Development of a model of the tobacco industry's interference with tobacco control programmes

    PubMed Central

    Trochim, W; Stillman, F; Clark, P; Schmitt, C

    2003-01-01

    Objective: To construct a conceptual model of tobacco industry tactics to undermine tobacco control programmes for the purposes of: (1) developing measures to evaluate industry tactics, (2) improving tobacco control planning, and (3) supplementing current or future frameworks used to classify and analyse tobacco industry documents. Design: Web based concept mapping was conducted, including expert brainstorming, sorting, and rating of statements describing industry tactics. Statistical analyses used multidimensional scaling and cluster analysis. Interpretation of the resulting maps was accomplished by an expert panel during a face-to-face meeting. Subjects: 34 experts, selected because of their previous encounters with industry resistance or because of their research into industry tactics, took part in some or all phases of the project. Results: Maps with eight non-overlapping clusters in two-dimensional space were developed, with importance ratings of the statements and clusters. Cluster and quadrant labels were agreed upon by the experts. Conclusions: The conceptual maps summarise the tactics used by the industry and their relationships to each other, and suggest a possible hierarchy for measures that can be used in statistical modelling of industry tactics and for review of industry documents. Finally, the maps support a hypothesis about the likely progression of industry reactions as public health programmes become more successful, and therefore more threatening to industry profits. PMID:12773723

  3. Clustering of Farsi sub-word images for whole-book recognition

    NASA Astrophysics Data System (ADS)

    Soheili, Mohammad Reza; Kabir, Ehsanollah; Stricker, Didier

    2015-01-01

    Redundancy of word and sub-word occurrences in large documents can be effectively utilized in an OCR system to improve recognition results. Most OCR systems employ language modeling techniques as a post-processing step; however, these techniques do not use important pictorial information that exists in the text image. In the case of large-scale recognition of degraded documents, this information is even more valuable. In our previous work, we proposed a sub-word image clustering method for applications dealing with large printed documents. In our clustering method, the ideal case is when all equivalent sub-word images lie in one cluster. To overcome the issues of low print quality, the clustering method uses an image matching algorithm to measure the distance between two sub-word images. The measured distance, together with a set of simple shape features, was used to cluster all sub-word images. In this paper, we analyze the effects of adding more shape features on processing time, clustering purity, and the final recognition rate. Previously published experiments have shown the efficiency of our method on one book. Here we present extended experimental results and evaluate our method on another book with a totally different typeface. We also show that the number of newly created clusters in a page can be used as a criterion for assessing print quality and evaluating preprocessing phases.

  4. Word spotting for handwritten documents using Chamfer Distance and Dynamic Time Warping

    NASA Astrophysics Data System (ADS)

    Saabni, Raid M.; El-Sana, Jihad A.

    2011-01-01

    A large number of handwritten historical documents are held in libraries around the world. The desire to access, search, and explore these documents paves the way for a new age of knowledge sharing and promotes collaboration and understanding between human societies. Currently, the indexes for these documents are generated manually, which is very tedious and time-consuming. Results produced by state-of-the-art techniques for converting complete images of handwritten documents into textual representations are not yet sufficient. Therefore, word-spotting methods have been developed to archive and index images of handwritten documents in order to enable efficient searching within documents. In this paper, we present a new matching algorithm for word-spotting tasks on historical Arabic documents. We present a novel algorithm based on the Chamfer Distance to compute the similarity between shapes of word-parts. Matching results are used to cluster images of Arabic word-parts into different classes using the Nearest Neighbor rule. To compute the distance between two word-part images, the algorithm subdivides each image into equal-sized slices (windows). A modified version of the Chamfer Distance, incorporating geometric gradient features and distance transform data, is used as the similarity distance between slices. Finally, the Dynamic Time Warping (DTW) algorithm is used to measure the distance between two images of word-parts. Using DTW enables our system to cluster similar word-parts even when they are transformed non-linearly, as is typical of handwriting. We tested our implementation using various documents in different writing styles, taken from the Juma'a Al Majid Center - Dubai, and obtained encouraging results.
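    The slice-then-warp matching described above can be illustrated with the classic DTW recurrence. This is a minimal sketch, assuming each word-part image has already been reduced to a sequence of per-slice feature vectors; plain Euclidean distance stands in for the authors' modified Chamfer Distance, and the names `slice_distance`, `dtw_distance`, and `nearest_label` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def slice_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two slice feature vectors.

    Assumption: a stand-in for the modified Chamfer Distance in the abstract.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Classic dynamic time warping cost between two slice sequences."""
    n, m = len(s), len(t)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning s[:i] with t[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = slice_distance(s[i - 1], t[j - 1])
            # extend the cheapest of: diagonal match, stretch s, stretch t
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def nearest_label(query: Sequence[Vector],
                  labelled: List[Tuple[Sequence[Vector], str]]) -> str:
    """Nearest Neighbor rule: return the label of the closest labelled sequence."""
    return min(labelled, key=lambda item: dtw_distance(query, item[0]))[1]


if __name__ == "__main__":
    a = [[0.0], [1.0], [2.0], [3.0]]
    b = [[0.0], [1.0], [1.0], [2.0], [3.0]]  # same shape, one slice "stretched"
    print(dtw_distance(a, b))  # 0.0: the repeated slice is absorbed by the warping path
```

    Because DTW may map one slice to several consecutive slices, a non-linearly stretched copy of the same shape can still score a distance of zero, which is the property the abstract relies on for handwriting.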

  5. QCS : a system for querying, clustering, and summarizing documents.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dunlavy, Daniel M.

    2006-08-01

    Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best-known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence "trimming" and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. 
Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
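    QCS clusters documents with generalized spherical k-means. The following is a minimal sketch of plain (not generalized) spherical k-means, assuming documents are already dense term-weight vectors; the deterministic farthest-first initialisation is our own choice for reproducibility, and nothing here reproduces the QCS code itself.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs: Sequence[Sequence[float]], k: int,
                     max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster term-weight vectors by cosine similarity.

    Returns (labels, centroids); every centroid is a unit vector.
    """
    X = [normalize(d) for d in docs]
    # Farthest-first initialisation (assumption): start from the first document,
    # then repeatedly add the document least similar to any chosen centroid.
    centroids = [X[0]]
    while len(centroids) < k:
        centroids.append(min(X, key=lambda x: max(dot(x, c) for c in centroids)))
    labels = [-1] * len(X)
    for _ in range(max_iter):
        # Assignment step: cosine similarity equals the dot product of unit vectors.
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in X]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids
```

    Working on the unit sphere means document length does not affect cluster membership, only the direction of the term-weight vector, which is why this family of k-means suits LSI-style document representations.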

  6. QCS: a system for querying, clustering and summarizing documents.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dunlavy, Daniel M.; Schlesinger, Judith D.; O'Leary, Dianne P.

    2006-10-01

    Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best-known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence "trimming" and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. 
Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.

  7. Relational Learning via Collective Matrix Factorization

    DTIC Science & Technology

    2008-06-01

    well-known example of such a schema is pLSI-pHITS [13], which models document-word counts and document-document citations: E1 = words and E2 = E3...relational co-clustering include pLSI, pLSI-pHITS, the symmetric block models of Long et al. [23, 24, 25], and Bregman tensor clustering [5] (which can...to pLSI-pHITS In this section we provide an example where the additional flexibility of collective matrix factorization leads to better results; and

  8. Personality disorders: review and clinical application in daily practice.

    PubMed

    Angstman, Kurt B; Rasmussen, Norman H

    2011-12-01

    Personality disorders have been documented in approximately 9 percent of the general U.S. population. Psychotherapy, pharmacotherapy, and brief interventions designed for use by family physicians can improve the health of patients with these disorders. Personality disorders are classified into clusters A, B, and C. Cluster A includes schizoid, schizotypal, and paranoid personality disorders. Cluster B includes borderline, histrionic, antisocial, and narcissistic personality disorders. Cluster C disorders are more prevalent and include avoidant, dependent, and obsessive-compulsive personality disorders. Many patients with personality disorders can be treated by family physicians. Patients with borderline personality disorder may benefit from the use of omega-3 fatty acids, second-generation antipsychotics, and mood stabilizers. Patients with antisocial personality disorder may benefit from the use of mood stabilizers, antipsychotics, and antidepressants. Other therapeutic interventions include motivational interviewing and solution-based problem solving.

  9. Intuitive color-based visualization of multimedia content as large graphs

    NASA Astrophysics Data System (ADS)

    Delest, Maylis; Don, Anthony; Benois-Pineau, Jenny

    2004-06-01

    Data visualization techniques are penetrating various technological areas. In the field of multimedia, such as information search and retrieval in multimedia archives or digital media production and post-production, data visualization methodologies based on large graphs offer an exciting alternative to conventional storyboard visualization. In this paper we develop a new approach to visualization of multimedia (video) documents based on both large graph clustering and preliminary video segmentation and indexing.

  10. Doubly Nonparametric Sparse Nonnegative Matrix Factorization Based on Dependent Indian Buffet Processes.

    PubMed

    Xuan, Junyu; Lu, Jie; Zhang, Guangquan; Xu, Richard Yi Da; Luo, Xiangfeng

    2018-05-01

    Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which could benefit many tasks, such as document-word co-clustering. However, traditional SNMF typically assumes that the number of latent factors (i.e., the dimensionality of the factor matrices) is fixed. This assumption makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework to mitigate this issue by using dependent Indian buffet processes (dIBP). We apply a correlation function to generate the two stick weights associated with each column pair of factor matrices while still maintaining their respective marginal distributions specified by IBP. As a consequence, the two factor matrices are generated with columnwise correlation. Under this framework, two classes of correlation function are proposed: 1) using a bivariate Beta distribution and 2) using a Copula function. Compared with single IBP-based NMF, the proposed framework jointly makes both factor matrices nonparametric and sparse, so it can be applied to broader scenarios, such as co-clustering. It is also much more flexible than Gaussian process-based and hierarchical Beta process-based dIBPs in allowing the two corresponding binary matrix columns to have greater variation in their nonzero entries. Our experiments on synthetic data show the merits of the proposed framework compared with state-of-the-art models in terms of factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate its efficiency in document-word co-clustering tasks.

  11. Towards a methodology for cluster searching to provide conceptual and contextual "richness" for systematic reviews of complex interventions: case study (CLUSTER).

    PubMed

    Booth, Andrew; Harris, Janet; Croot, Elizabeth; Springett, Jane; Campbell, Fiona; Wilkins, Emma

    2013-09-28

    Systematic review methodologies can be harnessed to help researchers to understand and explain how complex interventions may work. Typically, when reviewing complex interventions, a review team will seek to understand the theories that underpin an intervention and the specific context for that intervention. A single published report from a research project does not typically contain this required level of detail. A review team may find it more useful to examine a "study cluster"; a group of related papers that explore and explain various features of a single project and thus supply necessary detail relating to theory and/or context. We sought to conduct a preliminary investigation, from a single case study review, of techniques required to identify a cluster of related research reports, to document the yield from such methods, and to outline a systematic methodology for cluster searching. In a systematic review of community engagement we identified a relevant project - the Gay Men's Task Force. From a single "key pearl citation" we conducted a series of related searches to find contextually or theoretically proximate documents. We followed up Citations, traced Lead authors, identified Unpublished materials, searched Google Scholar, tracked Theories, undertook ancestry searching for Early examples and followed up Related projects (embodied in the CLUSTER mnemonic). Our structured, formalised procedure for cluster searching identified useful reports that are not typically identified from topic-based searches on bibliographic databases. Items previously rejected by an initial sift were subsequently found to inform our understanding of underpinning theory (for example Diffusion of Innovations Theory), context or both. Relevant material included book chapters, a Web-based process evaluation, and peer reviewed reports of projects sharing a common ancestry. 
We used these reports to understand the context for the intervention and to explore explanations for its relative lack of success. Additional data helped us to challenge simplistic assumptions on the homogeneity of the target population. A single case study suggests the potential utility of cluster searching, particularly for reviews that depend on an understanding of context, e.g. realist synthesis. The methodology is transparent, explicit and reproducible. There is no reason to believe that cluster searching is not generalizable to other review topics. Further research should examine the contribution of the methodology beyond improved yield, to the final synthesis and interpretation, possibly by utilizing qualitative sensitivity analysis.

  12. System support documentation: IDIMS FUNCTION AMOEBA

    NASA Technical Reports Server (NTRS)

    Bryant, J.

    1982-01-01

    A listing is provided for AMOEBA, a clustering program based on a spatial-spectral model for image data. The program is fast and automatic (in the sense that no parameters are required), and classifies each picture element into classes which are determined internally. As an IDIMS function, no limit on the size of the image is imposed.

  13. Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data

    PubMed Central

    Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick

    2016-01-01

    This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. 
Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest. PMID:26958859

  14. Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data.

    PubMed

    Freire, Sergio Miranda; Teodoro, Douglas; Wei-Kleiner, Fang; Sundvall, Erik; Karlsson, Daniel; Lambrix, Patrick

    2016-01-01

    This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. 
Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.

  15. Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.

    PubMed

    Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida

    2014-09-15

    Toxicogenomics (TGx) endeavors to elucidate the underlying molecular mechanisms by exploring gene expression profiles in response to toxic substances. RNA-Seq is increasingly regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from the complex TGx data. If read counts are treated as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to the word-by-document matrix used in text mining. Topic modeling, which aims to discover latent structures in text corpora, could therefore help in exploring RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated with compounds sharing the same modes of action (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with MoAs and then confirmed by compound and pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discover functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.

  16. Effects of multiple founder populations on spatial genetic structure of reintroduced American martens.

    PubMed

    Williams, Bronwyn W; Scribner, Kim T

    2010-01-01

    Reintroductions and translocations are increasingly used to repatriate or increase probabilities of persistence for animal and plant species. Genetic and demographic characteristics of founding individuals and suitability of habitat at release sites are commonly believed to affect the success of these conservation programs. Genetic divergence among multiple source populations of American martens (Martes americana) and well documented introduction histories permitted analyses of post-introduction dispersion from release sites and development of genetic clusters in the Upper Peninsula (UP) of Michigan <50 years following release. Location and size of spatial genetic clusters and measures of individual-based autocorrelation were inferred using 11 microsatellite loci. We identified three genetic clusters in geographic proximity to original release locations. Estimated distances of effective gene flow based on spatial autocorrelation varied greatly among genetic clusters (30-90 km). Spatial contiguity of genetic clusters has been largely maintained with evidence for admixture primarily in localized regions, suggesting recent contact or locally retarded rates of gene flow. Data provide guidance for future studies of the effects of permeabilities of different land-cover and land-use features to dispersal and of other biotic and environmental factors that may contribute to the colonization process and development of spatial genetic associations.

  17. Subject Indexing and Citation Indexing--Part I: Clustering Structure in the Cystic Fibrosis Document Collection [and] Part II: An Evaluation and Comparison.

    ERIC Educational Resources Information Center

    Shaw, W. M., Jr.

    1990-01-01

    These two articles discuss clustering structure in the Cystic Fibrosis Document Collection, which is derived from the National Library of Medicine's MEDLINE file. The exhaustivity of four subject representations and two citation representations is examined, and descriptor-weight thresholds and similarity thresholds are used to compute…

  18. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

    PubMed

    Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

    2004-01-01

    One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting keyword weights as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
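    The TFIDF weighting compared in this study can be sketched in a few lines. This is a minimal sketch assuming raw term counts for tf and the common log(N/df) variant for idf; the paper's exact formula, background comparison set, stop list, and stemming are not reproduced, and `tfidf` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenised document.

    weight(t, d) = tf(t, d) * log(N / df(t)), where tf is the raw count,
    N the number of documents and df(t) the number of documents containing t.
    """
    n_docs = len(docs)
    # df(t): number of documents in which term t appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


if __name__ == "__main__":
    docs = [
        ["kinase", "cell", "cycle", "cycle"],
        ["kinase", "receptor", "signal"],
        ["cell", "cycle", "mitosis"],
    ]
    for w in tfidf(docs):
        # two highest-weighted keywords per document
        print(sorted(w.items(), key=lambda kv: -kv[1])[:2])
```

    A term that occurs in every document receives weight zero, so generic vocabulary drops out of the feature vectors, while terms concentrated in a few abstracts dominate; each resulting dict can then be turned into a fixed-order vector for hierarchical clustering.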

  19. Semiotic indexing of digital resources

    DOEpatents

    Parker, Charles T; Garrity, George M

    2014-12-02

    A method of classifying a plurality of documents. The method includes steps of providing a first set of classification terms and a second set of classification terms, the second set of classification terms being different from the first set of classification terms; generating a first frequency array of a number of occurrences of each term from the first set of classification terms in each document; generating a second frequency array of a number of occurrences of each term from the second set of classification terms in each document; generating a first similarity matrix from the first frequency array; generating a second similarity matrix from the second frequency array; determining an entrywise combination of the first similarity matrix and the second similarity matrix; and clustering the plurality of documents based on the result of the entrywise combination.
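    The claimed steps (two frequency arrays, two similarity matrices, an entrywise combination, then clustering) can be sketched as follows. This is an illustration under assumptions, not the patented method: cosine similarity, the entrywise (Hadamard) product as the "entrywise combination", and a simple threshold-linkage clustering are all choices made here, and every name is hypothetical.

```python
import math

def cosine_matrix(freq):
    """Pairwise cosine similarity between rows of a document-by-term count array."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in freq]
    n = len(freq)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(freq[i], freq[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim

def combine_entrywise(s1, s2):
    """Entrywise (Hadamard) product: one possible 'entrywise combination'."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]

def threshold_clusters(sim, threshold):
    """Label connected components of the graph linking pairs with sim >= threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Counts of two hypothetical classification-term sets in four documents.
first = [[3, 0], [2, 1], [0, 4], [0, 3]]
second = [[1, 1], [1, 1], [0, 2], [1, 2]]
combined = combine_entrywise(cosine_matrix(first), cosine_matrix(second))
print(threshold_clusters(combined, 0.5))
```

    With the product, a document pair stays similar only if both term sets agree, so the combination acts like a logical AND of the two views; an entrywise mean would behave more like an OR.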

  20. Implementation of evidence-based weekend service recommendations for allied health managers: a cluster randomised controlled trial protocol.

    PubMed

    Sarkies, Mitchell N; White, Jennifer; Morris, Meg E; Taylor, Nicholas F; Williams, Cylie; O'Brien, Lisa; Martin, Jenny; Bardoel, Anne; Holland, Anne E; Carey, Leeanne; Skinner, Elizabeth H; Bowles, Kelly-Ann; Grant, Kellie; Philip, Kathleen; Haines, Terry P

    2018-04-24

    It is widely acknowledged that health policy and practice do not always reflect current research evidence. Whether knowledge transfer from research to practice is more successful when specific implementation approaches are used remains unclear. A model to assist engagement of allied health managers and clinicians with research implementation could involve disseminating evidence-based policy recommendations, along with the use of knowledge brokers. We developed such a model to aid decision-making for the provision of weekend allied health services. This protocol outlines the design and methods for a multi-centre cluster randomised controlled trial to evaluate the success of research implementation strategies to promote evidence-informed weekend allied health resource allocation decisions, especially in hospital managers. This multi-centre study will be a three-group parallel cluster randomised controlled trial. Allied health managers from Australian and New Zealand hospitals will be randomised to receive either (1) an evidence-based policy recommendation document to guide weekend allied health resource allocation decisions, (2) the same policy recommendation document with support from a knowledge broker to help implement weekend allied health policy recommendations, or (3) a usual practice control group. The primary outcome will be alignment of weekend allied health service provision with policy recommendations. This will be measured by the number of allied health service events (occasions of service) occurring on weekends as a proportion of total allied health service events for the relevant hospital wards at baseline and 12-month follow-up. Evidence-based policy recommendation documents communicate key research findings in an accessible format. This comparatively low-cost research implementation strategy could be combined with using a knowledge broker to work collaboratively with decision-makers to promote knowledge transfer. 
The results will assist managers to make decisions on resource allocation, based on evidence. More generally, the findings will inform the development of an allied health model for translating research into practice. This trial is registered with the Australian New Zealand Clinical Trials Registry (ANZCTR) ( ACTRN12618000029291 ). Universal Trial Number (UTN): U1111-1205-2621.

  1. Temporal Methods to Detect Content-Based Anomalies in Social Media

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skryzalin, Jacek; Field, Jr., Richard; Fisher, Andrew N.

    Here, we develop a method for time-dependent topic tracking and meme trending in social media. Our objective is to identify time periods whose content differs significantly from normal, and we utilize two techniques to do so. The first is an information-theoretic analysis of the distributions of terms emitted during different periods of time. In the second, we cluster documents from each time period and analyze the tightness of each clustering. We also discuss a method of combining the scores created by each technique, and we provide ample empirical analysis of our methodology on various Twitter datasets.
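    The first technique compares term distributions between time periods. A minimal sketch, assuming Jensen-Shannon divergence as the information-theoretic measure (the abstract does not name the one used) and a single baseline period; the function names and tokens are hypothetical.

```python
import math
from collections import Counter

def term_distribution(tokens):
    """Relative frequency of each term in one time period."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint vocabularies."""
    terms = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl_to_mixture(a):
        # Only terms with nonzero probability in `a` contribute.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

baseline = term_distribution(["rain", "traffic", "rain", "game"])
window = term_distribution(["quake", "quake", "rain"])
print(round(js_divergence(baseline, window), 3))
```

    Each period would be scored against the baseline and periods above some chosen threshold flagged as anomalous; the threshold is left open here because the abstract gives none.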

  2. Increased Symptom Reporting in Young Athletes Based on History of Previous Concussions.

    PubMed

    Moser, Rosemarie Scolaro; Schatz, Philip

    2017-01-01

    Research documents increased symptoms in adolescents with a history of two or more concussions. This study examined baseline evaluations of 2,526 younger athletes, ages 10 to 14. Between-groups analyses examined Post Concussion Symptom Scale symptoms by concussion history group (None, One, Two+) and clusters of Physical, Cognitive, Emotional, and Sleep symptoms. Healthy younger athletes with a concussion history reported greater physical, emotional, and sleep-related symptoms than those with no history of concussion, with a greater endorsement in physical/sleep symptom clusters. Findings suggest younger athletes with a history of multiple concussions may experience residual symptoms.

  3. A comparison of visual search strategies of elite and non-elite tennis players through cluster analysis.

    PubMed

    Murray, Nicholas P; Hunfalvay, Melissa

    2017-02-01

    Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k-means) cluster analyses revealed three different clusters. The clustering method distinguished the visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and for distinguishing visual search variables among participants.

  4. 3-base periodicity in coding DNA is affected by intercodon dinucleotides

    PubMed Central

    Sánchez, Joaquín

    2011-01-01

    All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9 etc, bases, and we have proposed an association between TBP and clustering of same-phase triplets. We here investigated whether TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under a constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. So, possible effects were revealed by randomly exchanging synonymous codons without altering protein sequences to subsequently document changes in TBP via frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. So, intercodon C|A (where “|” indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed. PMID:21814388

  5. Text-mining analysis of mHealth research.

    PubMed

    Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health, and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After the document term matrix (DTM) was developed, analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". 
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies.

  6. Text-mining analysis of mHealth research

    PubMed Central

    Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health, and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After the document term matrix (DTM) was developed, analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”. 
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies. PMID:29430456

  7. Machine learning-based coreference resolution of concepts in clinical documents

    PubMed Central

    Ware, Henry; Mullett, Charles J; El-Rawas, Oussama

    2012-01-01

    Objective: Coreference resolution of concepts, although a very active area in the natural language processing community, has not yet been widely applied to clinical documents. Accordingly, the 2011 i2b2 competition focusing on this area is a timely and useful challenge. The objective of this research was to collate coreferent chains of concepts from a corpus of clinical documents. These concepts are in the categories of person, problems, treatments, and tests. Design: A machine learning approach based on graphical models was employed to cluster coreferent concepts. Features selected were divided into domain independent and domain specific sets. Training was done with the i2b2 provided training set of 489 documents with 6949 chains. Testing was done on 322 documents. Results: The learning engine, using the unweighted average of three different measurement schemes, resulted in an F measure of 0.8423 where no domain specific features were included and 0.8483 where the feature set included both domain independent and domain specific features. Conclusion: Our machine learning approach is a promising solution for recognizing coreferent concepts, which in turn is useful for practical applications such as the assembly of problem and medication lists from clinical documents. PMID:22582205

  8. Instance-Based Question Answering

    DTIC Science & Technology

    2006-12-01

    answer clustering, composition, and scoring. Moreover, with the effort dedicated to improving monolingual system performance, system parameters are...text collections: document type, manual or automatic annotations (if any), and stylistic and notational differences in technical terms. Monolingual ...forum in which cross language retrieval systems and question answering systems are tested for various European languages. The CLEF QA monolingual task

  9. Mapping texts through dimensionality reduction and visualization techniques for interactive exploration of document collections

    NASA Astrophysics Data System (ADS)

    de Andrade Lopes, Alneu; Minghim, Rosane; Melo, Vinícius; Paulovich, Fernando V.

    2006-01-01

    The current abundance of available information often impairs the tasks of searching, browsing and analyzing information pertinent to a topic of interest. This paper presents a methodology to create a meaningful graphical representation of document corpora targeted at supporting exploration of correlated documents. The purpose of such an approach is to produce a map from a document body on a research topic or field based on the analysis of their contents and the similarities among articles. The document map is generated, after text pre-processing, by projecting the data in two dimensions using Latent Semantic Indexing. The projection is followed by hierarchical clustering to support sub-area identification. The map can be interactively explored, helping to narrow down the search for relevant articles. Tests were performed using a collection of documents pre-classified into three research subject classes: Case-Based Reasoning, Information Retrieval, and Inductive Logic Programming. The map produced was capable of separating the main areas and placing similar documents close together, revealing possible topics and identifying boundaries between them. The tool supports exploration of inter-topic and intra-topic relationships and is useful in many contexts that require deciding which articles to read, such as scientific research, education, and training.

  10. A Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization.

    PubMed

    He, Sheng; Samara, Petros; Burgers, Jan; Schomaker, Lambert

    2016-11-01

    It is of essential importance for historians to know the date and place of origin of the documents they study. It would be a huge advancement for historical scholars if it were possible to automatically estimate the geographical and temporal provenance of a handwritten document by inferring them from the handwriting style of such a document. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, called histogram of orientations of handwritten strokes, is proposed to extract and describe the visual elements, which is built on a scale-invariant polar-feature space. In addition, the multi-label self-organizing map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. Our proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also be considered as a pre-structured clustering method to build a codebook, which contains more discriminative information on date and geography. The experimental results on the medieval paleographic scale data set demonstrate that our method achieves state-of-the-art results.

  11. The Profile-Query Relationship.

    ERIC Educational Resources Information Center

    Shepherd, Michael A.; Phillips, W. J.

    1986-01-01

    Defines relationship between user profile and user query in terms of relationship between clusters of documents retrieved by each, and explores the expression of cluster similarity and cluster overlap as linear functions of similarity existing between original pairs of profiles and queries, given the desired retrieval threshold. (23 references)…

  12. Distance Probes of Dark Energy

    DOE PAGES

    Kim, A. G.; Padmanabhan, N.; Aldering, G.; ...

    2015-03-15

    We present the results from the Distances subgroup of the Cosmic Frontier Community Planning Study (Snowmass 2013). This document summarizes the current state of the field as well as future prospects and challenges. In addition to the established probes using Type Ia supernovae and baryon acoustic oscillations, we also consider prospective methods based on clusters, active galactic nuclei, gravitational wave sirens and strong lensing time delays.

  13. Soft Clustering Criterion Functions for Partitional Document Clustering

    DTIC Science & Technology

    2004-05-26

    in the cluster that it already belongs to. The refinement phase ends as soon as we perform an iteration in which no documents moved between... it with the one obtained by the hard criterion functions. We present a comprehensive experimental evaluation involving twelve different datasets

  14. Font adaptive word indexing of modern printed documents.

    PubMed

    Marinai, Simone; Marino, Emanuele; Soda, Giovanni

    2006-08-01

    We propose an approach for the word-level indexing of modern printed documents which are difficult to recognize using current OCR engines. By means of word-level indexing, it is possible to retrieve the position of words in a document, enabling queries involving proximity of terms. Web search engines implement this kind of indexing, allowing users to retrieve Web pages on the basis of their textual content. Nowadays, digital libraries hold collections of digitized documents that can be retrieved either by browsing the document images or relying on appropriate metadata assembled by domain experts. Word indexing tools would therefore improve access to these collections. The proposed system is designed to index homogeneous document collections by automatically adapting to different languages and font styles without relying on OCR engines for character recognition. The approach is based on three main ideas: the use of Self-Organizing Maps (SOM) to perform unsupervised character clustering, the definition of one suitable vector-based word representation whose size depends on the word aspect-ratio, and the run-time alignment of the query word with indexed words to deal with broken and touching characters. The most appropriate applications are for processing modern printed documents (17th to 19th centuries) where current OCR engines are less accurate. Our experimental analysis addresses six data sets containing documents ranging from books of the 17th century to contemporary journals.

  15. Mixed-Initiative Clustering

    ERIC Educational Resources Information Center

    Huang, Yifen

    2010-01-01

    Mixed-initiative clustering is a task where a user and a machine work collaboratively to analyze a large set of documents. We hypothesize that a user and a machine can both learn better clustering models through enriched communication and interactive learning from each other. The first contribution of this thesis is providing a framework of…

  16. Hierarchical Clustering: A Bibliography. Technical Report No. 1.

    ERIC Educational Resources Information Center

    Farrell, William T.

    "Classification: Purposes, Principles, Progress, Prospects" by Robert R. Sokal is reprinted in this document. It summarizes the principles of classification and cluster analysis in a manner which is of specific value to the Marine Corps Office of Manpower Utilization. Following the article is a 184-item bibliography on cluster analysis…

  17. Non-redundant patent sequence databases with value-added annotations at two levels

    PubMed Central

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. PMID:19884134
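    The two-level grouping described above can be sketched with Python's `hashlib`: level-1 clusters key on the MD5 checksum of the full sequence, and level-2 clusters split each level-1 cluster by a patent-family key. The record layout, identifiers and family labels are hypothetical; the EBI pipeline itself is not shown in the abstract.

```python
import hashlib
from collections import defaultdict

def level1_clusters(records):
    """Group IDs of sequences that are identical over their whole length (same MD5)."""
    clusters = defaultdict(list)
    for seq_id, sequence, _family in records:
        digest = hashlib.md5(sequence.encode("ascii")).hexdigest()
        clusters[digest].append(seq_id)
    return dict(clusters)

def level2_clusters(records):
    """Sub-group each level-1 cluster by patent family (a hypothetical family key)."""
    clusters = defaultdict(list)
    for seq_id, sequence, family in records:
        digest = hashlib.md5(sequence.encode("ascii")).hexdigest()
        clusters[(digest, family)].append(seq_id)
    return dict(clusters)

# Hypothetical (id, sequence, patent family) records.
records = [
    ("P1", "MKTAYIAK", "FAM-1"),
    ("P2", "MKTAYIAK", "FAM-1"),
    ("P3", "MKTAYIAK", "FAM-2"),
    ("P4", "MKTAYIAQ", "FAM-1"),
]
print(sorted(level1_clusters(records).values()))
print(sorted(level2_clusters(records).values()))
```

    A checksum makes the identity test a dictionary lookup instead of pairwise string comparison; since MD5 collisions are theoretically possible, a stricter version could confirm that sequences sharing a digest are equal.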

  18. Non-redundant patent sequence databases with value-added annotations at two levels.

    PubMed

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/.

  19. Stalking: developing an empirical typology to classify stalkers.

    PubMed

    Del Ben, Kevin; Fremouw, W

    2002-01-01

    Stalking has received a great deal of attention from the media and its harmful effects on victims have been well documented. Stalking is also more common than previously thought, leading researchers to classify stalkers into groups in an attempt to predict future behavior. Previous research has grouped stalkers based on theoretical models rather than trying to empirically examine stalking behaviors along with other factors such as motivation, type of relationship, and attachment style in determining a typology of stalkers. Female college students (N = 108) who had experienced stalking behaviors responded to questions regarding their perceptions of those behaviors. First, these victim perceptions were factor analyzed. Then, cluster analysis grouped those factors to produce a four-cluster typology of stalkers. Cluster 1 (Harmless) appeared to reflect a more casual, less jealous pattern of behavior. Cluster 2 (Low Threat) appeared the least likely to become physically violent or threatening, or to engage in illegal behaviors. Cluster 3 (Violent Criminal) appeared to be the most likely to engage in physically threatening and illegal behaviors. Cluster 4 (High Threat) was characterized by a more serious type of relationship; stalkers in this cluster may attempt to be more restrictive of their partner when first meeting them.

  20. A pilot study of a heuristic algorithm for novel template identification from VA electronic medical record text.

    PubMed

    Redd, Andrew M; Gundlapalli, Adi V; Divita, Guy; Carter, Marjorie E; Tran, Le-Thuy; Samore, Matthew H

    2017-07-01

    Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. The two-module method is based on the framework of information foraging and addresses the hypothesis that documents containing templates and the templates within those documents can be identified by common features. The first module takes documents from the corpus and groups those with common templates. This is accomplished through a binned word count hierarchical clustering algorithm. The second module extracts the templates. It uses the groupings and performs a longest common subsequence (LCS) algorithm to obtain the constituent parts of the templates. The method was developed and tested on a random document corpus of 750 notes derived from a large database of US Department of Veterans Affairs (VA) electronic medical notes. The grouping module, using hierarchical clustering, identified 23 groups with 3 documents or more, consisting of 120 documents from the 750 documents in our test corpus. Of these, 18 groups had at least one common template that was present in all documents in the group for a positive predictive value of 78%. The LCS extraction module performed with 100% positive predictive value, 94% sensitivity, and 83% negative predictive value. The human review determined that in 4 groups the template covered the entire document, with the remaining 14 groups containing a common section template. Among documents with templates, the number of templates per document ranged from 1 to 14. The mean and median numbers of templates per group were 5.9 and 5, respectively. The grouping method was successful in finding like documents containing templates. 
Of the groups of documents containing templates, the LCS module was successful in deciphering text belonging to the template and text that was extraneous. Major obstacles to improved performance included documents composed of multiple templates, templates that included other templates embedded within them, and variants of templates. We demonstrate proof of concept of the grouping and extraction method of identifying templates in electronic medical records in this pilot study and propose methods to improve performance and scaling up. Published by Elsevier Inc.
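    The extraction module's core step, a longest common subsequence over notes in a group, can be sketched at the word level. This is a minimal illustration under assumptions: notes are split on whitespace, and a group template is obtained by folding pairwise LCS across the notes, which is a heuristic rather than an exact multi-sequence LCS. The abstract does not say how the authors extended LCS to groups, and the function names and notes are hypothetical.

```python
def lcs_tokens(a, b):
    """Longest common subsequence of two token lists (classic dynamic programming)."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover one LCS.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def group_template(notes):
    """Fold pairwise LCS over a group of notes (heuristic, not exact multi-LCS)."""
    template = notes[0].split()
    for note in notes[1:]:
        template = lcs_tokens(template, note.split())
    return template

notes = ["Pain score: 3 Plan: rest", "Pain score: 7 Plan: ice and rest"]
print(group_template(notes))
```

    Tokens kept in the template are the shared boilerplate; the remaining tokens in each note (the scores and "ice and" here) are the note-specific text that an extraction step would want to keep.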

  1. Delineation of gravel-bed clusters via factorial kriging

    NASA Astrophysics Data System (ADS)

    Wu, Fu-Chun; Wang, Chi-Kuei; Huang, Guo-Hao

    2018-05-01

    Gravel-bed clusters are the most prevalent microforms that affect local flows and sediment transport. A growing consensus is that the practice of cluster delineation should be based primarily on bed topography rather than grain sizes. Here we present a novel approach for cluster delineation using patch-scale high-resolution digital elevation models (DEMs). We use a geostatistical interpolation method, i.e., factorial kriging, to decompose the short- and long-range (grain- and microform-scale) DEMs. The required parameters are determined directly from the scales of the nested variograms. The short-range DEM exhibits a flat bed topography, yet individual grains are sharply outlined, making the short-range DEM a useful aid for grain segmentation. The long-range DEM exhibits a smoother topography than the original full DEM, yet groupings of particles emerge as small-scale bedforms, making the contour percentile levels of the long-range DEM a useful tool for cluster identification. Individual clusters are delineated using the segmented grains and identified clusters via a range of contour percentile levels. Our results reveal that the density and total area of delineated clusters decrease with increasing contour percentile level, while the mean grain size of clusters and average size of anchor clast (i.e., the largest particle in a cluster) increase with the contour percentile level. These results support the interpretation that larger particles group as clusters and protrude higher above the bed than other smaller grains. A striking feature of the delineated clusters is that anchor clasts are invariably greater than the D90 of the grain sizes even though a threshold anchor size was not adopted herein. The average areal fractal dimensions (Hausdorff-Besicovitch dimensions of the projected areas) of individual clusters, however, demonstrate that clusters delineated with different contour percentile levels exhibit similar planform morphologies. 
Comparisons with a compilation of existing field data show consistency with the cluster properties documented in a wide variety of settings. This study thus points toward a promising, alternative DEM-based approach to characterizing sediment structures in gravel-bed rivers.
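
    The cluster-identification step described above (thresholding the long-range DEM at a contour percentile level) can be illustrated with a short sketch. It is not the authors' method: it assumes the long-range DEM has already been obtained (the factorial-kriging decomposition and grain segmentation are not reproduced), uses a nearest-rank percentile, and treats each 4-connected group of cells above the threshold as one candidate cluster. The function names and the toy elevation grid are hypothetical.

```python
from collections import deque


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100 (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # ceil(p * n / 100)
    return ordered[rank - 1]


def delineate_clusters(dem, p):
    """Return 4-connected regions of cells whose elevation exceeds the p-th percentile.

    `dem` is a list of rows of elevations, assumed to be the long-range
    (microform-scale) component of the DEM.
    """
    rows, cols = len(dem), len(dem[0])
    threshold = percentile([z for row in dem for z in row], p)
    labels = [[0] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if dem[r][c] <= threshold or labels[r][c]:
                continue
            # Breadth-first flood fill of one above-threshold region.
            label = len(regions) + 1
            labels[r][c] = label
            queue, cells = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx] and dem[ny][nx] > threshold):
                        labels[ny][nx] = label
                        queue.append((ny, nx))
            regions.append(cells)
    return regions


# Hypothetical 4 x 5 long-range DEM (arbitrary elevation units).
dem = [
    [0, 0, 0, 0, 0],
    [0, 5, 6, 0, 0],
    [0, 4, 0, 0, 3],
    [0, 0, 0, 0, 2],
]
for level in (50, 90):
    found = delineate_clusters(dem, level)
    print(level, len(found), sum(len(cells) for cells in found))
```

    On the toy grid, raising the percentile level from 50 to 90 reduces the candidates from two regions covering 5 cells to one region covering 2 cells, which mirrors the reported decrease in cluster density and total area with increasing contour percentile level.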

  2. A Comparison of Anammox Bacterial Abundance and Community Structures in Three Different Emerged Plants-Related Sediments.

    PubMed

    Chu, Jinyu; Zhang, Jinping; Zhou, Xiaohong; Liu, Biao; Li, Yimin

    2015-09-01

    Quantitative polymerase chain reaction (qPCR) assays and 16S rRNA gene clone libraries were used to document the abundance, diversity and community structure of anaerobic ammonia-oxidising (anammox) bacteria in the rhizosphere and non-rhizosphere sediments of three emergent macrophyte species (Iris pseudacorus, Thalia dealbata and Typha orientalis). The qPCR results confirmed the presence of anammox bacteria (AMX), with log gene copy numbers per gram of dry sediment ranging from 5.00 to 6.78. AMX were more abundant in T. orientalis-associated sediments than in sediments associated with the other two plant species. The I. pseudacorus- and T. orientalis-associated sediments had higher Shannon diversity values, indicating higher AMX diversity in these sediments. Based on the 16S rRNA gene, Candidatus 'Brocadia', Candidatus 'Kuenenia', Candidatus 'Jettenia' and new clusters were observed, with the Candidatus 'Kuenenia' cluster predominant. The I. pseudacorus-associated sediments contained all the sequences of the C. 'Jettenia' cluster. Sequences obtained from T. orientalis-associated sediments contributed more than 90% of the sequences in the new cluster, whereas none came from I. pseudacorus. The new cluster was only distantly related to known sequences and was therefore grouped outside the known clusters, indicating that it may represent a new Planctomycetales genus. Further studies should be undertaken to confirm this finding.
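
    The Shannon diversity values mentioned above follow the standard index H' = -sum(p_i * ln p_i), where p_i is the fraction of clones assigned to operational taxonomic unit (OTU) i. The sketch below assumes clones have already been grouped into OTUs and uses the natural logarithm (studies differ in log base); the OTU labels are hypothetical, not data from the study.

```python
import math
from collections import Counter


def shannon_index(otu_labels):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTU relative abundances."""
    counts = Counter(otu_labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical OTU assignments of sequenced clones from two sediment samples.
even_sample = ["Kuenenia", "Brocadia", "Jettenia", "new"] * 5
skewed_sample = ["Kuenenia"] * 17 + ["Brocadia", "Jettenia", "new"]
print(shannon_index(even_sample), shannon_index(skewed_sample))
```

    A sample spread evenly over four OTUs scores ln 4 (about 1.386), higher than a sample of the same size dominated by a single OTU.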

  3. A suffix arrays based approach to semantic search in P2P systems

    NASA Astrophysics Data System (ADS)

    Shi, Qingwei; Zhao, Zheng; Bao, Hu

    2007-09-01

    Building a semantic search system on top of peer-to-peer (P2P) networks is becoming an attractive and promising alternative for reasons of scalability, data freshness and search cost. In this paper, we present a Suffix Arrays based algorithm for Semantic Search (SASS) in P2P systems, which constructs distributed Semantic Overlay Networks (SONs) for full-text search in P2P networks. At each node in the P2P network, SASS distributes document indices based on a set of suffix arrays, through which clusters are created according to the words or phrases shared between documents; the search cost for a given query is therefore reduced, because only semantically related documents are scanned. In contrast to recently proposed SON schemes built on metadata or predefined classes, SASS is an unsupervised approach for the decentralized generation of SONs. SASS is also an incremental, linear-time algorithm that efficiently handles node updates in P2P networks. Our simulation results demonstrate that SASS yields high search efficiency in dynamic environments.
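
    The phrase-sharing idea behind SASS (documents grouped by the words or phrases they share, found via suffix arrays) can be illustrated on a single machine. This is a hypothetical sketch, not the SASS algorithm: it builds a naive word-level suffix array over all documents (not the paper's incremental, linear-time, distributed construction) and links two documents whenever adjacent suffixes from different documents share at least `min_phrase_len` leading words; linked documents are merged into groups with union-find. All function names and sample texts are illustrative.

```python
def build_suffix_array(docs):
    """Word-level suffix array over all documents (naive construction).

    Each entry is (doc_id, start); entries are sorted by the word sequence
    beginning at that position, so suffixes with a common prefix are adjacent.
    """
    suffixes = [(d, i) for d, words in enumerate(docs) for i in range(len(words))]
    suffixes.sort(key=lambda s: docs[s[0]][s[1]:])
    return suffixes


def common_prefix_len(a, b):
    """Number of leading words two suffixes have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def cluster_by_shared_phrases(texts, min_phrase_len=2):
    """Group documents that share a phrase of at least min_phrase_len words."""
    docs = [t.lower().split() for t in texts]
    parent = list(range(len(docs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    suffix_array = build_suffix_array(docs)
    for (d1, i1), (d2, i2) in zip(suffix_array, suffix_array[1:]):
        shared = common_prefix_len(docs[d1][i1:], docs[d2][i2:])
        if d1 != d2 and shared >= min_phrase_len:
            parent[find(d1)] = find(d2)

    groups = {}
    for d in range(len(docs)):
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())


# Hypothetical documents: 0 and 1 share "semantic search", 2 and 3 share "bed river".
texts = [
    "suffix arrays for semantic search",
    "semantic search in p2p networks",
    "gravel bed river clusters",
    "bed river morphology",
]
print(cluster_by_shared_phrases(texts))
```

    Because suffixes that begin with the same phrase sit next to each other once sorted, only adjacent entries need comparing; with `min_phrase_len=2` the sample splits into a search-related group and a river-related group.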

  4. Exploring supervised and unsupervised methods to detect topics in biomedical text

    PubMed Central

    Lee, Minsuk; Wang, Weiqing; Yu, Hong

    2006-01-01

    Background Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on information content. Topic detection will benefit many other natural language processing tasks, including information retrieval, text summarization and question answering, and is a necessary step towards building an information system that provides an efficient way for biologists to seek information from an ocean of literature. Results We have explored the methods of Topic Spotting, a text categorization task that applies the supervised machine-learning technique naïve Bayes to automatically assign a document to one or more predefined topics, and Topic Clustering, which applies unsupervised hierarchical clustering algorithms to aggregate documents into clusters such that each cluster represents a topic. We have applied our methods to detect the topics of more than fifteen thousand articles that represent over sixteen thousand entries in the Online Mendelian Inheritance in Man (OMIM) database. We have explored bag-of-words features. Additionally, we have explored semantic features, namely the Medical Subject Headings (MeSH) that are assigned to the MEDLINE records and the Unified Medical Language System (UMLS) semantic types that correspond to the MeSH terms, in addition to bag of words, to facilitate topic detection. Our results indicate that incorporating the MeSH terms and the UMLS semantic types as additional features enhances the performance of topic detection, and that naïve Bayes has the highest accuracy, 66.4%, for predicting the topic of an OMIM article as one of twenty-five topics. Conclusion Our results indicate that the supervised topic spotting methods outperformed the unsupervised topic clustering; on the other hand, the unsupervised topic clustering methods have the advantages of being robust and applicable in real-world settings. PMID:16539745
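
    Topic spotting with naïve Bayes can be sketched as a multinomial naïve Bayes classifier with add-one (Laplace) smoothing. This is a minimal single-label sketch under assumptions: the paper's exact feature weighting, smoothing and multi-topic assignment are not given here, the training snippets are hypothetical, and MeSH-term features are modelled simply as extra tokens carrying a made-up `MeSH:` prefix.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled_docs):
    """Count topic priors and per-topic word frequencies from (tokens, topic) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, topic in labelled_docs:
        doc_counts[topic] += 1
        word_counts[topic].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the topic with the highest log posterior (add-one smoothing)."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_topic, best_score = None, -math.inf
    for topic in doc_counts:
        total = sum(word_counts[topic].values())
        score = math.log(doc_counts[topic] / n_docs)
        for word in tokens:
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[topic][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical training snippets; "MeSH:" tokens stand in for MeSH-term features.
training = [
    (["protein", "fold", "structure", "MeSH:Proteins"], "protein structure"),
    (["protein", "helix", "structure", "MeSH:Proteins"], "protein structure"),
    (["enzyme", "kinetics", "metabolism", "MeSH:Enzymes"], "biochemistry"),
    (["metabolism", "enzyme", "pathway", "MeSH:Enzymes"], "biochemistry"),
]
model = train_nb(training)
print(predict_nb(model, ["protein", "structure", "MeSH:Proteins"]))
```

    Working in log space avoids underflow when many per-word probabilities are multiplied.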

  5. Signature detection and matching for document image retrieval.

    PubMed

    Zhu, Guangyu; Zheng, Yefeng; Doermann, David; Jaeger, Stefan

    2009-11-01

    As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from a cluttered background is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures the structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation, scale, and rotation invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as query in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.

  6. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    PubMed

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
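
    Text clustering typically starts from a vector representation of each pre-processed document. A common choice, shown here as an illustrative sketch rather than a method prescribed by this review, is TF-IDF weighting (raw term count times ln(N / document frequency)) combined with cosine similarity. Tokenisation is a bare lowercase whitespace split, and the three example texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term count * ln(N / document frequency))."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for tokens in tokenised
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "gene expression in tumour cells",
    "tumour gene expression profiling",
    "hospital staffing survey",
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

    The two biomedical texts share weighted terms and score above zero, while the unrelated third text scores exactly zero; a clustering algorithm would then group documents by these similarities. Note that a term present in every document receives zero weight under this IDF definition.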

  7. Online writer identification using alphabetic information clustering

    NASA Astrophysics Data System (ADS)

    Tan, Guo Xian; Viard-Gaudin, Christian; Kot, Alex C.

    2009-01-01

    Writer identification is a topic of much renewed interest today because of its importance in applications such as writer adaptation, routing of documents and forensic document analysis. Various algorithms have been proposed to handle such tasks. Of particular interest are the approaches that use allographic features [1-3] to perform a comparison of the documents in question. The allographic features are used to define prototypes that model the unique handwriting styles of the individual writers. This paper investigates a novel perspective that takes alphabetic information into consideration when the allographic features are clustered into prototypes at the character level. We hypothesize that alphabetic information provides additional clues that help in the clustering of allographic prototypes. An alphabet information coefficient (AIC) has been introduced in our study, and the effect of this coefficient is presented. Our experiments showed an increase in writer identification accuracy from 66.0% to 87.0% when alphabetic information was used in conjunction with allographic features on a database of 200 reference writers.

  8. Comment on "An Evaluation of Query Expansion by the Addition of Clustered Terms for a Document Retrieval System"

    ERIC Educational Resources Information Center

    Salton, G.

    1972-01-01

    The author emphasized that one cannot conclude from the experiments reported upon that term clusters (or equivalently, keyword classifications or thesauruses) are not useful in retrieval. (2 references) (Author)

  9. BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.

    PubMed

    Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis

    2014-11-15

    The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(®) and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates browsing of document clusters per subject, analysis of term co-occurrence, generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses the integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization options for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios, including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. Contact: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  10. Holder and Topic Based Analysis of Emotions on Blog Texts: A Case Study for Bengali

    NASA Astrophysics Data System (ADS)

    Das, Dipankar; Bandyopadhyay, Sivaji

    The paper presents an extended approach to analyzing the emotions of blog users on different topics. Rule-based techniques that identify emotion holders and topics with respect to their corresponding emotional expressions help to develop the baseline system. A Support Vector Machine (SVM) based supervised framework, on the other hand, identifies the holders, topics and emotional expressions in blog sentences and outperforms the baseline system. The existence of many-to-many relations between holders and topics with respect to Ekman's six emotion classes has been examined using two-way evaluation, one from the perspective of the holder and the other from the perspective of the topic. The system's results were found satisfactory when compared with the agreement achieved in the subjective annotation. The error analysis shows that the topic of a blog at the document level is not always conveyed at the sentence level. Moreover, the difficulty in identifying the topic of a blog document stems from the problem of identifying features such as bigrams, named entities and sentiment. We therefore employed a semantic clustering approach along with these features to identify the similarity between the document-level topic and the sentential topic, and to improve the identification of the document-level topic.

  11. Determinants of clustering across America's national parks: An application of the Gini coefficients

    Treesearch

    R. Geoffrey Lacher; Matthew T.J. Brownlee

    2012-01-01

    The changes in the clustering of visitation across National Park Service (NPS) sites have not been well documented or widely studied. This paper investigates the changes in the dispersion of visitation across NPS sites with the Gini coefficient, a popular measure of inequality used primarily in the field of economics. To calculate the degree of clustering nationally,...
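
    The Gini coefficient used above to measure how unevenly visitation is spread across NPS sites can be computed from sorted values with the standard formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, where x_i is the i-th smallest value (i counted from 1). A minimal sketch follows; the visit counts are hypothetical, not NPS data. G is 0 when every site receives the same number of visits and approaches 1 as visitation concentrates in a few sites.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical annual visits (millions) at four park units.
print(gini([2.5, 2.5, 2.5, 2.5]))  # 0.0, an even spread
print(gini([0.1, 0.2, 0.7, 9.0]))  # larger value: visits concentrated in one unit
```

    Tracking this value year by year would show whether visitation is becoming more or less clustered across sites.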

  12. Cross-language information retrieval using PARAFAC2.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bader, Brett William; Chew, Peter; Abdelali, Ahmed

    A standard approach to cross-language information retrieval (CLIR) uses Latent Semantic Analysis (LSA) in conjunction with a multilingual parallel aligned corpus. This approach has been shown to be successful in identifying similar documents across languages - or more precisely, retrieving the most similar document in one language to a query in another language. However, the approach has severe drawbacks when applied to a related task, that of clustering documents 'language-independently', so that documents about similar topics end up closest to one another in the semantic space regardless of their language. The problem is that documents are generally more similar to other documents in the same language than they are to documents in a different language, but on the same topic. As a result, when using multilingual LSA, documents will in practice cluster by language, not by topic. We propose a novel application of PARAFAC2 (which is a variant of PARAFAC, a multi-way generalization of the singular value decomposition [SVD]) to overcome this problem. Instead of forming a single multilingual term-by-document matrix which, under LSA, is subjected to SVD, we form an irregular three-way array, each slice of which is a separate term-by-document matrix for a single language in the parallel corpus. The goal is to compute an SVD for each language such that V (the matrix of right singular vectors) is the same across all languages. Effectively, PARAFAC2 imposes the constraint, not present in standard LSA, that the 'concepts' in all documents in the parallel corpus are the same regardless of language. Intuitively, this constraint makes sense, since the whole purpose of using a parallel corpus is that exactly the same concepts are expressed in the translations. We tested this approach by comparing the performance of PARAFAC2 with standard LSA in solving a particular CLIR problem. From our results, we conclude that PARAFAC2 offers a very promising alternative to LSA not only for multilingual document clustering, but also for solving other problems in cross-language information retrieval.
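
    The contrast between the two models can be written compactly. In the notation below (assumed here, not taken from the report), X_k is the term-by-document matrix for language k of the K languages in the parallel corpus, and every slice has the same n documents as columns. Multilingual LSA takes one SVD of the stacked matrix, whereas the PARAFAC2 formulation described above fits one decomposition per language while sharing the document-by-concept matrix V, so each document receives a single concept vector regardless of language. Σ_k is diagonal; the general PARAFAC2 model only requires the cross-product U_k^T U_k to be the same matrix Φ for every language, with orthonormal U_k (Φ = I) as a special case.

```latex
% Multilingual LSA: one SVD of the vertically stacked term-by-document matrices
X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_K \end{bmatrix} = U \Sigma V^{\top}

% PARAFAC2 as described: one decomposition per language with a shared V
X_k \approx U_k \Sigma_k V^{\top}, \qquad k = 1, \dots, K,
\qquad U_k^{\top} U_k = \Phi \ \text{for all } k
```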

  13. Evaluating Combinations of Ranked Lists and Visualizations of Inter-Document Similarity.

    ERIC Educational Resources Information Center

    Allan, James; Leuski, Anton; Swan, Russell; Byrd, Donald

    2001-01-01

    Considers how ideas from document clustering can be used to improve retrieval accuracy of ranked lists in interactive systems and how to evaluate system effectiveness. Describes a TREC (Text Retrieval Conference) study that constructed and evaluated systems that present the user with ranked lists and a visualization of inter-document similarities.…

  14. Illinois Occupational Skill Standards: Information Technology Operate Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document contains Illinois Occupational Skill Standards for occupations in the Information Technology Operate Cluster (help desk support, computer maintenance and technical support technician, systems operator, application and computer support specialist, systems administrator, network administrator, and database administrator). The skill…

  15. Text Mining in Biomedical Domain with Emphasis on Document Clustering

    PubMed Central

    2017-01-01

    Objectives With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. Methods This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Results Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Conclusions Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise. PMID:28875048

  16. New Insights on co-seismic landslide clustering

    NASA Astrophysics Data System (ADS)

    Meunier, Patrick; Marc, Odin; Hovius, Niels

    2015-04-01

    Earthquake-triggered landslides tend to cluster along topographic crests, whereas rainfall-induced landslides should preferentially occur downslope, where the pore pressure induced by groundwater flow is highest [1]. Past studies of landslide clustering are all based on analysis of the complete dataset, or of a subset, of the landslides associated with a given event (seismic or climatic) as a whole. In this work, we document the spatial and temporal variations of landslide position (on hillslopes) within the epicentral areas of the 1994 Northridge, 1999 Chichi, 2004 Niigata, 2008 Iwate and 2008 Wenchuan earthquakes. We show that crest clustering is not systematic, is non-uniform in space, and exhibits patterns that vary considerably from one case to another. These patterns are not easy to interpret, as they do not seem to be controlled by a single governing parameter but result from a complex interaction between local (hillslope length and gradient, lithology) and seismic (distance to source, slope aspect, radiation pattern, coseismic uplift) parameters. [1] Meunier, P., Hovius, N., & Haines, J. A. (2008). Topographic site effects and the location of earthquake induced landslides. Earth and Planetary Science Letters, 275(3), 221-232.

  17. Competency Index. [Health Technology Cluster.]

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This competency index lists the competencies included in the 62 units of the Tech Prep Competency Profiles within the Health Technologies Cluster. The unit topics are as follows: employability skills; professionalism; teamwork; computer literacy; documentation; infection control and risk management; medical terminology; anatomy, physiology, and…

  18. Extracting Related Words from Anchor Text Clusters by Focusing on the Page Designer's Intention

    NASA Astrophysics Data System (ADS)

    Liu, Jianquan; Chen, Hanxiong; Furuse, Kazutaka; Ohbo, Nobuo

    Approaches that extract related words (terms) by co-occurrence sometimes work poorly. Two words that frequently co-occur in the same documents are considered related; however, they may not be related at all, sharing neither a common meaning nor similar semantics. We address this problem by considering the page designer’s intention and propose a new model for extracting related words. Our approach is based on the idea that web page designers usually place correlated hyperlinks close together in the browser window. We developed a browser-based crawler to collect “geographically” near hyperlinks; then, by clustering these hyperlinks based on their pixel coordinates, we extract related words that reflect the designer’s intention well. Experimental results show that our method can represent the intention of the web page designer with extremely high precision. Moreover, the experiments indicate that our extraction method obtains related words with high average precision.
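
    The idea of grouping hyperlinks by their on-screen position can be illustrated with simple single-link clustering: any two links whose pixel coordinates lie within `max_gap` pixels (Euclidean distance) end up in the same group, transitively. The clustering rule, the threshold and the sample coordinates are assumptions for illustration; the paper's own crawler and clustering procedure are not specified here.

```python
import math


def cluster_links(links, max_gap):
    """Single-link grouping of (anchor_text, x, y) hyperlinks by pixel distance."""
    parent = list(range(len(links)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(links)):
        for j in range(i + 1, len(links)):
            _, x1, y1 = links[i]
            _, x2, y2 = links[j]
            if math.hypot(x1 - x2, y1 - y2) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, (text, _, _) in enumerate(links):
        groups.setdefault(find(i), []).append(text)
    return sorted(groups.values())


# Hypothetical link positions: a left-hand menu and a right-hand weather box.
links = [
    ("Python", 10, 100), ("Java", 10, 120), ("Rust", 10, 140),
    ("Weather", 600, 100), ("Forecast", 600, 118),
]
print(cluster_links(links, max_gap=25))
```

    "Python" and "Rust" are 40 pixels apart, yet they land in the same group through "Java"; this chaining is characteristic of single-link clustering and suits vertically stacked menus.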

  19. Transportation: Grade 8. Cluster IV.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 8, the document is devoted to the occupational cluster "Transportation." It is divided into five units: surface transportation, interstate transportation, air transportation, water transportation, and subterranean transportation (the Metro). Each unit is introduced by a statement of the topic, the unit's…

  20. Illinois Occupational Skill Standards: Information Technology Design/Build Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document contains Illinois Occupational Skill Standards for occupations in the Information Technology Design and Build Cluster (technical writer, programmer, system analyst, network architect, application product architect, network engineer, and database administrator). The skill standards define what an individual should know and the…

  1. Illinois Occupational Skill Standards: Housekeeping Management Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document contains 44 occupational skill standards for the housekeeping management occupational cluster, as required for the state of Illinois. Skill standards, which were developed by committees that included educators and representatives from business, industry, and labor, are intended to promote education and training investment and ensure…

  2. Hospitality, Recreation, and Personal Service Occupations: Grade 8. Cluster V.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 8, the document is devoted to the occupational cluster "Hospitality, Recreation, and Personal Service Occupations." It is divided into four units: recreational resources for education, employment, and professional opportunities; barbering and cosmetology; mortuary science; hotel-motel management. Each unit is…

  3. Illinois Occupational Skill Standards. Meeting Professional Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for workforce preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in the meeting professional occupational cluster. It begins with a brief overview of the Illinois perspective on occupational skill standards and credentialing,…

  4. Computer program documentation: ISOCLS iterative self-organizing clustering program, program C094

    NASA Technical Reports Server (NTRS)

    Minter, R. T. (Principal Investigator)

    1972-01-01

    The author has identified the following significant results. This program implements an algorithm which, ideally, sorts a given set of multivariate data points into similar groups or clusters. The program is intended for use in the evaluation of multispectral scanner data; however, the algorithm could be used for other data types as well. The user may specify a set of initial estimated cluster means to begin the procedure, or may begin with the assumption that all the data belong to one cluster. The procedure is initialized by assigning each data point to the nearest (in absolute distance) cluster mean. If no initial cluster means were input, all of the data are assigned to cluster 1. The means and standard deviations are then calculated for each cluster.
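
    The initialization described above can be sketched in a few lines. The record leaves several details open, so these are assumptions: "absolute distance" is read as the sum of absolute coordinate differences (L1 distance), standard deviations are population standard deviations, clusters are numbered from 0 rather than 1, and only this first assignment pass is shown, not the program's subsequent iterations. The function name and sample points are hypothetical.

```python
import math


def isocls_initial_pass(points, initial_means=None):
    """Assign points to the nearest initial mean (L1 distance) and summarise clusters.

    Clusters are numbered from 0. Without initial means, every point goes to
    cluster 0 (the program's "cluster 1").
    """
    if initial_means:
        k = len(initial_means)
        assignments = [
            min(range(k), key=lambda c: sum(abs(p - m) for p, m in zip(point, initial_means[c])))
            for point in points
        ]
    else:
        k = 1
        assignments = [0] * len(points)

    stats = []
    for c in range(k):
        members = [pt for pt, a in zip(points, assignments) if a == c]
        if not members:
            stats.append(None)  # an initial mean that attracted no points
            continue
        dims = range(len(members[0]))
        means = [sum(pt[d] for pt in members) / len(members) for d in dims]
        stds = [math.sqrt(sum((pt[d] - means[d]) ** 2 for pt in members) / len(members))
                for d in dims]
        stats.append((means, stds))
    return assignments, stats


# Hypothetical two-band pixel values and two user-supplied starting means.
pixels = [(1.0, 1.0), (2.0, 1.0), (9.0, 8.0), (10.0, 8.0)]
print(isocls_initial_pass(pixels, initial_means=[(0.0, 0.0), (10.0, 10.0)]))
```

    Calling `isocls_initial_pass(pixels)` without initial means places all four points in a single cluster, matching the program's fallback behaviour.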

  5. Cognitive-behavioral conjoint therapy for PTSD improves various PTSD symptoms and trauma-related cognitions: Results from a randomized controlled trial.

    PubMed

    Macdonald, Alexandra; Pukay-Martin, Nicole D; Wagner, Anne C; Fredman, Steffany J; Monson, Candice M

    2016-02-01

    Numerous studies document an association between posttraumatic stress disorder (PTSD) and impairments in intimate relationship functioning, and there is evidence that PTSD symptoms and associated impairments are improved by cognitive-behavioral conjoint therapy for PTSD (CBCT for PTSD; Monson & Fredman, 2012). The present study investigated changes across treatment in clinician-rated PTSD symptom clusters and patient-rated trauma-related cognitions in a randomized controlled trial comparing CBCT for PTSD with waitlist in a sample of 40 individuals with PTSD and their partners (N = 40; Monson et al., 2012). Compared with waitlist, patients who received CBCT for PTSD immediately demonstrated greater improvements in all PTSD symptom clusters, trauma-related beliefs, and guilt cognitions (Hedges' g = -0.33 to -1.51). Results suggest that CBCT for PTSD improves all PTSD symptom clusters and trauma-related cognitions among individuals with PTSD and further support the value of a couple-based approach to the treatment of PTSD. (c) 2016 APA, all rights reserved.
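
    Effect sizes such as the Hedges' g values reported above are commonly computed as the standardized mean difference (difference in means divided by the pooled standard deviation) multiplied by the small-sample correction J = 1 - 3 / (4(n1 + n2) - 9). The sketch below shows that textbook computation with hypothetical group statistics; the trial's actual analysis (e.g. change scores versus post-treatment scores) is not specified in the abstract.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Bias-corrected standardized mean difference between group 1 and group 2."""
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2)
                          / (n_1 + n_2 - 2))
    cohen_d = (mean_1 - mean_2) / pooled_sd
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)  # small-sample correction J
    return correction * cohen_d


# Hypothetical post-treatment symptom scores: treatment group vs. waitlist.
print(hedges_g(40.0, 12.0, 20, 52.0, 12.0, 20))
```

    With the treatment group passed first, a negative g indicates lower symptom scores in the treatment group.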

  6. Development of a 12-Thrust Chamber Kerosene/Oxygen Primary Rocket Sub-System for an Early (1964) Air-Augmented Rocket Ground-Test System

    NASA Technical Reports Server (NTRS)

    Pryor, D.; Hyde, E. H.; Escher, W. J. D.

    1999-01-01

    Airbreathing/rocket combined-cycle, and specifically rocket-based combined-cycle (RBCC), propulsion systems typically employ a primary rocket subsystem installed in the internal engine flowpath. To achieve acceptably short mixing lengths in effecting the "air augmentation" process, a large rocket-exhaust/air interfacial mixing surface is needed. In some engine design concepts, this leads to a "cluster" of small rocket units, suitably arrayed in the flowpath. To support an early (1964) subscale ground test of a specific RBCC concept, such a 12-rocket cluster was developed by NASA's Marshall Space Flight Center (MSFC). The small primary rockets used in the cluster assembly were modified versions of an existing small kerosene/oxygen water-cooled rocket engine unit routinely tested at MSFC. Following individual thrust-chamber tests and overall subsystem qualification testing, the cluster assembly was installed at the U.S. Air Force's Arnold Engineering Development Center (AEDC) for RBCC systems testing. (The results of the special air-augmented rocket testing are not covered here.) Although the project was eventually completed successfully, a number of hardware integration problems were encountered, leading to catastrophic thrust chamber failures. The principal "lessons learned" from this early primary rocket subsystem experimental effort are documented here as a basic knowledge-base contribution for the benefit of today's RBCC research and development community.

  7. Public Service Occupations: Grade 8. Cluster I

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 8, the document is devoted to the occupational cluster "Public Service Occupations." It is divided into six units: education, public utilities, community social and health services, law enforcement agencies, fire departments, and the postal system. Each unit is introduced by a statement of the topic, the…

  8. Illinois Occupational Skill Standards: Swine Production Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document contains 52 Occupational Skill Standards for the swine production occupational cluster, as required for the state of Illinois. Skill Standards, which were developed by committees that included educators, business, industry, and labor, are intended to promote education and training investment and ensure that students and workers are…

  9. Illinois Occupational Skill Standards. Collision Repair Technician Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended as a guide for workforce preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in occupations in the (vehicle) collision repair technician cluster. It begins with a brief overview of the Illinois perspective on occupational skill standards…

  10. Construction and Environment: Grade 7. Cluster IV.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 7, the document is devoted to the occupational cluster "Construction and Environment." It is divided into four units: urban renewal and development, urban and suburban construction and planning, megalopolis, and demography. Each unit is introduced by a statement of the topic, the unit's purpose, main ideas,…

  11. Communications and Media: Grade 7. Cluster II.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 7, the document is devoted to the occupational cluster "Communications and Media." It is divided into six units: advertising, film and photography, radio and television, journalism and publishing, library and periodicals, and transocean communications. Each unit is introduced by a statement of the topic, the…

  12. The mystery of the Hawaii liver disease cluster in summer 2013: A pragmatic and clinical approach to solve the problem.

    PubMed

    Teschke, Rolf; Schwarzenboeck, Alexander; Frenzel, Christian; Schulze, Johannes; Eickhoff, Axel; Wolff, Albrecht

    2016-01-01

    In the fall of 2013, the US Centers for Disease Control and Prevention (CDC) published a preliminary report on a cluster of liver disease cases that emerged in Hawaii in the summer of 2013. This report claimed a temporal association as sufficient evidence that OxyELITE Pro (OEP), a dietary supplement (DS) used mainly for weight loss, was the cause of this mysterious cluster. However, the presented data were inconsistent and required a thorough reanalysis. To further investigate the cause(s) of this cluster, we critically evaluated redacted raw clinical data of the cluster patients, as the CDC report received tremendous publicity in local and nationwide newspapers and on television. This attention put regulators and physicians from the Honolulu medical center that reported the cluster under enormous pressure to succeed, risking biased evaluations and hasty conclusions. We noted pervasive bias in the documentation, conclusions, and public statements, as well as poor-quality case management. Among the cases we reviewed, many causes unrelated to any DS were evident, including decompensated liver cirrhosis, acute liver failure from acetaminophen overdose, acute cholecystitis with gallstones, resolving acute hepatitis B, acute HSV and VZV hepatitis, hepatitis E suspected after consumption of wild hog meat, and hepatotoxicity caused by acetaminophen or ibuprofen. Causality assessments based on the updated CIOMS scale confirmed the lack of evidence for any DS, including OEP, as the culprit for the cluster. Thus, the Hawaii liver disease cluster is now best explained by various liver diseases rather than by any DS, including OEP.

  13. Revealing common disease mechanisms shared by tumors of different tissues of origin through semantic representation of genomic alterations and topic modeling.

    PubMed

    Chen, Vicky; Paisley, John; Lu, Xinghua

    2017-03-14

    Cancer is a complex disease driven by somatic genomic alterations (SGAs) that perturb signaling pathways and consequently cellular function. Identifying patterns of pathway perturbations would provide insights into common disease mechanisms shared among tumors, which is important for guiding treatment and predicting outcome. However, identifying perturbed pathways is challenging, because the same pathway can be perturbed by different SGAs in different tumors. Here, we designed novel semantic representations that capture the functional similarity of distinct SGAs perturbing a common pathway in different tumors. Combining this representation with topic modeling allows us to identify patterns in altered signaling pathways. We represented each gene with a vector of words describing its function, and we represented the SGAs of a tumor as a text document by pooling the words representing individual SGAs. We applied the nested hierarchical Dirichlet process (nHDP) model to a collection of tumors of 5 cancer types from TCGA. We identified topics (consisting of co-occurring words) representing the common functional themes of different SGAs. Tumors were clustered based on their topic associations, such that each cluster consists of tumors sharing common functional themes. The resulting clusters contained mixtures of cancer types, which indicates that different cancer types can share disease mechanisms. Survival analysis based on the clusters revealed significant differences in survival among the tumors of the same cancer type that were assigned to different clusters. The results indicate that applying topic modeling to semantic representations of tumors identifies patterns in the combinations of altered functional pathways in cancer.

  14. Forensic discrimination of blue ballpoint pens on documents by laser ablation inductively coupled plasma mass spectrometry and multivariate analysis.

    PubMed

    Alamilla, Francisco; Calcerrada, Matías; García-Ruiz, Carmen; Torre, Mercedes

    2013-05-10

    The differentiation of blue ballpoint pen inks written on documents through an LA-ICP-MS methodology is proposed. Small portions of common office paper containing ink strokes from 21 blue pens of known origin were cut and measured without any sample preparation. In a first step, Mg, Ca and Sr were proposed as internal standards (ISs) and used to normalize elemental intensities and subtract background signals from the paper. Then, specific criteria were designed and employed to identify target elements (Li, V, Mn, Co, Ni, Cu, Zn, Zr, Sn, W and Pb), which proved independent of the IS chosen in 98% of the cases and allowed a qualitative clustering of the samples. In a second step, an elemental-related ratio (ink ratio) based on the previously identified targets was used to obtain mass-independent intensities and perform pairwise comparisons by means of multivariate statistical analyses (MANOVA, Tukey's HSD and Hotelling's T2). This treatment improved the discrimination power (DP) and provided objective results, achieving complete differentiation among different brands and partial differentiation among pen inks of the same brand. The designed data treatment, together with the use of multivariate statistical tools, represents an easy and useful tool for differentiating among blue ballpoint pen inks, with hardly any sample destruction and without the need for methodological calibrations, which makes it potentially advantageous from a forensic-practice standpoint. To test the procedure, it was applied to analyze real handwritten questioned contracts, previously studied by the Department of Forensic Document Exams of the Criminalistics Service of Civil Guard (Spain). The results showed that all questioned ink entries were clustered in the same group, which was different from the remaining ink on the document. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  15. Into the complexity of coseismic landslide clustering

    NASA Astrophysics Data System (ADS)

    Meunier, Patrick; Marc, Odin; Uchida, Taro; Hovius, Niels

    2014-05-01

    Earthquake-triggered landslides tend to cluster along topographic crests, while rainfall-induced landslides are more uniformly distributed on hillslopes [1]. In theory, rainfall-induced landslides should even occur preferentially downslope, where the pore pressure induced by groundwater flow is highest. Past studies of landslide clustering have all analyzed the complete dataset, or a subset, of the landslides associated with a given event (seismic or climatic) as a whole. In this work, we document the spatial variation of landslide position (on hillslopes) within the epicentral area for the 1999 Chichi, 2004 Niigata and 2008 Iwate earthquakes. We show that landslide clustering is not uniform in space and exhibits patterns that vary considerably from one case to another. These patterns are not easy to interpret, as they do not seem to be controlled by a single governing parameter but result from a complex interaction between local (hillslope length and gradient, lithology) and seismic (distance to source, slope aspect, radiation pattern, coseismic uplift) parameters. [1] Meunier, P., Hovius, N., & Haines, J. A. (2008). Topographic site effects and the location of earthquake induced landslides. Earth and Planetary Science Letters, 275(3), 221-232.

  16. Illinois Occupational Skill Standards: Lodging Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document of skill standards for the lodging cluster serves as a guide to workforce preparation program providers in defining content for their programs and to employers to establish the skills and standards necessary for job acquisition. These 28 occupational skill standards describe what people should know and be able to do in an…

  17. Illinois Occupational Skill Standards: Greenhouse/Nursery Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document of skill standards for the greenhouse/nursery cluster serves as a guide to workforce preparation program providers in defining content for their programs and to employers to establish the skills and standards necessary for job acquisition. These 23 occupational skill standards describe what people should know and be able to do in an…

  18. Illinois Occupational Skill Standards: Machining Skills Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document of skill standards for the machining skills cluster serves as a guide to workforce preparation program providers in defining content for their programs and to employers to establish the skills and standards necessary for job acquisition. These 67 occupational skill standards describe what people should know and be able to do in an…

  19. "Clustering" Documents Automatically to Support Scoping Reviews of Research: A Case Study

    ERIC Educational Resources Information Center

    Stansfield, Claire; Thomas, James; Kavanagh, Josephine

    2013-01-01

    Background: Scoping reviews of research help determine the feasibility and the resource requirements of conducting a systematic review, and the potential to generate a description of the literature quickly is attractive. Aims: To test the utility and applicability of an automated clustering tool to describe and group research studies to improve…

  20. Construction Cluster Volume I [Wood Structural Framing].

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Justice, Harrisburg. Bureau of Correction.

    The document is the first of a series, to be integrated with a G.E.D. program, containing instructional materials at the basic skills level for the construction cluster. It focuses on wood structural framing and contains 20 units: (1) occupational information; (2) blueprint reading; (3) using leveling instruments and laying out building lines; (4)…

  1. Construction Cluster Volume II [Masonry Work].

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Justice, Harrisburg. Bureau of Correction.

    The document is the second of a series, to be integrated with a G.E.D. program, containing instructional materials at the basic skills level for the construction cluster. The volume focuses on masonry and consists of 20 instructional units which require a month of study. The units include: (1) historical aspects of masonry work and occupational…

  2. Illinois Occupational Skill Standards: Landscape Technician Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document of skill standards for the landscape technician cluster serves as a guide to workforce preparation program providers in defining content for their programs and to employers to establish the skills and standards necessary for job acquisition. These 19 occupational skill standards describe what people should know and be able to do in…

  3. Agri-Business, Natural Resources, Marine Science; Grade 7. Cluster V.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 7, the document is devoted to the occupational clusters "Agri-business, Natural Resources, and Marine Science." It is divided into five units: natural resources, ecology, landscaping, conservation, oceanography. Each unit is introduced by a statement of the topic, the unit's purpose, main ideas, quests, and a…

  4. Fine Arts and Humanities: Grade 7. Cluster III.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for Grade 7, the document is devoted to the occupational cluster "Fine Arts and Humanities." It is divided into five units: drama and literature, music, dance, art, and crafts. Each unit is introduced by a statement of the topic, the unit's purpose, main ideas, quests, and a list of career opportunities…

  5. Health Occupations: Grade 8. Cluster II.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 8, the document is devoted to the occupational cluster "Health Occupations." It is divided into four units: the hospital, preventive medicine, drug use and abuse, and alcohol and tobacco. Each unit is introduced by a statement of the topic, the unit's purpose, main ideas, quests, and a list of career…

  6. Consumer and Homemaking: Grade 7. Cluster I.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 7, the document is devoted to the occupational cluster "Consumer and Homemaking." It is divided into six units: buying, child care, nutrition, clothing, family relations, and housing and household management. Each unit is introduced by a statement of the topic, the unit's purpose, main ideas, quests, and a…

  7. Construction Cluster Volume III [Plumbing].

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Justice, Harrisburg. Bureau of Correction.

    The document is the third of a series, to be integrated with a G.E.D. program, containing instructional materials at the basic skills level for the construction cluster. The volume focuses on plumbing and consists of 20 instructional units which require a month of study. The units include: (1) importance of plumbing; (2) pipe and tubing…

  8. Construction Cluster Volume 5 [Electrical].

    ERIC Educational Resources Information Center

    Pennsylvania State Dept. of Justice, Harrisburg. Bureau of Correction.

    The document is the fifth of a series, to be integrated with a G.E.D. program, containing instructional materials for the construction cluster. The volume focuses on electrical work and consists of 20 instructional units which require a month of study: (1) safety precautions and first aid for electrical workers; (2) planning a simple installation;…

  9. Related Core Academic Knowledge and Skills. Georgia Core Standards for Occupational Clusters.

    ERIC Educational Resources Information Center

    Georgia Univ., Athens. Dept. of Occupational Studies.

    This document lists the industry-identified core academic knowledge and skills that should be possessed by all Georgia students who are enrolled in occupational cluster programs and are preparing to enter the work force or continue their occupational specialization at the postsecondary level. First, 63 related communications competencies are…

  10. Semi-supervised clustering methods.

    PubMed

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
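    As an illustration of what "a modification of k-means" can look like, here is a minimal sketch of one well-known variant, seeded k-means, in which observations with known cluster labels are used only to initialise the centroids. It is not necessarily one of the methods the review describes in detail; the function name `seeded_kmeans`, the `seeds` mapping and the toy data are hypothetical, and plain Python lists stand in for a numeric library.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> List[float]:
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def seeded_kmeans(data: List[Point], seeds: Dict[int, int], k: int,
                  iters: int = 100) -> List[int]:
    """Hypothetical seeded k-means sketch.

    `seeds` maps a data index to its known cluster label (0..k-1).
    Labelled points only initialise the centroids; afterwards every point,
    seeds included, is reassigned freely (a constrained variant would pin them).
    """
    centroids = []
    for c in range(k):
        members = [data[i] for i, lab in seeds.items() if lab == c]
        if not members:
            raise ValueError(f"no seed for cluster {c}")
        centroids.append(_mean(members))

    labels = [-1] * len(data)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c]))
                      for p in data]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: recompute centroids; keep the old one if a cluster empties.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
    return labels


toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(seeded_kmeans(toy, seeds={0: 0, 3: 1}, k=2))
```

    Swapping the seed labels swaps the output labels: the partial label information fixes which cluster is which, whereas plain k-means assigns arbitrary cluster indices.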

  11. Illinois Occupational Skill Standards: Clinical Laboratory Science/Biotechnology Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended to serve as a guide for workforce preparation program providers, details the Illinois Occupational Skill Standards for clinical laboratory occupations programs. The document begins with a brief overview of the Illinois perspective on occupational skill standards and credentialing, the process used to develop the…

  12. Tech-Prep Competency Profiles within the Business/Computer Technologies.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for educators throughout Ohio who are involved in planning and/or delivering tech prep programs within the business/computer technologies cluster, discusses and presents tech prep competency profiles (TCPs) for 12 business/computer technology occupations. The first part of the document contains the following:…

  13. Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool for Investigative Journalists.

    PubMed

    Brehmer, Matthew; Ingram, Stephen; Stray, Jonathan; Munzner, Tamara

    2014-12-01

    For an investigative journalist, a large collection of documents obtained from a Freedom of Information Act request or a leak is both a blessing and a curse: such material may contain multiple newsworthy stories, but it can be difficult and time consuming to find relevant documents. Standard text search is useful, but even if the search target is known it may not be possible to formulate an effective query. In addition, summarization is an important non-search task. We present Overview, an application for the systematic analysis of large document collections based on document clustering, visualization, and tagging. This work contributes to the small set of design studies which evaluate a visualization system "in the wild", and we report on six case studies where Overview was voluntarily used by self-initiated journalists to produce published stories. We find that the frequently-used language of "exploring" a document collection is both too vague and too narrow to capture how journalists actually used our application. Our iterative process, including multiple rounds of deployment and observations of real world usage, led to a much more specific characterization of tasks. We analyze and justify the visual encoding and interaction techniques used in Overview's design with respect to our final task abstractions, and propose generalizable lessons for visualization design methodology.

  14. Semi-supervised clustering methods

    PubMed Central

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830

  15. NAS Requirements Checklist for Job Queuing/Scheduling Software

    NASA Technical Reports Server (NTRS)

    Jones, James Patton

    1996-01-01

    The increasing reliability of parallel systems and clusters of computers has resulted in these systems becoming more attractive for true production workloads. Today, the primary obstacle to production use of clusters of computers is the lack of a functional and robust Job Management System for parallel applications. This document provides a checklist of NAS requirements for job queuing and scheduling in order to make most efficient use of parallel systems and clusters for parallel applications. Future requirements are also identified to assist software vendors with design planning.

  16. Wrapping up BLAST and other applications for use on Unix clusters.

    PubMed

    Hokamp, Karsten; Shields, Denis C; Wolfe, Kenneth H; Caffrey, Daniel R

    2003-02-12

    We have developed two programs that speed up common bioinformatic applications by spreading them across a UNIX cluster: (1) BLAST.pm, a new module for the 'MOLLUSC' package, and (2) WRAPID, a simple tool for parallelizing large numbers of small instances of programs such as BLAST, FASTA and CLUSTALW. The packages were developed in Perl on a 20-node Linux cluster and are provided together with a configuration script and documentation. They can be freely downloaded from http://wolfe.gen.tcd.ie/wrapper.

  17. A compilation of redshifts and velocity dispersions for Abell clusters (Struble and Rood 1987): Documentation for the machine-readable version

    NASA Technical Reports Server (NTRS)

    Warren, Wayne H., Jr.

    1989-01-01

    The machine readable version of the compilation, as it is currently being distributed from the Astronomical Data Center, is described. The catalog contains redshifts and velocity dispersions for all Abell clusters for which these data had been published up to 1986 July. Also included are 1950 equatorial coordinates for the centers of the listed clusters, numbers of observations used to determine the redshifts, and bibliographical references citing the data sources.

  18. Home Environments of Infants From Immigrant Families in the United States: Findings From the New Immigrant Survey

    PubMed Central

    Bradley, Robert H.; Pennar, Amy; Glick, Jennifer

    2014-01-01

    Data from the New Immigrant Survey were used to describe the home environments of 638 children ages birth to 3 years whose parents legally immigrated to the United States. Thirty-two indicators of home conditions were clustered into four domains: discipline and socioemotional support, learning materials, enriching experiences, and family activities. Results revealed variation in how frequently infants from every country (Mexico, El Salvador, India, Philippines) and region (East Asia, Europe, Caribbean, Africa) studied experienced each home environmental condition. There were differences between countries and regions on many indicators as well as differences based on parents' level of education. The experiences documented for children of recent legal immigrants were similar to those documented for children of native-born families in other studies. PMID:25798506

  19. Mathematical description and program documentation for CLASSY, an adaptive maximum likelihood clustering method

    NASA Technical Reports Server (NTRS)

    Lennington, R. K.; Rassbach, M. E.

    1979-01-01

    Discussed in this report is the clustering algorithm CLASSY, including detailed descriptions of its general structure and mathematical background and of the various major subroutines. The report provides a development of the logic and equations used with specific reference to program variables. Some comments on timing and proposed optimization techniques are included.

  20. Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Professional Nurse (Associate Degree). Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lake County Area Vocational Center, Grayslake, IL.

    This document contains a task analysis for health occupations (professional nurse) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…

  1. Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Home Health Aide. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lake County Area Vocational Center, Grayslake, IL.

    This document contains a task analysis for health occupations (home health aide) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…

  2. Effects of documentation-based decision support on chronic disease management.

    PubMed

    Schnipper, Jeffrey L; Linder, Jeffrey A; Palchuk, Matvey B; Yu, D Tony; McColgan, Kerry E; Volk, Lynn A; Tsurikova, Ruslana; Melnikas, Andrea J; Einbinder, Jonathan S; Middleton, Blackford

    2010-12-01

    To evaluate whether a new documentation-based clinical decision support system (CDSS) is effective in addressing deficiencies in the care of patients with coronary artery disease (CAD) and diabetes mellitus (DM). Controlled trial randomized by physician. We assigned primary care physicians (PCPs) in 10 ambulatory practices to usual care or the CAD/DM Smart Form for 9 months. The primary outcome was the proportion of deficiencies in care that were addressed within 30 days after a patient visit. The Smart Form was used for 5.6% of eligible patients. In the intention-to-treat analysis, patients of intervention PCPs had a greater proportion of deficiencies addressed within 30 days of a visit compared with controls (11.4% vs 10.1%, adjusted and clustered odds ratio = 1.14; 95% confidence interval, 1.02-1.28; P = .02). Differences were more pronounced in the "on-treatment" analysis: 17.0% of deficiencies were addressed after visits in which the Smart Form was used compared with 10.6% of deficiencies after visits in which it was not used (P < .001). Measures that improved included documentation of smoking status and prescription of antiplatelet agents when appropriate. Overall use of the CAD/DM Smart Form was low, and improvements in management were modest. When used, documentation-based decision support shows promise, and future studies should focus on refining such tools, integrating them into current electronic health record platforms, and promoting their use, perhaps through organizational changes to primary care practices.

  3. Writer identification on historical Glagolitic documents

    NASA Astrophysics Data System (ADS)

    Fiel, Stefan; Hollaus, Fabian; Gau, Melanie; Sablatnig, Robert

    2013-12-01

    This work aims to automatically identify the scribes of historical Slavonic manuscripts. The quality of the ancient documents is partially degraded by faded-out ink or varying background. The writer identification method used is based on image features, which are described with Scale Invariant Feature Transform (SIFT) features. A visual vocabulary is used to describe handwriting characteristics, whereby the features are clustered using a Gaussian Mixture Model, employing the Fisher kernel. The writer identification approach was originally designed for grayscale images of modern handwriting. But, unlike modern documents, the historical manuscripts are partially corrupted by background clutter and water stains. As a result, SIFT features are also found on the background. Since the method also shows good results on binarized images of modern handwriting, the approach was additionally applied to binarized images of the ancient writings. Experiments show that this preprocessing step leads to a significant performance increase: the identification rate on binarized images is 98.9%, compared with an identification rate of 87.6% on grayscale images.

  4. EEG Correlates of Ten Positive Emotions.

    PubMed

    Hu, Xin; Yu, Jianwen; Song, Mengdi; Yu, Chun; Wang, Fei; Sun, Pei; Wang, Daifa; Zhang, Dan

    2017-01-01

    Compared with the well documented neurophysiological findings on negative emotions, much less is known about positive emotions. In the present study, we explored the EEG correlates of ten different positive emotions (joy, gratitude, serenity, interest, hope, pride, amusement, inspiration, awe, and love). A group of 20 participants were invited to watch 30 short film clips with their EEGs simultaneously recorded. Distinct topographical patterns for different positive emotions were found for the correlation coefficients between the subjective ratings on the ten positive emotions per film clip and the corresponding EEG spectral powers in different frequency bands. Based on the similarities of the participants' ratings on the ten positive emotions, these emotions were further clustered into three representative clusters, as 'encouragement' for awe, gratitude, hope, inspiration, pride, 'playfulness' for amusement, joy, interest, and 'harmony' for love, serenity. Using the EEG spectral powers as features, both the binary classification on the higher and lower ratings on these positive emotions and the binary classification between the three positive emotion clusters, achieved accuracies of approximately 80% and above. To our knowledge, our study provides the first piece of evidence on the EEG correlates of different positive emotions.

  5. Cluster analysis of Pinus taiwanensis for its ex situ conservation in China.

    PubMed

    Gao, X; Shi, L; Wu, Z

    2015-06-01

    Pinus taiwanensis Hayata is one of the most famous sights in the Huangshan Scenic Resort, China, because of its strong adaptability and ability to survive; however, this endemic species is currently under threat in China. Relationships between different P. taiwanensis populations have been well-documented; however, few studies have been conducted on how to protect this rare pine. In the present study, we propose the ex situ conservation of this species using geographical information system (GIS) cluster and genetic diversity analyses. The GIS cluster method was conducted as a preliminary analysis for establishing a sampling site category based on climatic factors. Genetic diversity was analyzed using morphological and genetic traits. By combining geographical information with genetic data, we demonstrate that growing conditions, morphological traits, and the genetic make-up of the population in the Huangshan Scenic Resort were most similar to conditions on Tianmu Mountain. Therefore, we suggest that Tianmu Mountain is the best choice for the ex situ conservation of P. taiwanensis. Our results provide a molecular basis for the sustainable management, utilization, and conservation of this species in Huangshan Scenic Resort.

  6. Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes.

    PubMed

    Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J

    2006-01-01

    One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
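    The abstract does not say which TFIDF variant was used, so this sketch assumes the common textbook form: raw term frequency multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tfidf` and the toy token lists are hypothetical, not the study's data.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by raw term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical keyword lists, one per gene-associated text.
toy_docs = [["gene", "kinase", "kinase"], ["gene", "receptor"], ["gene", "channel"]]
for w in tfidf(toy_docs):
    print(w)
```

    A keyword that appears in every document, such as "gene" in the toy corpus, receives weight log(1) = 0, which is how TFIDF suppresses generic vocabulary before the keyword lists are used for clustering.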

  7. From Biology to Education: Scoring and Clustering Multilingual Text Sequences and Other Sequential. Research Report. ETS RR-12-25

    ERIC Educational Resources Information Center

    Sukkarieh, Jane Z.; von Davier, Matthias; Yamamoto, Kentaro

    2012-01-01

    This document describes a solution to a problem in the automatic content scoring of the multilingual character-by-character highlighting item type. This solution is language independent and represents a significant enhancement. This solution not only facilitates automatic scoring but plays an important role in clustering students' responses;…

  8. Toward the 21st Century: Preparing Proactive Visionary Transformational Leaders for Building Learning Communities. Human Resource Development. Tampa Cluster. Winter 1994.

    ERIC Educational Resources Information Center

    Groff, Warren H.

    This document describes the Tampa Cluster human resources development (HRD) seminar that was conducted as part of Nova University's distance education program in higher education (PHE). Discussed first are HRD in the agricultural and business industrial eras and changing HRD practices/needs, Nova University's PHE and HRD program, the proceedings…

  9. Evaluation of Potential LSST Spatial Indexing Strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nikolaev, S; Abdulla, G; Matzke, R

    2006-10-13

    The LSST requirement for producing alerts in near real-time, and the fact that generating an alert depends on knowing the history of light variations for a given sky position, both imply that the clustering information for all detections is available at any time during the survey. Therefore, any data structure describing clustering of detections in LSST needs to be continuously updated, even as new detections are arriving from the pipeline. We call this use case "incremental clustering", to reflect this continuous updating of clustering information. This document describes the evaluation results for several potential LSST incremental clustering strategies, using: (1) a Neighbors table and zone optimization to store spatial clusters (a.k.a. Jim Gray's, or the SDSS, algorithm); (2) the MySQL built-in R-tree implementation; (3) an external spatial index library that supports a query interface.

  10. The spatial distribution of gender differences in obesity prevalence differs from overall obesity prevalence among US adults

    PubMed Central

    Gartner, Danielle R.; Taber, Daniel R.; Hirsch, Jana A.; Robinson, Whitney R.

    2016-01-01

    Purpose: While obesity disparities between racial and socioeconomic groups have been well characterized, those based on gender and geography have not been as thoroughly documented. This study describes obesity prevalence by state, gender, and race/ethnicity to (1) characterize obesity gender inequality, (2) determine whether the geographic distribution of inequality is spatially clustered, and (3) contrast the spatial clustering patterns of obesity gender inequality with overall obesity prevalence. Methods: Data from the Centers for Disease Control and Prevention's 2013 Behavioral Risk Factor Surveillance System (BRFSS) were used to calculate state-specific obesity prevalence and gender inequality measures. Global and local Moran's indices were calculated to determine spatial autocorrelation. Results: Age-adjusted, state-specific obesity prevalence difference and ratio measures show spatial autocorrelation (z-score = 4.89, p-value < 0.001). Local Moran's indices indicate that the spatial distributions of obesity prevalence and obesity gender inequalities are not the same. High and low values of obesity prevalence and gender inequalities cluster in different areas of the U.S. Conclusion: Clustering of gender inequality suggests that spatial processes operating at the state level, such as occupational or physical activity policies or social norms, are involved in the etiology of the inequality and necessitate further attention to the determinants of obesity gender inequality. PMID:27039046
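    For reference, the global Moran's I used above is I = (n / S0) · (Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ) / (Σᵢ zᵢ²), where zᵢ = xᵢ − x̄ and S0 is the sum of all spatial weights. The sketch below computes it with NumPy for an arbitrary weight matrix. The function name is hypothetical, and it omits the significance test (the z-score reported in the study), which requires an analytical variance or a permutation procedure.

```python
import numpy as np


def global_morans_i(x, w):
    """Global Moran's I for values x (length n) and spatial weights w (n x n)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()   # deviations from the mean
    n = x.size
    s0 = w.sum()       # S0: sum of all weights
    num = z @ w @ z    # sum_ij w_ij * z_i * z_j
    den = z @ z        # sum_i z_i^2
    return (n / s0) * (num / den)


# Four regions in a line; each is a neighbor of the adjacent ones.
w_chain = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
print(global_morans_i([1, 2, 3, 4], w_chain))    # smooth trend: positive I
print(global_morans_i([1, -1, 1, -1], w_chain))  # alternating: I = -1
```

    Positive values indicate that similar values sit next to each other (spatial clustering); negative values indicate that neighbors tend to differ.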

  11. Current Research into Chemical and Textual Information Retrieval at the Department of Information Studies, University of Sheffield.

    ERIC Educational Resources Information Center

    Lynch, Michael F.; Willett, Peter

    1987-01-01

    Discusses research into chemical information and document retrieval systems at the University of Sheffield. Highlights include the use of cluster analysis methods for document retrieval and drug design, representation and searching of files of generic chemical structures, and the application of parallel computer hardware to information retrieval.…

  12. Categorizing document by fuzzy C-Means and K-nearest neighbors approach

    NASA Astrophysics Data System (ADS)

    Priandini, Novita; Zaman, Badrus; Purwanti, Endah

    2017-08-01

    Advances in technology have made document categorization important, driven by the growing number of documents. Managing documents by categorizing them is an Information Retrieval application, because the process involves text mining. Categorization can be performed with both the Fuzzy C-Means (FCM) and the K-Nearest Neighbors (KNN) methods, and this experiment combines the two, with the aim of improving categorization performance. First, FCM clusters the training documents. Second, KNN assigns each test document to a category, which is reported as the output. In the experiment, 14 of the 20 test documents were assigned to their relevant category, while the remaining 6 were assigned to an irrelevant one. The system evaluation shows that both precision and recall are 0.7.
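    The two-stage pipeline (FCM on the training documents, then KNN for test documents) can be sketched as follows, assuming documents are already represented as numeric vectors. This is a generic NumPy sketch of standard fuzzy c-means (fuzzifier m) and majority-vote KNN, not the authors' code; the function names, the random initialization and the Euclidean distance are assumptions. Each training document takes the cluster with its highest membership as its label.

```python
from collections import Counter

import numpy as np


def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means; returns cluster centers and membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point hits a center
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))
        U = 1.0 / ratio.sum(axis=2)
    return centers, U


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training vectors."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, U = fuzzy_c_means(X, 2)
labels = U.argmax(axis=1)  # FCM cluster used as the training label
print(knn_predict(X, labels, np.array([0.5, 0.5])) == labels[0])
```

    In this combination, FCM supplies labels for unlabeled training data, and KNN reuses those labels to categorize new documents without re-running the clustering.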

  13. Mississippi Curriculum Framework for Business and Office and Related Technology Cluster. Office Systems Technology (CIP: 52.0401--Administrative Assistant/Secretarial). Accounting Technology (CIP: 52.0302). Medical Office Technology (CIP: 52.0404--Medical Admin. Asst./Secretarial). Microcomputer Technology (CIP: 52.0490). Court Reporting Technology (CIP: 52.0405). Paralegal Technology (CIP: Paralegal/Legal Assistant).

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for four programs in the postsecondary-level business and office cluster (office systems, accounting, medical office, and microcomputer technologies) and two programs in the legal cluster (court reporting and paralegal…

  14. Astronomical Data Center Bulletin, volume 1, number 2

    NASA Technical Reports Server (NTRS)

    Nagy, T. A.; Warren, W. H., Jr.; Mead, J. M.

    1981-01-01

    Work in progress on astronomical catalogs is presented in 16 papers. Topics cover astronomical data center operations; automatic astronomical data retrieval at GSFC; interactive computer reference search of astronomical literature 1950-1976; formatting, checking, and documenting machine-readable catalogs; interactive catalog of UV, optical, and HI data for 201 Virgo cluster galaxies; machine-readable version of the general catalog of variable stars, third edition; galactic latitude and magnitude distribution of two astronomical catalogs; the catalog of open star clusters; infrared astronomical data base and catalog of infrared observations; the Air Force geophysics laboratory; revised magnetic tape of the N30 catalog of 5,268 standard stars; positional correlation of the two-micron sky survey and Smithsonian Astrophysical Observatory catalog sources; search capabilities for the catalog of stellar identifications (CSI) 1979 version; CSI statistics: blue magnitude versus spectral type; catalogs available from the Astronomical Data Center; and status report on machine-readable astronomical catalogs.

  15. MapReduce implementation of a hybrid spectral library-database search method for large-scale peptide identification.

    PubMed

    Kalyanaraman, Ananth; Cannon, William R; Latt, Benjamin; Baxter, Douglas J

    2011-11-01

    A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster, using spectral datasets from environmental microbial communities as inputs. The source code, along with user documentation, is available at http://compbio.eecs.wsu.edu/MR-MSPolygraph. Contact: ananth@eecs.wsu.edu; william.cannon@pnnl.gov. Supplementary data are available at Bioinformatics online.
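    The parallelization pattern can be illustrated with a toy, single-process simulation of map, shuffle and reduce. This is not MR-MSPolygraph and does not use Hadoop: the peak-matching score below is a hypothetical stand-in for MSPolygraph's hybrid scoring, and all names are illustrative. The map step scores one spectrum against every candidate independently, which is what allows spectra to be distributed across cluster nodes; the reduce step keeps the best-scoring candidate per spectrum.

```python
from collections import defaultdict


def map_phase(spectrum_id, peaks, candidates, tol=0.5):
    """Emit (spectrum_id, (score, candidate)) for every candidate.

    Hypothetical score: number of experimental peaks within `tol`
    of any theoretical peak of the candidate.
    """
    for name, theoretical in candidates.items():
        score = sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)
        yield spectrum_id, (score, name)


def reduce_phase(spectrum_id, scored):
    # Keep the highest-scoring candidate (ties broken by name).
    return spectrum_id, max(scored)


def run(spectra, candidates):
    groups = defaultdict(list)  # the "shuffle": group map output by key
    for sid, peaks in spectra.items():
        for key, value in map_phase(sid, peaks, candidates):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())


candidates = {"PEPA": [100.0, 200.0, 300.0], "PEPB": [150.0, 250.0]}
spectra = {"s1": [100.2, 199.9, 300.1], "s2": [150.1, 250.3, 400.0]}
print(run(spectra, candidates))  # {'s1': (3, 'PEPA'), 's2': (2, 'PEPB')}
```

    Because no map call depends on another, the work scales out across Hadoop nodes, which is the source of the weeks-to-hours speedup reported above.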

  16. Tisettanta case study: the interoperation of furniture production companies

    NASA Astrophysics Data System (ADS)

    Amarilli, Fabrizio; Spreafico, Alberto

    This chapter presents the Tisettanta case study, focusing on the definition of the possible innovations that ICT technologies can bring to the Italian wood-furniture industry. This sector is characterized by industrial clusters composed mainly of a few large companies with international brand reputations and a large base of SMEs that manufacture finished products or are specialized in the production of single components/processes (such as the Brianza cluster, where Tisettanta operates). In this particular business ecosystem, ICT technologies can bring relevant support and improvements to the supply chain process, where collaborations between enterprises are put into action through the exchange of business documents such as orders, order confirmation, bills of lading, invoices, etc. The analysis methodology adopted in the Tisettanta case study refers to the TEKNE Methodology of Change (see Chapter 2), which defines a framework for supporting firms in the adoption of the Internetworked Enterprise organizational paradigm.

  17. Mixture Model and MDSDCA for Textual Data

    NASA Astrophysics Data System (ADS)

    Allouti, Faryel; Nadif, Mohamed; Hoai An, Le Thi; Otjacques, Benoît

    E-mailing has become an essential component of cooperation in business. Consequently, the large number of messages manually produced or automatically generated can rapidly cause information overflow for users. Many research projects have examined this issue, but surprisingly few have tackled the problem of the files attached to e-mails, which in many cases contain a substantial part of the semantics of the message. This paper considers this specific topic and focuses on the problem of clustering and visualizing attached files. Relying on the multinomial mixture model, we used the Classification EM (CEM) algorithm to cluster the set of files, and MDSDCA to visualize the obtained classes of documents. Like the Multidimensional Scaling method, the MDSDCA algorithm, which is based on the Difference of Convex functions, aims to optimize the stress criterion. Because MDSDCA is iterative, we propose an initialization approach to avoid starting with random values. Experiments are conducted on simulated and textual data.
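    A minimal sketch of Classification EM for a multinomial mixture over a document-term count matrix is shown below. It alternates an M-step (estimate mixing proportions and per-cluster term probabilities), an E-step (log posterior of each cluster for each document) and a C-step (hard assignment to the most probable cluster). The round-robin initial partition and the additive smoothing `alpha` are assumptions made for the example; they are not the paper's initialization, which concerns MDSDCA.

```python
import numpy as np


def cem_multinomial(X, k, iters=100, alpha=1.0):
    """Classification EM for a multinomial mixture; returns hard cluster labels."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    z = np.arange(n) % k  # simple round-robin initial partition (assumption)
    for _ in range(iters):
        # M-step: mixing proportions and per-cluster term probabilities,
        # smoothed so that an empty cluster stays well defined.
        pi = np.array([(z == c).sum() + 1.0 for c in range(k)]) / (n + k)
        counts = np.stack([X[z == c].sum(axis=0) for c in range(k)])
        theta = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * d)
        # E-step: log posterior up to a per-document constant
        # (the multinomial coefficient does not depend on the cluster).
        log_post = np.log(pi) + X @ np.log(theta).T
        # C-step: assign each document to its most probable cluster.
        new_z = log_post.argmax(axis=1)
        if np.array_equal(new_z, z):
            break  # partition is stable: CEM has converged
        z = new_z
    return z


# Rows are documents, columns are term counts for a 4-word vocabulary.
X = [[5, 5, 0, 0], [4, 6, 0, 0], [6, 4, 0, 0],
     [0, 0, 5, 5], [0, 0, 6, 4], [0, 0, 4, 6]]
print(cem_multinomial(X, 2))  # first three and last three share a label
```

    Unlike standard EM, the C-step makes assignments hard, so CEM directly maximizes the classification likelihood and stops as soon as the partition no longer changes.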

  18. Mississippi Curriculum Framework for Horticulture Technology Cluster (Program CIP: 01.0601--Horticulture Serv. Op. & Mgmt., Gen.) (Program CIP: 01.0605--Landscaping Op. & Mgmt.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the horticulture technology programs cluster. Presented in the introductory section are a framework of programs and courses, description of the programs, and suggested course sequences for…

  19. MPEG-7 audio-visual indexing test-bed for video retrieval

    NASA Astrophysics Data System (ADS)

    Gagnon, Langis; Foucher, Samuel; Gouaillier, Valerie; Brun, Christelle; Brousseau, Julie; Boulianne, Gilles; Osterrath, Frederic; Chapdelaine, Claude; Dutrisac, Julie; St-Onge, Francis; Champagne, Benoit; Lu, Xiaojian

    2003-12-01

    This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content such as face recognition, motion activity, speech recognition and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description is based on a temporal decomposition into visual segments (shots), key frames and audio/speech sub-segments. The visible outcome will be a website that allows video retrieval using a proprietary XQuery-based search engine and is accessible to members at the Canadian National Film Board (NFB) Cineroute site. For example, end users will be able to query the database for movie shots that were produced in a specific year, that contain the face of a specific actor who says a specific word, and in which there is no motion activity. Video streaming is performed over the high-bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.

  20. User’s guide for GcClust—An R package for clustering of regional geochemical data

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David B.

    2016-04-08

    GcClust is a software package developed by the U.S. Geological Survey for statistical clustering of regional geochemical data, and similar data such as regional mineralogical data. Functions within the software package are written in the R statistical programming language. These functions, their documentation, and a copy of the user’s guide are bundled together in R’s unit of sharable code, which is called a “package.” The user’s guide includes step-by-step instructions showing how the functions are used to cluster data and to evaluate the clustering results. These functions are demonstrated in this report using test data, which are included in the package.

  1. Five task clusters that enable efficient and effective digitization of biological collections

    PubMed Central

    Nelson, Gil; Paul, Deborah; Riccardi, Gregory; Mast, Austin R.

    2012-01-01

    This paper describes and illustrates five major clusters of related tasks (herein referred to as task clusters) that are common to efficient and effective practices in the digitization of biological specimen data and media. Examples of these clusters come from the observation of diverse digitization processes. The staff of iDigBio (The U.S. National Science Foundation's National Resource for Advancing Digitization of Biological Collections) visited active biological and paleontological collections digitization programs for the purpose of documenting and assessing current digitization practices and tools. These observations identified five task clusters that comprise the digitization process leading up to data publication: (1) pre-digitization curation and staging, (2) specimen image capture, (3) specimen image processing, (4) electronic data capture, and (5) georeferencing locality descriptions. While not all institutions complete each of these task clusters for each specimen, these clusters describe a composite picture of the digitization of biological and paleontological specimens across the programs that were observed. We describe these clusters and the three workflow patterns that dominate their implementation, and we offer a set of workflow recommendations for digitization programs. PMID:22859876

  2. The Comparison of Iranian Normative Reference Data with Five Countries Across Variables in Eight Rorschach Comprehensive System (CS) Clusters

    PubMed Central

    Hosseininasab, Abufazel; Mohammadi, Mohammadreza; Jouzi, Samira; Esmaeilinasab, Maryam; Delavar, Ali

    2016-01-01

    Objective: This study aimed to provide normative data documenting how 114 non-patient Iranian children aged five to seven respond to the Rorschach test. We compared this special sample to international normative reference values for the Comprehensive System (CS). Method: One hundred fourteen non-patient Iranian children aged 5 to 7 years were recruited from public schools. Using child and adolescent samples from five countries, we compared the Iranian normative reference data based on the reference means and standard deviations for each sample. Results: The findings show how the scores in each sample were distributed and how the samples compared across variables in eight Rorschach Comprehensive System (CS) clusters. We report all descriptive statistics, such as the reference mean and standard deviation, for all variables. Conclusion: Iranian clinicians could rely on country-specific or "local norms" when assessing children. We discourage Iranian clinicians from using many CS scores to make nomothetic, score-based inferences about psychopathology in children and adolescents. PMID:27928247

  3. Cyclic nocturnal awakening: a warning sign of a cluster bout.

    PubMed

    Martins, Isabel Pavão

    2015-04-01

    Cluster headache is an excruciating unilateral headache with autonomic symptoms whose periodic nocturnal activity, which interrupts sleep, has been attributed to a hypothalamic generator. We describe a patient with longstanding episodic cluster headache who experienced, on two occasions, a period of nocturnal awakenings without pain or autonomic symptoms lasting one week before the onset of a cluster bout. Awakenings occurred twice per night, at the same hours as the impending cluster attacks, and had no apparent trigger; they were unusual for this patient, who had no previous sleep disturbances. Neurological examination and brain imaging were normal. This case documents two new aspects of cluster headache. It suggests that repeated nocturnal awakenings can be a warning sign of an impending cluster period, a finding that may have therapeutic implications, and that hypothalamic activation may begin several days before trigemino-autonomic symptoms, thus behaving as a true bout generator. © International Headache Society 2014.

  4. Evaluation of Hierarchical Clustering Algorithms for Document Datasets

    DTIC Science & Technology

    2002-06-03

    link, complete-link, and group average (UPGMA)) and a new set of merging criteria derived from the six partitional criterion functions. Overall, we...used the single-link, complete-link, and UPGMA schemes, as well as the various partitional criterion functions described in Section 3.1. The single-link...other (complete-link approach). The UPGMA scheme [16] (also known as group average) overcomes these problems by measuring the similarity of two clusters
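    The group-average (UPGMA) criterion mentioned in this excerpt scores a pair of clusters by the average pairwise similarity between their members. Below is a deliberately naive agglomerative sketch that takes a precomputed document similarity matrix (for example, cosine similarities) and stops at a requested number of clusters. The function name and the stopping rule are assumptions; the loop is O(n³) and suitable only for small illustrations.

```python
def upgma(sim, num_clusters):
    """Agglomerative clustering with group-average (UPGMA) linkage.

    sim: square similarity matrix (list of lists); larger means more similar.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > num_clusters:
        best = None  # (average similarity, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the most similar pair
        del clusters[b]                           # b > a, so index a stays valid
    return clusters


sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]
print(upgma(sim, 2))  # [[0, 1], [2, 3]]
```

    Averaging over all member pairs avoids single-link's tendency to chain clusters through one close pair and complete-link's sensitivity to a single distant pair.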

  5. NAVO MSRC Navigator. Fall 2006

    DTIC Science & Technology

    2006-01-01

    UNIX Manual Pages: xdm (1x). 7. Buddenhagen, Oswald, "The KDM Handbook," KDE Documentation, http://docs.kde.org/development/en/kdebase/kdm/. 8... Linux Opteron cluster was recently determined through a series of simulations that employed both fixed and adaptive meshes. The fixed-mesh scalability...approximately eight in the total number of cells in the 3-D simulation. The fixed-mesh and AMR scalability results on the Linux Opteron cluster are

  6. Clustering and Recurring Anomaly Identification: Recurring Anomaly Detection System (ReADS)

    NASA Technical Reports Server (NTRS)

    McIntosh, Dawn

    2006-01-01

    This viewgraph presentation reviews the Recurring Anomaly Detection System (ReADS), a tool for analyzing text reports such as aviation reports and maintenance records. ReADS (1) uses text clustering algorithms to group large quantities of reports and documents, reducing human error and fatigue; (2) identifies interconnected reports, automating the discovery of possible recurring anomalies; and (3) provides a visualization of the clusters and recurring anomalies. We have illustrated our techniques on data from Shuttle and ISS discrepancy reports, as well as ASRS data. ReADS has been integrated with a secure online search

  7. A Novel Method for Mining SaaS Software Tag via Community Detection in Software Services Network

    NASA Astrophysics Data System (ADS)

    Qin, Li; Li, Bing; Pan, Wei-Feng; Peng, Tao

    The number of online software services based on the SaaS paradigm is increasing. However, users usually find it hard to get the exact software services they need. At present, tags are widely used to annotate specific software services and to facilitate searching for them. Currently, these tags are arbitrary and ambiguous, since most of them are generated manually by service developers. This paper proposes a method for mining tags from the help documents of software services. By extracting terms from the help documents and calculating the similarity between them, we construct a software similarity network in which nodes represent software services, edges denote the similarity relationship between software services, and edge weights are the similarity degrees. A hierarchical clustering algorithm is used for community detection in this software similarity network. In the final stage, tags are mined for each of the communities and stored as an ontology.
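    The network-construction step can be sketched as follows, assuming each service is represented by the set of terms extracted from its help documents. The abstract does not name the similarity measure, so Jaccard similarity over term sets is used here purely as an illustrative choice, and the service names are made up. The resulting weighted edge list is the input that a hierarchical clustering or community-detection step would then partition.

```python
from itertools import combinations


def similarity_network(term_sets, threshold=0.0):
    """Weighted edges between services whose term sets overlap.

    term_sets: dict mapping service name -> set of extracted terms.
    Returns {(service_a, service_b): jaccard_similarity} for pairs whose
    similarity exceeds `threshold` (Jaccard is an assumed measure).
    """
    edges = {}
    for a, b in combinations(sorted(term_sets), 2):
        ta, tb = term_sets[a], term_sets[b]
        union = ta | tb
        sim = len(ta & tb) / len(union) if union else 0.0
        if sim > threshold:
            edges[(a, b)] = sim  # edge weight = similarity degree
    return edges


services = {
    "crm": {"contact", "sales", "lead"},
    "erp": {"sales", "invoice"},
    "blog": {"post", "comment"},
}
print(similarity_network(services))  # {('crm', 'erp'): 0.25}
```

    Raising `threshold` prunes weak edges, which tends to make communities more distinct at the cost of leaving some services isolated.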

  8. Level-2 Milestone 4797: Early Users on Max, Sequoia Visualization Cluster

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cupps, Kim C.

    This report documents the completion of ASC L2 milestone 4797, "Early Users on Sequoia Visualization System (Max)," due December 31, 2013: an early user has run successfully on Max, the Sequoia visualization cluster. The Max visualization and data analysis cluster will provide Sequoia users with compute cycles and an interactive option for data exploration and analysis. The system will be integrated in the first quarter of FY14 and is expected to be moved to the classified network by the second quarter of FY14. The goal of this milestone is to have early users running their visualization and data analysis work on the Max cluster on the classified network.

  9. EEG Correlates of Ten Positive Emotions

    PubMed Central

    Hu, Xin; Yu, Jianwen; Song, Mengdi; Yu, Chun; Wang, Fei; Sun, Pei; Wang, Daifa; Zhang, Dan

    2017-01-01

    Compared with the well-documented neurophysiological findings on negative emotions, much less is known about positive emotions. In the present study, we explored the EEG correlates of ten different positive emotions (joy, gratitude, serenity, interest, hope, pride, amusement, inspiration, awe, and love). A group of 20 participants watched 30 short film clips while their EEGs were recorded. Distinct topographical patterns for different positive emotions were found in the correlation coefficients between the subjective ratings of the ten positive emotions per film clip and the corresponding EEG spectral powers in different frequency bands. Based on the similarities of the participants' ratings of the ten positive emotions, these emotions were further grouped into three representative clusters: 'encouragement' (awe, gratitude, hope, inspiration, pride), 'playfulness' (amusement, joy, interest), and 'harmony' (love, serenity). Using the EEG spectral powers as features, both the binary classification of higher versus lower ratings on these positive emotions and the binary classification between the three positive emotion clusters achieved accuracies of approximately 80% and above. To our knowledge, our study provides the first evidence on the EEG correlates of different positive emotions. PMID:28184194

  10. The spatial distribution of gender differences in obesity prevalence differs from overall obesity prevalence among US adults.

    PubMed

    Gartner, Danielle R; Taber, Daniel R; Hirsch, Jana A; Robinson, Whitney R

    2016-04-01

    Although obesity disparities between racial and socioeconomic groups have been well characterized, those based on gender and geography have not been as thoroughly documented. This study describes obesity prevalence by state, gender, and race and/or ethnicity to (1) characterize obesity gender inequality, (2) determine whether the geographic distribution of inequality is spatially clustered, and (3) contrast the spatial clustering patterns of obesity gender inequality with overall obesity prevalence. Data from the Centers for Disease Control and Prevention's 2013 Behavioral Risk Factor Surveillance System were used to calculate state-specific obesity prevalence and gender inequality measures. Global and local Moran's indices were calculated to determine spatial autocorrelation. Age-adjusted, state-specific obesity prevalence difference and ratio measures show spatial autocorrelation (z-score = 4.89, P-value < .001). Local Moran's indices indicate that the spatial distributions of obesity prevalence and obesity gender inequalities are not the same. High and low values of obesity prevalence and gender inequalities cluster in different areas of the United States. Clustering of gender inequality suggests that spatial processes operating at the state level, such as occupational or physical activity policies or social norms, are involved in the etiology of the inequality and necessitate further attention to the determinants of obesity gender inequality. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Use of molecular testing to identify a cluster of patients with polycythemia vera in eastern Pennsylvania.

    PubMed

    Seaman, Vincent; Jumaan, Aisha; Yanni, Emad; Lewis, Brian; Neyer, Jonathan; Roda, Paul; Xu, Mingjiang; Hoffman, Ronald

    2009-02-01

    The role of the environment in the origin of polycythemia vera has not been well documented. Recently, molecular diagnostic tools have been developed to facilitate the diagnosis of polycythemia vera. A cluster of patients with polycythemia vera was suspected in three counties in eastern Pennsylvania, where there has long been concern about environmental hazards. Rigorous clinical criteria and JAK2 617V>F testing were used to confirm the diagnosis of polycythemia vera in patients in this area. Participants included cases of polycythemia vera from the 2001 to 2005 state cancer registry as well as self- and physician-referred cases. A diagnosis of polycythemia vera was confirmed in 53% of 62 participants using WHO criteria, which include JAK2 617V>F testing. A statistically significant cluster of cases (P < 0.001) was identified where the incidence of polycythemia vera was 4.3 times that of the rest of the study area. The area of the cluster contained numerous sources of hazardous material, including waste-coal power plants and U.S. Environmental Protection Agency Superfund sites. The diagnosis of polycythemia vera based solely on clinical criteria is frequently erroneous, suggesting that our prior knowledge of the epidemiology of this disease might be inaccurate. JAK2 617V>F mutational analysis provides diagnostic clarity and permitted the confirmation of a cluster of polycythemia vera cases not identified by traditional clinical and pathologic diagnostic criteria. The close proximity of this cluster to known areas of hazardous material exposure raises concern that such environmental factors might play a role in the origin of polycythemia vera.

  12. Basic limnology of fifty-one lakes in Costa Rica.

    PubMed

    Haberyan, Kurt A; Horn, Sally P; Umaña, Gerardo

    2003-03-01

    We visited 51 lakes in Costa Rica as part of a broad-based survey to document their physical and chemical characteristics and how these relate to the mode of formation and geographical distribution of the lakes. The four oxbow lakes were low in elevation and tended to be turbid, high in conductivity and CO2, but low in dissolved O2; one of these, L. Gandoca, had a hypolimnion essentially composed of seawater. These were similar to the four wetland lakes, but the latter instead had low conductivities and pH, and their turbidity was often due to tannins rather than suspended sediments. The thirteen artificial lakes formed a very heterogeneous group, whose features varied depending on local factors. The thirteen lakes dammed by landslides, lava flows, or lahars occurred in areas with steep slopes and were more likely to be stratified than most other types of lakes. The eight lakes that occupy volcanic craters tended to be deep, stratified, clear, and cool; two of these, L. Hule and L. Río Cuarto, appeared to be oligomictic (tending toward meromictic). The nine glacial lakes, all located above 3440 m elevation near Cerro Chirripó, were clear, cold, dilute, and probably polymictic. Cluster analysis resulted in three significant groups of lakes. Cluster 1 included four calcium-rich lakes (average 48 mg L⁻¹), Cluster 2 included fourteen lakes with more Si than Ca²⁺ and higher Cl⁻ than the other clusters, and Cluster 3 included the remaining thirty-three lakes, which were generally less concentrated. Each cluster included lakes of various origins located in different geographical regions; these data indicate that, apart from the high-altitude glacial lakes and lakes in the Miravalles area, similarity in lake chemistry is independent of lake distribution.

  13. Text Summarization Model based on Maximum Coverage Problem and its Variant

    NASA Astrophysics Data System (ADS)

    Takamura, Hiroya; Okumura, Manabu

    We discuss text summarization in terms of the maximum coverage problem and its variant. To solve the optimization problem, we applied several decoding algorithms, including some never before used with this summarization formulation, such as a greedy algorithm with a performance guarantee, a randomized algorithm, and a branch-and-bound method. We conduct comparative experiments. On the basis of the experimental results, we also augment the summarization model so that it takes into account relevance to the document cluster. Through experiments, we show that the augmented model is at least comparable to the best-performing method of DUC'04.
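    In the maximum coverage formulation, each sentence covers a set of (possibly weighted) concepts, and the summary must cover as much total concept weight as possible within a length budget. The sketch below shows one well-known greedy for this budgeted problem: repeatedly add the feasible sentence with the best marginal-gain-to-length ratio, then return the best single sentence instead if it alone covers more. This is a generic illustration, not the authors' decoder; the representation of sentences as (length, concept set) pairs and the default weight of 1 per concept are assumptions.

```python
def greedy_summary(sentences, budget, weights=None):
    """Greedy budgeted maximum coverage.

    sentences: list of (length, set_of_concepts); weights: optional
    dict concept -> weight (unlisted concepts weigh 1.0).
    Returns the indices of the selected sentences.
    """
    weights = weights or {}

    def weight(concepts):
        return sum(weights.get(c, 1.0) for c in concepts)

    chosen, covered, used = [], set(), 0
    remaining = list(range(len(sentences)))
    while True:
        best, best_ratio = None, 0.0
        for i in remaining:
            length, concepts = sentences[i]
            gain = weight(concepts - covered)  # only newly covered concepts count
            if used + length <= budget and gain / length > best_ratio:
                best, best_ratio = i, gain / length
        if best is None:
            break  # nothing feasible adds coverage
        chosen.append(best)
        used += sentences[best][0]
        covered |= sentences[best][1]
        remaining.remove(best)
    # Compare against the best single sentence that fits the budget.
    fits = [i for i, (length, _) in enumerate(sentences) if length <= budget]
    if fits:
        top = max(fits, key=lambda i: weight(sentences[i][1]))
        if weight(sentences[top][1]) > weight(covered):
            return [top]
    return chosen


sents = [(2, {"a", "b"}), (2, {"b", "c"}), (3, {"a", "b", "c", "d"}), (1, {"e"})]
print(greedy_summary(sents, budget=4))  # [2, 3]
```

    The final single-sentence check guards against the ratio rule filling the budget with short, cheap sentences when one longer sentence would cover more.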

  14. Using the GeoFEST Faulted Region Simulation System

    NASA Technical Reports Server (NTRS)

    Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy

    2004-01-01

    GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on more than half a dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load balance and low communication, and on careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using more than roughly 4000 elements per processor. The source code and documentation for GeoFEST are available at no cost from the Open Channel Foundation. In addition, GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.
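    The solver named above, conjugate gradient with a diagonal (Jacobi) preconditioner, can be sketched serially in NumPy as below. This is a textbook version for a small dense symmetric positive-definite matrix, not GeoFEST's parallel MPI implementation; in the parallel setting the matrix-vector product and the dot products are the steps that require communication. The function name and tolerance defaults are assumptions.

```python
import numpy as np


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner M = diag(A).

    A must be symmetric positive definite.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    m_inv = 1.0 / np.diag(A)  # applying M^-1 is just an element-wise scaling
    r = b - A @ x             # initial residual
    z = m_inv * r             # preconditioned residual
    p = z.copy()              # first search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break              # relative residual small enough
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # next A-conjugate direction
        rz = rz_new
    return x


A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(np.allclose(jacobi_pcg(A, b), np.linalg.solve(A, b)))  # True
```

    A diagonal preconditioner is attractive on clusters because applying it needs no communication at all: each processor scales its own entries of the residual.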

  15. fluff: exploratory analysis and visualization of high-throughput sequencing data

    PubMed Central

    Georgiou, Georgios

    2016-01-01

    Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532

  16. [Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

    PubMed

    Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

    2018-03-06

    To establish typologies of Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was applied to a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was carried out beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following cross-cluster patterns stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.

  17. Rare nocturnal headaches.

    PubMed

    Cohen, Anna S; Kaube, Holger

    2004-06-01

    This review describes rare headaches that can occur at night or during sleep, with a focus on cluster headaches, paroxysmal hemicrania, short-lasting unilateral neuralgiform headache attacks with conjunctival injection and tearing, hypnic headache and exploding head syndrome. It is known that cluster headaches and hypnic headache are associated with rapid eye movement sleep, as illustrated by recent polysomnographic studies. Functional imaging studies have documented hypothalamic activation that is likely to be of relevance to circadian rhythms. These headache syndromes have been shown to respond to melatonin and lithium therapy, both of which have an indirect impact on the sleep-wake cycle. There is growing evidence that cluster headache and hypnic headache are chronobiological disorders.

  18. Mississippi Curriculum Framework for Machine Tool Operation/Machine Shop and Tool and Die Making Technology Cluster (Program CIP: 48.0507--Tool and Die Maker/Technologist) (Program CIP: 48.0503--Machine Shop Assistant). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the machine tool operation/machine tool and tool and die making technology programs cluster. Presented in the introductory section are a framework of courses and programs, description of the…

  19. Trajectories of emotional-behavioral difficulty and academic competence: A 6-year, person-centered, prospective study of affluent suburban adolescents.

    PubMed

    Ansary, Nadia S; McMahon, Thomas J; Luthar, Suniya S

    2017-02-01

    This longitudinal study of affluent suburban youth (N = 319) tracked from 6th to 12th grade is parsed into two segments examining prospective associations concerning emotional-behavioral difficulties and academic achievement. In Part 1 of the investigation, markers of emotional-behavioral difficulty were used to cluster participants during 6th grade. Generalized estimating equations were then used to document between-cluster differences in academic competence from 6th to 12th grade. In Part 2 of the study, indicators of academic competence were used to cluster the same students during 6th grade, and generalized estimating equations were used to document between-cluster differences in emotional-behavioral difficulty from 6th to 12th grade. The results from Part 1 indicated that patterns of emotional-behavioral difficulty during 6th grade were concurrently associated with poorer grades and classroom adjustment with some group differences in the rate of change in classroom adjustment over time. In Part 2, patterns of academic competence during 6th grade were concurrently associated with less emotional-behavioral difficulty and some group differences in the rate of change in specific forms of emotional-behavioral difficulty over time. These results suggest that the youth sampled appeared relatively well adjusted and any emotional-behavioral-achievement difficulty that was evident at the start of middle school was sustained through the end of high school.

  20. Tools for Material Design and Selection

    NASA Astrophysics Data System (ADS)

    Wehage, Kristopher

    The present thesis focuses on applications of numerical methods to create tools for material characterization, design and selection. The tools generated in this work incorporate a variety of programming concepts, from digital image analysis, geometry, optimization, and parallel programming to data mining, databases and web design. The first portion of the thesis focuses on methods for characterizing clustering in bimodal 5083 aluminum alloys created by cryomilling and powder metallurgy. The bimodal samples analyzed in the present work contain a mixture of a coarse grain phase, with a grain size on the order of several microns, and an ultra-fine grain phase, with a grain size on the order of 200 nm. The mixing of the two phases is not homogeneous, and clustering is observed. To investigate clustering in these bimodal materials, various microstructures were created experimentally by conventional cryomilling, Hot Isostatic Pressing (HIP), Extrusion, Dual-Mode Dynamic Forging (DMDF) and a new 'Gradient' cryomilling process. Two techniques for quantitative clustering analysis are presented, formulated and implemented. The first technique, the Area Disorder function, provides a metric of the quality of coarse grain dispersion in an ultra-fine grain matrix, and the second technique, the Two-Point Correlation function, provides a metric of long- and short-range spatial arrangements of the two phases, as well as an indication of the mean feature size in any direction. The two techniques are implemented on digital images created by Scanning Electron Microscopy (SEM) and Electron Backscatter Diffraction (EBSD) of the microstructures. To investigate structure-property relationships through modeling and simulation, strategies for generating synthetic microstructures are discussed, and a computer program that generates randomized microstructures with desired configurations of clustering, as described by the Area Disorder function, is formulated and presented. 
In the computer program, two-dimensional microstructures are generated by Random Sequential Adsorption (RSA) of voxelized ellipses representing the coarse grain phase. A simulated annealing algorithm is used to geometrically optimize the placement of the ellipses in the model to achieve varying user-defined configurations of spatial arrangement of the coarse grains. During the simulated annealing process, the ellipses are allowed to overlap up to a specified threshold, allowing triple junctions to form in the model. Once the simulated annealing process is complete, the remaining space is populated by smaller ellipses representing the ultra-fine grain phase. Uniform random orientations are assigned to the grains. The program generates text files that can be imported into Crystal Plasticity Finite Element Analysis software for stress analysis. Finally, numerical methods and programming are applied to current issues in green engineering and hazard assessment. To understand hazards associated with materials and select safer alternatives, engineers and designers need access to up-to-date hazard information. However, hazard information comes from many disparate sources, and aggregating, interpreting and acting on the wealth of data is not trivial. In light of these challenges, a Framework for Automated Hazard Assessment based on the GreenScreen List Translator is presented. The framework consists of a computer program that automatically extracts data from the GHS-Japan hazard database, loads the data into a machine-readable JSON format, transforms the JSON document into a GreenScreen JSON document using the GreenScreen List Translator v1.2, and performs GreenScreen Benchmark scoring on the material. The GreenScreen JSON documents are then uploaded to a document storage system to allow human operators to search for, modify or add hazard information via a web interface.
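The Two-Point Correlation function mentioned above can be illustrated on a segmented image. The sketch below is not the thesis's implementation: it assumes the micrograph has already been thresholded into a boolean array (True for the coarse grain phase) and estimates S2(r), the probability that two pixels a distance r apart along the x direction both lie in that phase, using only pixel pairs that fall inside the image. The name `two_point_correlation_x` is hypothetical.

```python
import numpy as np


def two_point_correlation_x(phase, max_r):
    """Estimate S2(r) for r = 0..max_r along the x (column) direction.

    phase: 2D boolean array, True where the pixel belongs to the phase of
    interest. Non-periodic estimate: only pairs fully inside the image count.
    Hypothetical helper for illustration only.
    """
    phase = np.asarray(phase, dtype=bool)
    if max_r >= phase.shape[1]:
        raise ValueError("max_r must be smaller than the image width")
    s2 = np.empty(max_r + 1)
    s2[0] = phase.mean()                      # S2(0) is the area fraction
    for r in range(1, max_r + 1):
        both = phase[:, :-r] & phase[:, r:]   # pixel pairs r apart, both in phase
        s2[r] = both.mean()
    return s2
```

S2(0) equals the phase's area fraction φ, and for a statistically homogeneous microstructure without long-range order S2(r) tends toward φ² at large r. The distance at which that plateau is reached gives a rough sense of feature size along the chosen direction; repeating the estimate along other axes probes anisotropy.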

  1. Experimental design and data-analysis in label-free quantitative LC/MS proteomics: A tutorial with MSqRob.

    PubMed

    Goeminne, Ludger J E; Gevaert, Kris; Clement, Lieven

    2018-01-16

    Label-free shotgun proteomics is routinely used to assess proteomes. However, extracting relevant information from the massive amounts of generated data remains difficult. This tutorial provides a strong foundation in the analysis of quantitative proteomics data. We provide key statistical concepts that help researchers to design proteomics experiments and we showcase how to analyze quantitative proteomics data using our recent free and open-source R package MSqRob, which was developed to implement the peptide-level robust ridge regression method for relative protein quantification described by Goeminne et al. MSqRob can handle virtually any experimental proteomics design and outputs proteins ordered by statistical significance. Moreover, its graphical user interface and interactive diagnostic plots support easy inspection and detection of anomalies in the data and flaws in the data analysis, enabling deeper assessment of the validity of results and a critical review of the experimental design. Our tutorial discusses interactive preprocessing, data analysis and visualization of label-free MS-based quantitative proteomics experiments with simple and more complex designs. We provide well-documented scripts to run analyses in bash mode on GitHub, enabling the integration of MSqRob in automated pipelines on cluster environments (https://github.com/statOmics/MSqRob). The concepts outlined in this tutorial aid in designing better experiments and analyzing the resulting data more appropriately. The two case studies using the MSqRob graphical user interface will contribute to a wider adoption of advanced peptide-based models, resulting in higher quality data analysis workflows and more reproducible results in the proteomics community. We also provide well-documented scripts for experienced users that aim at automating MSqRob on cluster environments. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Discovering semantic features in the literature: a foundation for building functional associations

    PubMed Central

    Chagoyen, Monica; Carmona-Saez, Pedro; Shatkay, Hagit; Carazo, Jose M; Pascual-Montano, Alberto

    2006-01-01

    Background Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. Results We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification, or even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. Conclusion The presented method can create literature profiles from documents relevant to sets of genes. 
The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data. PMID:16438716
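The literature-profile method rests on non-negative matrix factorization. Below is a minimal sketch of NMF using the classic Lee-Seung multiplicative updates for the squared (Frobenius) error, assuming a non-negative term-by-gene (or term-by-document) matrix `V`. The paper's own preprocessing, choice of divergence, and rank selection are not reproduced, and the function name `nmf` and its defaults are hypothetical. Columns of `W` play the role of semantic features, and column j of `H` expresses gene j as an additive combination of those features.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (terms x genes) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the Frobenius loss; because
    every factor in each update is non-negative, W and H stay non-negative.
    Hypothetical helper for illustration only.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n_terms, n_cols = V.shape
    W = rng.random((n_terms, k)) + eps    # columns: "semantic features" over terms
    H = rng.random((k, n_cols)) + eps     # columns: feature weights per gene
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

A simple clustering then assigns each gene to the feature with its largest coefficient, `H.argmax(axis=0)`; because the representation is additive, a gene can also be read as belonging partly to several features.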

  3. SOTXTSTREAM: Density-based self-organizing clustering of text streams.

    PubMed

    Bryant, Avory C; Cios, Krzysztof J

    2017-01-01

    A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.

  4. Semantic Interaction for Sensemaking: Inferring Analytical Reasoning for Model Steering.

    PubMed

    Endert, A; Fiaux, P; North, C

    2012-12-01

    Visual analytic tools aim to support the cognitively demanding task of sensemaking. Their success often depends on the ability to leverage capabilities of mathematical models, visualization, and human intuition through flexible, usable, and expressive interactions. Spatially clustering data is one effective metaphor for users to explore similarity and relationships between information, adjusting the weighting of dimensions or characteristics of the dataset to observe the change in the spatial layout. Semantic interaction is an approach to user interaction in such spatializations that couples these parametric modifications of the clustering model with users' analytic operations on the data (e.g., direct document movement in the spatialization, highlighting text, search, etc.). In this paper, we present results of a user study exploring the ability of semantic interaction in a visual analytic prototype, ForceSPIRE, to support sensemaking. We found that semantic interaction captures the analytical reasoning of the user through keyword weighting, and aids the user in co-creating a spatialization based on the user's reasoning and intuition.

  5. Anonymizing and Sharing Medical Text Records

    PubMed Central

    Li, Xiao-Bai; Qin, Jialun

    2017-01-01

    Health information technology has increased accessibility of health and medical data and benefited medical research and healthcare management. However, there are rising concerns about patient privacy in sharing medical and healthcare data. A large amount of these data are in free text form. Existing techniques for privacy-preserving data sharing deal largely with structured data. Current privacy approaches for medical text data focus on detection and removal of patient identifiers from the data, which may be inadequate for protecting privacy or preserving data quality. We propose a new systematic approach to extract, cluster, and anonymize medical text records. Our approach integrates methods developed in both data privacy and health informatics fields. The key novel elements of our approach include a recursive partitioning method to cluster medical text records based on the similarity of the health and medical information and a value-enumeration method to anonymize potentially identifying information in the text data. An experimental study is conducted using real-world medical documents. The results of the experiments demonstrate the effectiveness of the proposed approach. PMID:29569650

  6. New Framework for Cross-Domain Document Classification

    DTIC Science & Technology

    2011-03-01

    classification. The following paragraphs will introduce these related works in more detail. Wang et al. attempted to improve the accuracy of text document...of using Wikipedia to develop a thesaurus [20]. Gabrilovich et al. had an approach that is more elaborate in its use of Wikipedia text [21]. The...did show a modest improvement when it is performed using the Wikipedia information. Wang et al. improved on the results of co-clustering algorithm [24

  7. MCR Container Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haas, Nicholas Q; Gillen, Robert E; Karnowski, Thomas P

    MathWorks' MATLAB is widely used in academia and industry for prototyping, data analysis, data processing, etc. Many users compile their programs using the MATLAB Compiler to run on workstations/computing clusters via the free MATLAB Compiler Runtime (MCR). The MCR facilitates the execution of code calling Application Programming Interface (API) functions from both base MATLAB and MATLAB toolboxes. In a Linux environment, a sizable number of third-party runtime dependencies (i.e. shared libraries) are necessary. Unfortunately, to the MATLAB community's knowledge, these dependencies are not documented, leaving system administrators and/or end users to find and install the necessary libraries either by diagnosing the runtime errors caused by missing libraries or by inspecting the header information of the Executable and Linkable Format (ELF) libraries of the MCR to determine which ones are missing from the system. To address these shortcomings, Docker images based on Community Enterprise Operating System (CentOS) 7, a derivative of Red Hat Enterprise Linux (RHEL) 7, containing recent (2015-2017) MCR releases and their dependencies were created. These images, along with a provided sample Docker Compose YAML script, can be used to create a simulated computing cluster where binaries created with the MATLAB Compiler can be executed using a sample Slurm Workload Manager script.

  8. Cross-reference identification within a PDF document

    NASA Astrophysics Data System (ADS)

    Li, Sida; Gao, Liangcai; Tang, Zhi; Yu, Yinyan

    2015-01-01

    Cross-references, such as footnotes, endnotes, figure/table captions and references, are a common and useful type of page element that further explains the corresponding entities in the target document. In this paper, we focus on cross-reference identification in a PDF document, and present a robust method as a case study of identifying footnotes and figure references. The proposed method first extracts footnotes and figure captions, and then matches them with their corresponding references within a document. A number of novel features within a PDF document, i.e., page layout, font information, and lexical and linguistic features of cross-references, are utilized for the task. Clustering is adopted to handle features that are stable within one document but vary across different kinds of documents, so that the identification process adapts to the document type. In addition, this method leverages results from the matching process to provide feedback to the identification process and further improve the algorithm's accuracy. Preliminary experiments on real document sets show that the proposed method is promising for identifying cross-references in PDF documents.

  9. Effect of an obesity best practice alert on physician documentation and referral practices.

    PubMed

    Fitzpatrick, Stephanie L; Dickins, Kirsten; Avery, Elizabeth; Ventrelle, Jennifer; Shultz, Aaron; Kishen, Ekta; Rothschild, Steven

    2017-12-01

    The Centers for Medicare & Medicaid Services Electronic Health Record Meaningful Use Incentive Program requires physicians to document body mass index (BMI) and a follow-up treatment plan for adult patients with BMI ≥ 25. To examine the effect of a best practice alert on physician documentation of obesity-related care and referrals to weight management treatment, in a cluster-randomized design, 14 primary care clinics at an academic medical center were randomized to best practice alert intervention (n = 7) or comparator (n = 7). The alert was triggered when both height and weight were entered and BMI was ≥30. Both intervention and comparator clinics could document meaningful use by selecting a nutrition education handout within the alert. Intervention clinics could also select a referral option from the list of clinic and community-based weight management programs embedded in the alert. Main outcomes were proportion of eligible patients with (1) obesity-related documentation and (2) referral. There were 26,471 total primary care encounters with 12,981 unique adult patients with BMI ≥ 30 during the 6-month study period. Documentation doubled (17 to 33%) with implementation of the alert. However, intervention clinics were not significantly more likely to refer patients to weight management than comparator clinics (2.8 vs. 1.3%, p = 0.07). Although the alert was associated with increased physician meaningful use compliance, it was not an effective strategy for improving patient access to weight management services. Further research is needed to understand system-level characteristics that influence obesity management in primary care.

  10. Building CHAOS: An Operating System for Livermore Linux Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garlick, J E; Dunlap, C M

    2003-02-21

    The Livermore Computing (LC) Linux Integration and Development Project (the Linux Project) produces and supports the Clustered High Availability Operating System (CHAOS), a cluster operating environment based on Red Hat Linux. Each CHAOS release begins with a set of requirements and ends with a formally tested, packaged, and documented release suitable for use on LC's production Linux clusters. One characteristic of CHAOS is that component software packages come from different sources under varying degrees of project control. Some are developed by the Linux Project, some are developed by other LC projects, some are external open source projects, and some are commercial software packages. A challenge to the Linux Project is to adhere to release schedules and testing disciplines in a diverse, highly decentralized development environment. Communication channels are maintained for externally developed packages in order to obtain support, influence development decisions, and coordinate/understand release schedules. The Linux Project embraces open source by releasing locally developed packages under open source license, by collaborating with open source projects where mutually beneficial, and by preferring open source over proprietary software. Project members generally use open source development tools. The Linux Project requires system administrators and developers to work together to resolve problems that arise in production. This tight coupling of production and development is a key strategy for making a product that directly addresses LC's production requirements. It is another challenge to balance support and development activities in such a way that one does not overwhelm the other.

  11. Micro-foundation using percolation theory of the finite time singular behavior of the crash hazard rate in a class of rational expectation bubbles

    NASA Astrophysics Data System (ADS)

    Seyrich, Maximilian; Sornette, Didier

    2016-04-01

    We present a plausible micro-founded model for the previously postulated power law finite time singular form of the crash hazard rate in the Johansen-Ledoit-Sornette (JLS) model of rational expectation bubbles. The model is based on a percolation picture of the network of traders and the concept that clusters of connected traders share the same opinion. The key ingredient is the notion that a shift of position from buyer to seller of a sufficiently large group of traders can trigger a crash. This provides a formula to estimate the crash hazard rate by summation, over percolation clusters above a minimum size, of a power s^a (with a > 1) of the cluster sizes s, similarly to a generalized percolation susceptibility. The power s^a of cluster sizes emerges from the super-linear dependence of group activity as a function of group size, previously documented in the literature. The crash hazard rate exhibits explosive finite time singular behaviors when the control parameter (fraction of occupied sites, or density of traders in the network) approaches the percolation threshold p_c. Realistic dynamics are generated by modeling the density of traders on the percolation network by an Ornstein-Uhlenbeck process, whose memory controls the spontaneous excursion of the control parameter close to the critical region of bubble formation. Our numerical simulations recover the main stylized properties of the JLS model with intermittent explosive super-exponential bubbles interrupted by crashes.
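To make the summation concrete, the sketch below uses a toy 2D square-lattice site-percolation model: it labels 4-connected clusters of occupied sites with a breadth-first search and sums s**a over clusters whose size s is at or above a minimum size. This only illustrates the summation described in the abstract and is not the authors' model; the names `cluster_sizes` and `hazard_proxy`, the lattice geometry, and the default values of `a` and `s_min` are assumptions, and the result is an unnormalized proxy rather than a calibrated hazard rate.

```python
from collections import deque

import numpy as np


def cluster_sizes(occupied):
    """Sizes of 4-connected clusters of occupied sites on a 2D lattice."""
    occ = np.asarray(occupied, dtype=bool)
    seen = np.zeros_like(occ)             # boolean "already labelled" mask
    n_rows, n_cols = occ.shape
    sizes = []
    for i in range(n_rows):
        for j in range(n_cols):
            if occ[i, j] and not seen[i, j]:
                # Breadth-first search over one new cluster.
                seen[i, j] = True
                queue, size = deque([(i, j)]), 0
                while queue:
                    x, y = queue.popleft()
                    size += 1
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        u, v = x + dx, y + dy
                        if (0 <= u < n_rows and 0 <= v < n_cols
                                and occ[u, v] and not seen[u, v]):
                            seen[u, v] = True
                            queue.append((u, v))
                sizes.append(size)
    return np.array(sizes, dtype=int)


def hazard_proxy(occupied, a=1.5, s_min=2):
    """Sum of s**a over clusters with size s >= s_min (hypothetical a, s_min)."""
    s = cluster_sizes(occupied)
    s = s[s >= s_min]
    return float(np.sum(s.astype(float) ** a))
```

Sweeping the occupation probability p toward the square-lattice site threshold (p_c ≈ 0.5927), for instance on lattices drawn with `np.random.default_rng(0).random((100, 100)) < p`, shows the proxy growing sharply as large clusters form; with a > 1 the largest clusters dominate the sum.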

  12. Locally Weighted Ensemble Clustering.

    PubMed

    Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang

    2018-05-01

    Due to its ability to combine multiple base clusterings into a potentially better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite its significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.
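The entropic cluster uncertainty and the locally weighted co-association matrix can be sketched as follows. This is a simplified reading of the abstract, not the authors' code: it assumes each cluster's uncertainty is the summed entropy (in bits) of how its members are spread over the clusters of every base clustering, maps that uncertainty to a reliability weight exp(-H / (theta * M)), and weights each co-association vote by the reliability of the cluster that casts it. The names `cluster_entropy` and `lwca_matrix` and the default `theta=0.4` are assumptions.

```python
import numpy as np


def cluster_entropy(members, labels):
    """Entropy (bits) of how one cluster's members are distributed over the
    clusters of another base clustering (0 if they all share one cluster)."""
    _, counts = np.unique(labels[members], return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def lwca_matrix(base_labels, theta=0.4):
    """Locally weighted co-association matrix (sketch under stated assumptions).

    base_labels: (M, N) array, one row of cluster labels per base clustering.
    """
    base = np.asarray(base_labels)
    M, N = base.shape
    A = np.zeros((N, N))
    for m in range(M):
        weights = np.empty(N)
        for c in np.unique(base[m]):
            members = np.flatnonzero(base[m] == c)
            # Uncertainty of cluster c, measured against every base clustering.
            H = sum(cluster_entropy(members, base[k]) for k in range(M))
            weights[members] = np.exp(-H / (theta * M))  # reliability in (0, 1]
        same = base[m][:, None] == base[m][None, :]       # co-membership in m
        A += weights[:, None] * same                      # reliability-weighted vote
    return A / M
```

When all base clusterings agree, every weight is 1 and the matrix reduces to the ordinary co-association matrix. The result can be turned into a distance (1 - A) and passed to a hierarchical or graph-based clustering routine; the paper's two consensus functions are not reproduced here.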

  13. Efficient clustering aggregation based on data fragments.

    PubMed

    Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing

    2012-06-01

    Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. These algorithms become inefficient when the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch), without sacrificing accuracy.
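Under the definition above, the largest possible fragments are the groups of points that receive identical labels in every base clustering. A minimal sketch of computing them follows, assuming the base clusterings are given as an (M, N) integer label array; `data_fragments` is a hypothetical name, and the paper's three aggregation algorithms, which then operate on these fragments, are not reproduced.

```python
import numpy as np


def data_fragments(base_labels):
    """Group points that no base clustering separates.

    base_labels: (M, N) array of cluster labels, one row per base clustering.
    Returns a list of index arrays, one per maximal fragment.
    """
    base = np.asarray(base_labels)
    # Each column is one point's label signature across all M clusterings;
    # points with the same signature are never split, so they form a fragment.
    _, fragment_id = np.unique(base.T, axis=0, return_inverse=True)
    fragment_id = np.ravel(fragment_id)   # inverse shape varies across NumPy versions
    return [np.flatnonzero(fragment_id == f) for f in range(fragment_id.max() + 1)]
```

For example, `data_fragments([[0, 0, 1, 1, 1], [0, 0, 0, 1, 1]])` yields the fragments {0, 1}, {2} and {3, 4}. An aggregation step can then work with one weighted item per fragment instead of one per point, which is the intuition behind the reduced cost reported in the abstract.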

  14. Textured Image Segmentation

    DTIC Science & Technology

    1980-01-01

    discriminated by frequency domain features. It has been shown (201 that Fourier features provide useful information for aerial classification and for...Package for the Social Sciences (SPSS). These discriminant algorithms are documented in Appendix C. Source textures are known, so that cluster

  15. A Randomized Controlled Trial of Short and Standard-Length Consent Forms for a Genetic Cohort Study: Is Longer Better?

    PubMed Central

    Matsui, Kenji; Lie, Reidar K.; Turin, Tanvir C.; Kita, Yoshikuni

    2012-01-01

    Background Although the amount of detail in informed consent documents has increased over time and the documents have therefore become very long, there is little research on whether longer informed consent documents actually result in (1) better informed research subjects or (2) higher consent rates. We therefore conducted an add-on randomized controlled trial to the Takashima Study, a prospective Japanese population-based genetic cohort study, to test the hypothesis that a shorter informed consent form would satisfy both of the above goals. Methods Standard (10 459 words, 11 pages) and short (3602 words, 5 pages) consent forms in Japanese were developed and distributed using cluster-randomization to 293 potential cohort subjects living in 9 medico-social units and 288 subjects in 8 medico-social units, respectively. Results Few differences were found between the 2 groups with regard to outcome measures, including participants’ self-perceived understanding, recall of information, concerns, voluntariness, trust, satisfaction, sense of duty, and consent rates. Conclusions A short informed consent form was no less valid than a standard form with regard to fulfilling ethical requirements and securing the scientific validity of research. PMID:22447213

  16. On application of image analysis and natural language processing for music search

    NASA Astrophysics Data System (ADS)

    Gwardys, Grzegorz

    2013-10-01

    In this paper, I investigate the problem of finding the most similar music tracks using techniques popular in Natural Language Processing, such as TF-IDF and LDA. I defined a document as a music track. Each music track is transformed into a spectrogram; thanks to that, I can use well-known techniques to get words from images. I used the SURF operation to detect characteristic points and a novel approach for their description. Standard k-means was used for clustering. Here, clustering is identical to dictionary making, so after that I can transform spectrograms into text documents and perform TF-IDF and LDA. Finally, I can make a query in the obtained vector space. The research was done on 16 music tracks for training and 336 for testing, which are split into four categories: Hiphop, Jazz, Metal and Pop. Although the technique used is completely unsupervised, the results are satisfactory and encouraging for further research.
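The pipeline in the abstract (local descriptors, a k-means dictionary, then TF-IDF over "visual words") can be sketched in a few lines. The example assumes the descriptors for each track have already been extracted, so neither SURF detection nor the author's descriptor is included. It uses plain Lloyd's k-means and the common tf × log(N/df) weighting, and tracks can then be ranked by cosine similarity; the function names are hypothetical and the author's exact weighting may differ.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (k, d) centroids (the visual vocabulary)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):       # leave an empty cluster's centroid in place
                centroids[c] = X[assign == c].mean(axis=0)
    return centroids


def word_counts(descriptors, centroids):
    """Turn one track's descriptors into a histogram of nearest-centroid words."""
    d = np.asarray(descriptors, dtype=float)
    dists = ((d[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(dists.argmin(axis=1), minlength=len(centroids))


def tfidf(counts):
    """counts: (n_tracks, vocab) word counts -> L2-normalised TF-IDF rows."""
    counts = np.asarray(counts, dtype=float)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    df = (counts > 0).sum(axis=0)
    # Clamp df to 1 so unused words do not divide by zero (their tf is 0 anyway).
    idf = np.log(len(counts) / np.maximum(df, 1))
    weights = tf * idf
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    return weights / np.maximum(norms, 1e-12)
```

Because the rows of the TF-IDF matrix `M` are L2-normalized, `M @ M[q]` gives the cosine similarity of every track to track q, and sorting those scores answers a "most similar tracks" query.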

  17. A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds

    PubMed Central

    Kijas, James W.; Townley, David; Dalrymple, Brian P.; Heaton, Michael P.; Maddox, Jillian F.; McGrath, Annette; Wilson, Peter; Ingersoll, Roxann G.; McCulloch, Russell; McWilliam, Sean; Tang, Dave; McEwan, John; Cockett, Noelle; Oddy, V. Hutton; Nicholas, Frank W.; Raadsma, Herman

    2009-01-01

    The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability. PMID:19270757

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hadgu, Teklu; Appel, Gordon John

    Sandia National Laboratories (SNL) continued evaluation of total system performance assessment (TSPA) computing systems for the previously considered Yucca Mountain Project (YMP). This was done to maintain the operational readiness of the computing infrastructure (computer hardware and software) and knowledge capability for total system performance assessment (TSPA) type analysis, as directed by the National Nuclear Security Administration (NNSA), DOE 2010. This work is a continuation of the ongoing readiness evaluation reported in Lee and Hadgu (2014) and Hadgu et al. (2015). The TSPA computing hardware (CL2014) and storage system described in Hadgu et al. (2015) were used for the current analysis. One floating license of GoldSim with Versions 9.60.300, 10.5 and 11.1.6 was installed on the cluster head node, and its distributed processing capability was mapped onto the cluster processors. Other supporting software was tested and installed to support the TSPA-type analysis on the server cluster. The current tasks included verification of the TSPA-LA uncertainty and sensitivity analyses, and a preliminary upgrade of the TSPA-LA from Version 9.60.300 to the latest version 11.1. All the TSPA-LA uncertainty and sensitivity analysis modeling cases were successfully tested and verified for model reproducibility on the upgraded 2014 server cluster (CL2014). The uncertainty and sensitivity analyses used TSPA-LA modeling case output generated in FY15 based on GoldSim Version 9.60.300, as documented in Hadgu et al. (2015). The model upgrade task successfully converted the Nominal Modeling case to GoldSim Version 11.1. Upgrade of the remaining modeling cases and distributed processing tasks will continue. The 2014 server cluster and supporting software systems are fully operational to support TSPA-LA type analysis.

  19. ValWorkBench: an open source Java library for cluster validation, with applications to microarray data analysis.

    PubMed

    Giancarlo, R; Scaturro, D; Utro, F

    2015-02-01

    The prediction of the number of clusters in a dataset, in particular microarrays, is a fundamental task in biological data analysis, usually performed via validation measures. Unfortunately, it has received very little attention and in fact there is a growing need for software tools/libraries dedicated to it. Here we present ValWorkBench, a software library consisting of eleven well known validation measures, together with novel heuristic approximations for some of them. The main objective of this paper is to provide the interested researcher with the full software documentation of an open source cluster validation platform having the main features of being easily extendible in a homogeneous way and of offering software components that can be readily re-used. Consequently, the focus of the presentation is on the architecture of the library, since it provides an essential map that can be used to access the full software documentation, which is available at the supplementary material website [1]. The mentioned main features of ValWorkBench are also discussed and exemplified, with emphasis on software abstraction design and re-usability. A comparison with existing cluster validation software libraries, mainly in terms of the mentioned features, is also offered. It suggests that ValWorkBench is a much needed contribution to the microarray software development/algorithm engineering community. For completeness, it is important to mention that previous accurate algorithmic experimental analysis of the relative merits of each of the implemented measures [19,23,25], carried out specifically on microarray data, gives useful insights on the effectiveness of ValWorkBench for cluster validation to researchers in the microarray community interested in its use for the mentioned task. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  20. Exploring Careers.

    ERIC Educational Resources Information Center

    Bureau of Labor Statistics (DOL), Washington, DC.

    This document contains a career education resource guide for junior high school students which is designed to build career awareness by means of occupational narratives, evaluative questions, activities, and career games. The information is presented in the following fourteen occupational clusters: industrial production occupations; office…

  1. Tech-Prep Competency Profiles within the Health Technologies Cluster.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document contains competency profiles for Ohio tech prep courses in the following 12 health technologies occupations: radiographer, respiratory care therapist, occupational therapy assistant, physical therapist assistant, registered nurse (associate degree), pharmacy technologist, medical laboratory technician, histotechnologist, emergency…

  2. Cluster-based query expansion using external collections in medical information retrieval.

    PubMed

    Oh, Heung-Seon; Jung, Yuchul

    2015-12-01

    Utilizing external collections to improve retrieval performance is a challenging research problem because various test collections are created for different purposes. Improving medical information retrieval has also gained much attention as various types of medical documents have become available to researchers ever since they started storing them in machine-processable formats. In this paper, we propose an effective method of utilizing external collections based on the pseudo relevance feedback approach. Our method incorporates the structure of external collections in estimating individual components in the final feedback model. Extensive experiments on three medical collections (TREC CDS, CLEF eHealth, and OHSUMED) were performed, and the results were compared with a representative expansion approach utilizing the external collections to show the superiority of our method. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. The properties of small Ag clusters bound to DNA bases.

    PubMed

    Soto-Verdugo, Víctor; Metiu, Horia; Gwinn, Elisabeth

    2010-05-21

    We study the binding of neutral silver clusters, Ag(n) (n=1-6), to the DNA bases adenine (A), cytosine (C), guanine (G), and thymine (T) and the absorption spectra of the silver cluster-base complexes. Using density functional theory (DFT), we find that the clusters prefer to bind to the doubly bonded ring nitrogens and that binding to T is generally much weaker than to C, G, and A. Ag(3) and Ag(4) make the stronger bonds. Bader charge analysis indicates a mild electron transfer from the base to the clusters for all bases, except T. The donor bases (C, G, and A) bind to the sites on the cluster where the lowest unoccupied molecular orbital has a pronounced protrusion. The site where cluster binds to the base is controlled by the shape of the higher occupied states of the base. Time-dependent DFT calculations show that different base-cluster isomers may have very different absorption spectra. In particular, we find new excitations in base-cluster molecules, at energies well below those of the isolated components, and with strengths that depend strongly on the orientations of planar clusters with respect to the base planes. Our results suggest that geometric constraints on binding, imposed by designed DNA structures, may be a feasible route to engineering the selection of specific cluster-base assemblies.

  4. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high-dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
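    The Xie and Beni index named in this abstract is the ratio of fuzzy within-cluster compactness to the minimum squared distance between cluster centres, divided by the number of points; lower values indicate a better partition. The sketch below computes it for a single complete dataset with fixed centroids and standard fuzzy c-means memberships (fuzzifier m = 2 assumed). It does not reproduce the MIV framework: how the authors combine the index across imputed datasets is not specified here, and the toy points and centroids are hypothetical.

```python
import math

def fcm_memberships(points, centroids, m=2.0):
    """Fuzzy c-means membership u[i][j] of point j in cluster i, for fixed centroids."""
    c = len(centroids)
    u = [[0.0] * len(points) for _ in range(c)]
    for j, x in enumerate(points):
        d = [math.dist(x, v) for v in centroids]
        if 0.0 in d:  # point coincides with a centroid: give it crisp membership
            u[d.index(0.0)][j] = 1.0
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return u

def xie_beni(points, centroids, u, m=2.0):
    """Compactness / (n * minimum squared centroid separation); lower is better."""
    compactness = sum(u[i][j] ** m * math.dist(x, v) ** 2
                      for i, v in enumerate(centroids)
                      for j, x in enumerate(points))
    separation = min(math.dist(a, b) ** 2
                     for i, a in enumerate(centroids)
                     for b in centroids[i + 1:])
    return compactness / (len(points) * separation)

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
good = [(1 / 3, 1 / 3), (28 / 3, 28 / 3)]   # centroids at the two group means
bad = [(4, 4), (5, 5)]                      # centroids crowded in the middle
xb_good = xie_beni(points, good, fcm_memberships(points, good))
xb_bad = xie_beni(points, bad, fcm_memberships(points, bad))
print(round(xb_good, 4), round(xb_bad, 4))
```

    To choose a number of clusters, one would run fuzzy clustering for each candidate c and keep the c with the smallest index.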

  5. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high-dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  6. Documentation Resources on the ESIP Wiki

    NASA Technical Reports Server (NTRS)

    Habermann, Ted; Kozimor, John; Gordon, Sean

    2017-01-01

    The ESIP community includes data providers and users that communicate with one another through datasets and metadata that describe them. Improving this communication depends on consistent high-quality metadata. The ESIP Documentation Cluster and the wiki play an important central role in facilitating this communication. We will describe and demonstrate sections of the wiki that provide information about metadata concept definitions, metadata recommendation, metadata dialects, and guidance pages. We will also describe and demonstrate the ISO Explorer, a tool that the community is developing to help metadata creators.

  7. Oxidized Guanine Base Lesions Function in 8-Oxoguanine DNA Glycosylase-1-mediated Epigenetic Regulation of Nuclear Factor κB-driven Gene Expression*

    PubMed Central

    Pan, Lang; Hao, Wenjing; Ba, Xueqing

    2016-01-01

    A large percentage of redox-responsive gene promoters contain evolutionarily conserved guanine-rich clusters; guanines are the bases most susceptible to oxidative modification(s). Consequently, 7,8-dihydro-8-oxoguanine (8-oxoG) is one of the most abundant base lesions in promoters and is primarily repaired via the 8-oxoguanine DNA glycosylase-1 (OGG1)-initiated base excision repair pathway. In view of a prompt cellular response to oxidative challenge, we hypothesized that the 8-oxoG lesion and the cognate repair protein OGG1 are utilized in transcriptional gene activation. Here, we document TNFα-induced enrichment of both 8-oxoG and OGG1 in promoters of pro-inflammatory genes, which precedes interaction of NF-κB with its DNA-binding motif. OGG1 bound to 8-oxoG upstream from the NF-κB motif increased its DNA occupancy by promoting the on-rate of both homodimeric and heterodimeric forms of NF-κB. OGG1 depletion decreased both NF-κB binding and gene expression, whereas Nei-like glycosylase-1 and -2 had a marginal effect. These results are the first to document a novel paradigm wherein the DNA repair protein OGG1 bound to its substrate is coupled to DNA occupancy of NF-κB and functions in epigenetic regulation of gene expression. PMID:27756845

  8. Engineering Technologies. State Competency Profile.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document contains 397 competencies, grouped into 58 units, for tech prep programs in the engineering technologies cluster. The competencies were developed through collaboration of Ohio business, industry, and labor representatives and secondary and associate degree educators. The competencies are rated either "essential" (necessary…

  9. Geospatial characteristics of measles transmission in China during 2005−2014

    PubMed Central

    Wen, Liang; Li, Shen-Long; Chen, Kai; Zhang, Wen-Yi

    2017-01-01

    Measles is a highly contagious and severe disease. Despite mass vaccination, it remains a leading cause of death in children in developing regions, killing 114,900 globally in 2014. In 2006, China committed to eliminating measles by 2012; to this end, the country enhanced its mandatory vaccination programs and achieved vaccination rates reported above 95% by 2008. However, in spite of these efforts, during the last 3 years (2013–2015) China documented 27,695, 52,656, and 42,874 confirmed measles cases. How measles manages to spread in China—the world’s largest population—in the mass vaccination era remains poorly understood. To address this conundrum and provide insights for future public health efforts, we analyze the geospatial pattern of measles transmission across China during 2005–2014. We map measles incidence and incidence rates for each of the 344 cities in mainland China, identify the key socioeconomic and demographic features associated with high disease burden, and identify transmission clusters based on the synchrony of outbreak cycles. Using hierarchical cluster analysis, we identify 21 epidemic clusters, of which 12 were cross-regional. The cross-regional clusters included more underdeveloped cities with large numbers of emigrants than would be expected by chance (p = 0.011; bootstrap sampling), indicating that cities in these clusters were likely linked by internal worker migration in response to uneven economic development. In contrast, cities in regional clusters were more likely to have high rates of minorities and high natural growth rates than would be expected by chance (p = 0.074; bootstrap sampling). Our findings suggest that multiple highly connected foci of measles transmission coexist in China and that migrant workers likely facilitate the transmission of measles across regions. This complex connection renders eradication of measles challenging in China despite its high overall vaccination coverage. 
Future immunization programs should therefore target these transmission foci simultaneously. PMID:28376097
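    Grouping cities by the synchrony of their outbreak cycles can be sketched as agglomerative, average-linkage hierarchical clustering on a correlation distance. The abstract does not state the synchrony measure, the linkage or the cut-off, so the distance 1 - Pearson r, average linkage, the 0.5 cut and the toy incidence series below are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a flat series has no defined correlation; treat as unrelated
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def average_linkage(series, cut):
    """Merge the closest pair of clusters until the closest pair is farther apart than `cut`."""
    names = list(series)
    dist = {(a, b): 1.0 - pearson(series[a], series[b]) for a in names for b in names}
    link = lambda c1, c2: sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        if link(clusters[i], clusters[j]) > cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical monthly incidence: A and B peak together, C and D peak in the opposite phase.
series = {
    "A": [1, 5, 9, 5, 1, 5, 9, 5],
    "B": [2, 6, 10, 6, 2, 6, 10, 6],
    "C": [9, 5, 1, 5, 9, 5, 1, 5],
    "D": [10, 6, 2, 6, 10, 6, 2, 6],
}
clusters = average_linkage(series, cut=0.5)
print(clusters)
```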

  10. Spatial analysis of pulmonary tuberculosis in Antananarivo Madagascar: tuberculosis-related knowledge, attitude and practice.

    PubMed

    Rakotosamimanana, Sitraka; Mandrosovololona, Vatsiharizandry; Rakotonirina, Julio; Ramamonjisoa, Joselyne; Ranjalahy, Justin Rasolofomanana; Randremanana, Rindra Vatosoa; Rakotomanana, Fanjasoa

    2014-01-01

    Tuberculosis infection may remain latent, but the disease is nevertheless a serious public health issue. Various epidemiological studies on pulmonary tuberculosis have taken the spatial component into account, revealing the tendency of this disease to cluster in particular locations. The aim was to assess the contribution of Knowledge, Attitude and Practice (KAP) to the distribution of tuberculosis and to provide information for the improvement of the National Tuberculosis Program. We investigated the role of KAP in the distribution patterns of pulmonary tuberculosis in Antananarivo. First, we performed spatial scanning of tuberculosis aggregation among permanent cases resident in Antananarivo Urban Township using the Kulldorff method, and then we carried out a quantitative study on KAP involving TB patients. The KAP study in the population was based on qualitative methods with focus groups. The disease still clusters in the same districts identified in the previous study. The principal cluster covered 22 neighborhoods, most of which are part of the first district. A secondary cluster was found, involving 18 neighborhoods in the sixth district and two neighborhoods in the fifth. The relative risk was 1.7 (p < 10^-6) in the principal cluster and 1.6 (p < 10^-3) in the secondary cluster. Our study showed that more was known about TB symptoms than about the duration of the disease or free treatment. Knowledge about TB was limited to that acquired at school or from relatives with TB. The attitudes and practices of patients and of the population in general indicated that a stigma is still attached to tuberculosis. This type of survey can be conducted in remote zones where the tuberculosis-related KAP of TB patients and the general population is less known or not documented; the findings could be used to adapt control measures to local particularities.
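    The Kulldorff spatial scan mentioned in this abstract searches circular zones for the one that maximises a Poisson log-likelihood ratio, LLR = c ln(c/E) + (C - c) ln((C - c)/(C - E)) for zones with c > E, where c and E are observed and expected cases inside the zone and C is the total case count. The sketch below implements that search on hypothetical neighbourhood centroids, capping zones at 50% of the population (a common default, assumed here). It also reports the relative risk as observed/expected inside divided by observed/expected outside. The Monte Carlo replications that produce p-values in practice are not included.

```python
import math

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a zone with c observed and e expected cases;
    `total` is the overall case count, which under the null also equals the total expected."""
    if c <= e:  # only zones with elevated risk are scored
        return 0.0
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return c * math.log(c / e) + outside

def scan(regions, max_pop_frac=0.5):
    """regions: name -> (x, y, population, cases). Circles grow around each region centroid."""
    total_pop = sum(r[2] for r in regions.values())
    total_cases = sum(r[3] for r in regions.values())
    best_llr, best_zone, best_rr = 0.0, None, None
    for cx, cy, _, _ in regions.values():
        zone, pop, cases = [], 0, 0
        for name in sorted(regions, key=lambda n: math.dist((cx, cy), regions[n][:2])):
            if pop + regions[name][2] > max_pop_frac * total_pop:
                break
            zone.append(name)
            pop += regions[name][2]
            cases += regions[name][3]
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best_llr:
                rr = ((cases / expected) / ((total_cases - cases) / (total_cases - expected))
                      if total_cases > cases else math.inf)
                best_llr, best_zone, best_rr = llr, list(zone), rr
    return best_llr, best_zone, best_rr

# Hypothetical neighbourhoods: (x, y, population, cases).
regions = {"A": (0, 0, 100, 20), "B": (1, 0, 100, 18), "C": (5, 0, 100, 2), "D": (6, 0, 100, 4)}
llr, zone, rr = scan(regions)
print(zone, round(llr, 2), round(rr, 2))
```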

  11. Geospatial characteristics of measles transmission in China during 2005-2014.

    PubMed

    Yang, Wan; Wen, Liang; Li, Shen-Long; Chen, Kai; Zhang, Wen-Yi; Shaman, Jeffrey

    2017-04-01

    Measles is a highly contagious and severe disease. Despite mass vaccination, it remains a leading cause of death in children in developing regions, killing 114,900 globally in 2014. In 2006, China committed to eliminating measles by 2012; to this end, the country enhanced its mandatory vaccination programs and achieved vaccination rates reported above 95% by 2008. However, in spite of these efforts, during the last 3 years (2013-2015) China documented 27,695, 52,656, and 42,874 confirmed measles cases. How measles manages to spread in China-the world's largest population-in the mass vaccination era remains poorly understood. To address this conundrum and provide insights for future public health efforts, we analyze the geospatial pattern of measles transmission across China during 2005-2014. We map measles incidence and incidence rates for each of the 344 cities in mainland China, identify the key socioeconomic and demographic features associated with high disease burden, and identify transmission clusters based on the synchrony of outbreak cycles. Using hierarchical cluster analysis, we identify 21 epidemic clusters, of which 12 were cross-regional. The cross-regional clusters included more underdeveloped cities with large numbers of emigrants than would be expected by chance (p = 0.011; bootstrap sampling), indicating that cities in these clusters were likely linked by internal worker migration in response to uneven economic development. In contrast, cities in regional clusters were more likely to have high rates of minorities and high natural growth rates than would be expected by chance (p = 0.074; bootstrap sampling). Our findings suggest that multiple highly connected foci of measles transmission coexist in China and that migrant workers likely facilitate the transmission of measles across regions. This complex connection renders eradication of measles challenging in China despite its high overall vaccination coverage. 
Future immunization programs should therefore target these transmission foci simultaneously.

  12. A metric to search for relevant words

    NASA Astrophysics Data System (ADS)

    Zhou, Hongding; Slater, Gary W.

    2003-11-01

    We propose a new metric to evaluate and rank the relevance of words in a text. The method uses the density fluctuations of a word to compute an index that measures its degree of clustering. Highly significant words tend to form clusters, while common words are essentially uniformly spread in a text. If a word is not rare, the metric is stable when we move any individual occurrence of this word in the text. Furthermore, we prove that the metric always increases when words are moved to form larger clusters, or when several independent documents are merged. Using the Holy Bible as an example, we show that our approach reduces the significance of common words when compared to a recently proposed statistical metric.
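    The idea that significant words cluster while common words spread evenly can be illustrated with a simple spacing statistic: the standard deviation of the gaps between successive occurrences of a word, divided by the mean gap. This is not the density-fluctuation metric proposed in this paper (its formula is not given in the abstract); it is a simpler, swapped-in index used only to show the intuition, on a hypothetical token sequence.

```python
import statistics

def spacing_index(tokens, word):
    """Std/mean of the gaps between successive occurrences: 0 for perfectly even spacing,
    larger values for words whose occurrences come in bursts."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    if len(positions) < 3:
        return None  # too few occurrences for a meaningful spread
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)

tokens = ["w"] * 40                 # hypothetical filler text
for i in (1, 9, 17, 25, 33):        # a "common" word, evenly spread
    tokens[i] = "the"
for i in (2, 3, 4, 30, 31, 32):     # a "significant" word, occurring in two bursts
    tokens[i] = "ark"
print(spacing_index(tokens, "the"), spacing_index(tokens, "ark"))
```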

  13. Probabilistic topic modeling for the analysis and classification of genomic sequences

    PubMed Central

    2015-01-01

    Background Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, which represent a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they are able to overcome the drawbacks of sequence alignment techniques. In this paper, a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on a k-mer representation and text mining techniques. Methods The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models are able to find in a document corpus the topics (recurrent themes) characterizing classes of documents. This technique, applied to DNA sequences representing the documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions We performed classification of over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository, training probabilistic topic models. The proposed method is compared to the RDP tool and the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp to 25 bp). Our method reaches results very similar to the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. In these conditions, the proposed method outperforms RDP and SVM with ultra-short sequences, and it exhibits a smooth decrease in performance, at every taxonomic level, as the sequence length is decreased.
PMID:25916734
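    The step that turns a DNA sequence into a "document" for a topic model can be sketched as counting overlapping fixed-length k-mers. The choice of overlapping windows, the value of k and the rule of skipping windows that contain ambiguous bases such as N are assumptions for illustration; the abstract does not specify them.

```python
from collections import Counter

def kmer_document(seq, k=4):
    """Represent a DNA sequence as a bag of overlapping k-mers ("words").
    Windows containing characters other than A, C, G, T (e.g. N) are skipped."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= set("ACGT"))

# Hypothetical short reads standing in for barcode sequences.
corpus = {
    "read_1": "ACGTACGTTGCA",
    "read_2": "acgtnacgt",
}
docs = {name: kmer_document(s, k=3) for name, s in corpus.items()}
for name, bag in docs.items():
    print(name, dict(bag))
```

    A sequence of length L yields at most L - k + 1 words, which is one way to see why very short snippets leave a classifier with little evidence. The resulting count vectors could then be passed to an LDA implementation (for example scikit-learn's LatentDirichletAllocation) after mapping k-mers to column indices.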

  14. A semantic graph-based approach to biomedical summarisation.

    PubMed

    Plaza, Laura; Díaz, Alberto; Gervás, Pablo

    2011-09-01

    Access to the vast body of research literature that is available in biomedicine and related fields may be improved by automatic summarisation. This paper presents a method for summarising biomedical scientific literature that takes into consideration the characteristics of the domain and the type of documents. To address the problem of identifying salient sentences in biomedical texts, concepts and relations derived from the Unified Medical Language System (UMLS) are arranged to construct a semantic graph that represents the document. A degree-based clustering algorithm is then used to identify different themes or topics within the text. Different heuristics for sentence selection, intended to generate different types of summaries, are tested. A real document case is drawn up to illustrate how the method works. A large-scale evaluation is performed using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results are compared with those achieved by three well-known summarisers (two research prototypes and a commercial application) and two baselines. Our method significantly outperforms all summarisers and baselines. The best of our heuristics achieves a relative improvement of almost 7.7% in the ROUGE-1 score over the LexRank summariser (0.7862 versus 0.7302). A qualitative analysis of the summaries also shows that our method succeeds in identifying sentences that cover the main topic of the document and also considers other secondary or "satellite" information that might be relevant to the user. The proposed method proves to be an efficient approach to biomedical literature summarisation, which confirms that the use of concepts rather than terms can be very useful in automatic summarisation, especially when dealing with highly specialised domains. Copyright © 2011 Elsevier B.V. All rights reserved.
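    A degree-based clustering of a concept graph can be sketched as follows: take the highest-degree vertices as hubs, merge hubs that are directly connected into hub sets, and attach every remaining vertex to the hub set it shares the most edges with. This is a simplified, hypothetical reading of the approach, not the authors' exact algorithm; the number of hubs, the tie-breaking and the toy concept names are assumptions, and sentence selection is not shown.

```python
from collections import defaultdict

def degree_clusters(edges, n_hubs):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # Hub vertices: the n_hubs highest-degree concepts (ties broken by name).
    hubs = sorted(adj, key=lambda v: (-len(adj[v]), v))[:n_hubs]
    # Hubs that are directly connected are merged into one hub set.
    hub_sets = []
    for h in hubs:
        touching = [g for g in hub_sets if adj[h] & g]
        hub_sets = [g for g in hub_sets if g not in touching] + [{h}.union(*touching)]
    clusters = [set(g) for g in hub_sets]
    # Every other vertex joins the hub set it has the most edges to (if any).
    for v in adj:
        if any(v in g for g in hub_sets):
            continue
        links = [len(adj[v] & g) for g in hub_sets]
        best = max(range(len(hub_sets)), key=links.__getitem__)
        if links[best] > 0:
            clusters[best].add(v)
    return clusters

# Hypothetical concept graph: two themes joined by a single bridging relation (c-d).
edges = [("protein", "a"), ("protein", "b"), ("protein", "c"),
         ("gene", "d"), ("gene", "e"), ("gene", "f"), ("c", "d")]
clusters = degree_clusters(edges, n_hubs=2)
print(clusters)
```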

  15. Interactive Media Technologies. State Competency Profile.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document contains 143 competencies, grouped into 25 units, for tech prep programs in the interactive media technologies cluster. The competencies were developed through collaboration of Ohio business, industry, and labor representatives and secondary and associate degree educators. The competencies are rated either "essential"…

  16. Business/Computer Technologies. State Competency Profile.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document contains 272 competencies, grouped into 36 units, for tech prep programs in the business/computer technology cluster. The competencies were developed through collaboration of Ohio business, industry, and labor representatives and secondary and associate degree educators. The competencies are rated either "essential"…

  17. Information Theory and Voting Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures.

    PubMed

    Saeed, Faisal; Salim, Naomie; Abdo, Ammar

    2013-07-01

    Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for clustering chemical compounds. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm, A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of the clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    PubMed Central

    Ying Wah, Teh

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of algorithms for clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a processing time fast enough for real-time applications of IoT devices. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753
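    The properties listed in this abstract (clusters of arbitrary shape, explicit handling of outliers, no preset number of clusters) can be seen in the classic batch algorithm DBSCAN. The sketch below is DBSCAN, not the streaming algorithm proposed in this paper; eps, min_pts and the toy points are illustrative assumptions, and the O(n²) neighbour search is kept naive for clarity.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:  # not a core point; may later be claimed as a border point
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:  # former noise reachable from a core point becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # only core points expand the cluster further
                queue.extend(nb_j)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

    Stream variants typically avoid rescanning all points by maintaining summaries such as micro-clusters or grid cells that are updated as data arrive; the batch version above only illustrates the density criterion itself.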

  19. A fast density-based clustering algorithm for real-time Internet of Things stream.

    PubMed

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of algorithms for clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a processing time fast enough for real-time applications of IoT devices. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets.

  20. Tumor-initiating cells of breast and prostate origin show alterations in the expression of genes related to iron metabolism

    PubMed Central

    Tomkova, Veronika; Korenkova, Vlasta; Langerova, Lucie; Simonova, Ekaterina; Zjablovskaja, Polina; Alberich-Jorda, Meritxell; Neuzil, Jiri; Truksa, Jaroslav

    2017-01-01

    The importance of iron in the growth and progression of tumors has been widely documented. In this report, we show that tumor-initiating cells (TICs), represented by spheres derived from the MCF7 cell line, exhibit a higher intracellular labile iron pool and greater mitochondrial iron accumulation, and are more susceptible to iron chelation. TICs also show activation of the IRP/IRE system, leading to higher iron uptake and decreased iron storage, suggesting that the level of properly assembled cytosolic iron-sulfur clusters (FeS) is reduced. This finding is confirmed by lower enzymatic activity of aconitase and FeS cluster biogenesis enzymes, as well as lower levels of reduced glutathione, implying reduced FeS cluster synthesis/utilization in TICs. Importantly, we have identified a specific gene signature related to iron metabolism consisting of genes regulating iron uptake, mitochondrial FeS cluster biogenesis and hypoxic response (ABCB10, ACO1, CYBRD1, EPAS1, GLRX5, HEPH, HFE, IREB2, QSOX1 and TFRC). Principal component analysis based on this signature is able to distinguish TICs from cancer cells in vitro and also leukemia-initiating cells (LICs) from non-LICs in the mouse model of acute promyelocytic leukemia (APL). The majority of the described changes were also recapitulated in an alternative model represented by MCF7 cells resistant to tamoxifen (TAMR) that exhibit features of TICs. Our findings point to the critical importance of redox balance and iron metabolism-related genes and proteins in the context of cancer and TICs, which could potentially be used for cancer diagnostics or therapy. PMID:28031527

  1. Eliciting end-user expectations to guide the implementation process of a new electronic health record: A case study using concept mapping.

    PubMed

    Joukes, Erik; Cornet, Ronald; de Bruijne, Martine C; de Keizer, Nicolette F

    2016-03-01

    This study aimed to evaluate the usability of concept mapping to elicit the expectations of healthcare professionals regarding the implementation of a new electronic health record (EHR). These expectations need to be taken into account during the implementation process to maximize the chance of success of the EHR. The setting was two university hospitals in Amsterdam, The Netherlands, in the preparation phase of jointly implementing a new EHR. During this study the hospitals had different methods of documenting patient information (legacy EHR vs. paper-based records). Concept mapping was used to determine and classify the expectations of healthcare professionals regarding the implementation of a new EHR. A multidisciplinary group of 46 healthcare professionals from both university hospitals participated in this study. Expectations were elicited in focus groups, and their relevance and feasibility were assessed through a web questionnaire. Nonmetric multidimensional scaling and clustering methods were used to identify clusters of expectations. We found nine clusters of expectations, each covering an important topic to enable the healthcare professionals to work properly with the new EHR once implemented: usability, data use and reuse, facility conditions, data registration, support, training, internal communication, patients, and collaboration. Average importance and feasibility of each of the clusters were high. Concept mapping is an effective method to find topics that, according to healthcare professionals, are important to consider during the implementation of a new EHR. The method helps to combine the input of a large group of stakeholders with limited effort. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  2. A Weight-Adaptive Laplacian Embedding for Graph-Based Clustering.

    PubMed

    Cheng, De; Nie, Feiping; Sun, Jiande; Gong, Yihong

    2017-07-01

    Graph-based clustering methods perform clustering on a fixed input data graph. Thus, such clustering results are sensitive to the particular graph construction. If this initial construction is of low quality, the resulting clustering may also be of low quality. We address this drawback by allowing the data graph itself to be adaptively adjusted in the clustering procedure. In particular, our proposed weight-adaptive Laplacian (WAL) method learns a new data similarity matrix that can adaptively adjust the initial graph according to the similarity weights in the input data graph. We develop three versions of the method, based on the L2-norm, a fuzzy entropy regularizer, and an exponential-based weight strategy, which yield three new graph-based clustering objectives. We derive optimization algorithms to solve these objectives. Experimental results on synthetic and real-world benchmark data sets demonstrate the effectiveness of these new graph-based clustering methods.
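
    The sensitivity to graph construction that motivates WAL can be seen in a much simpler setting. The sketch below is not the WAL method: it is a hypothetical baseline that builds a symmetric k-nearest-neighbour graph and takes its connected components as clusters, so changing the single construction parameter `k` changes the clustering even though the data are the same. Function names are illustrative.

```python
import math
from typing import List, Sequence, Set, Tuple


def knn_graph(points: Sequence[Tuple[float, ...]], k: int) -> List[Set[int]]:
    """Symmetric kNN adjacency lists: i and j are linked if either lists the other."""
    n = len(points)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj: List[Set[int]]) -> List[int]:
    """Label each vertex with the index of its connected component (iterative DFS)."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```

    For the one-dimensional points 0, 1, 5, 6 and 12, `k=1` yields two clusters, {0, 1} and {5, 6, 12}, whereas `k=2` links everything into a single cluster. An adaptive method such as WAL aims to reduce exactly this dependence on the initial graph.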

  3. Population-based versus practice-based recall for childhood immunizations: a randomized controlled comparative effectiveness trial.

    PubMed

    Kempe, Allison; Saville, Alison; Dickinson, L Miriam; Eisert, Sheri; Reynolds, Joni; Herrero, Diana; Beaty, Brenda; Albright, Karen; Dibert, Eva; Koehler, Vicky; Lockhart, Steven; Calonge, Ned

    2013-06-01

    We compared the effectiveness and cost-effectiveness of population-based recall (Pop-recall) versus practice-based recall (PCP-recall) at increasing immunizations among preschool children. This cluster-randomized trial involved children aged 19 to 35 months needing immunizations in 8 rural and 6 urban Colorado counties. In Pop-recall counties, recall was conducted centrally using the Colorado Immunization Information System (CIIS). In PCP-recall counties, practices were invited to attend webinar training on using CIIS and were offered financial support for mailings. The percentage of children who became up-to-date (UTD) and the percentage with documented receipt of vaccines were compared 6 months after recall. A mixed-effects model assessed the association between intervention and whether a child became UTD. Ten of 195 practices (5%) implemented recall in PCP-recall counties. Among children needing immunizations, 18.7% became UTD in Pop-recall versus 12.8% in PCP-recall counties (P < .001); 31.8% had documented receipt of 1 or more vaccines in Pop-recall versus 22.6% in PCP-recall counties (P < .001). Relative risk estimates from multivariable modeling were 1.23 (95% confidence interval [CI] = 1.10, 1.37) for becoming UTD and 1.26 (95% CI = 1.15, 1.38) for receipt of any vaccine. Costs for Pop-recall versus PCP-recall were $215 versus $1981 per practice and $17 versus $62 per child brought UTD. Population-based recall conducted centrally was more effective and cost-effective at increasing immunization rates in preschool children.

  4. Hemolivia and hepatozoon: haemogregarines with tangled evolutionary relationships.

    PubMed

    Kvičerová, Jana; Hypša, Václav; Dvořáková, Nela; Mikulíček, Peter; Jandzik, David; Gardner, Michael George; Javanbakht, Hossein; Tiar, Ghoulem; Siroký, Pavel

    2014-09-01

    The generic name Hemolivia has been used for haemogregarines characterized by morphological and biological features. The few molecular studies, focused on other haemogregarine genera but involving Hemolivia samples, indicated its close relationship to the genus Hepatozoon. Here we analyze molecular data for Hemolivia from a broad geographic area and host spectrum and provide detailed morphological documentation of the included samples. Based on molecular analyses in the context of other haemogregarines, we demonstrate that several sequences deposited in GenBank from isolates described as Hepatozoon belong to the Hemolivia cluster. This illustrates the overall difficulty with recognizing Hemolivia and Hepatozoon without sufficient morphological and molecular information. The close proximity of both genera is also reflected in uncertainty about their precise phylogeny when using 18S rDNA. They cluster with almost identical likelihood either as two sister taxa or as monophyletic Hemolivia within paraphyletic Hepatozoon. However, regardless of these difficulties, the results presented here provide a reliable background for the unequivocal placement of new samples into the Hemolivia/Hepatozoon complex. Copyright © 2014 Elsevier GmbH. All rights reserved.

  5. Structuring Communication Relationships for Interprofessional Teamwork (SCRIPT): a cluster randomized controlled trial.

    PubMed

    Zwarenstein, Merrick; Reeves, Scott; Russell, Ann; Kenaszchuk, Chris; Conn, Lesley Gotlib; Miller, Karen-Lee; Lingard, Lorelei; Thorpe, Kevin E

    2007-09-18

    Despite a burgeoning interest in using interprofessional approaches to promote effective collaboration in health care, systematic reviews find scant evidence of benefit. This protocol describes the first cluster randomized controlled trial (RCT) to design and evaluate an intervention intended to improve interprofessional collaborative communication and patient-centred care. The objective is to evaluate the effects of a four-component, hospital-based staff communication protocol designed to promote collaborative communication between healthcare professionals and enhance patient-centred care. The study is a multi-centre mixed-methods cluster randomized controlled trial involving twenty clinical teaching teams (CTTs) in general internal medicine (GIM) divisions of five Toronto tertiary-care hospitals. CTTs will be randomly assigned either to receive an intervention designed to improve interprofessional collaborative communication, or to continue usual communication practices. Non-participant naturalistic observation, shadowing, and semi-structured, qualitative interviews were conducted to explore existing patterns of interprofessional collaboration in the CTTs, and to support intervention development. Interviews and shadowing will continue during intervention delivery in order to document interactions between the intervention settings and adopters, and changes in interprofessional communication. The primary outcome is the rate of unplanned hospital readmission. Secondary outcomes are length of stay (LOS); adherence to evidence-based prescription drug therapy; patients' satisfaction with care; self-report surveys of CTT staff perceptions of interprofessional collaboration; and frequency of calls to paging devices. Outcomes will be compared on an intention-to-treat basis using adjustment methods appropriate for data from a cluster randomized design. 
Pre-intervention qualitative analysis revealed that a substantial amount of interprofessional interaction lacks key core elements of collaborative communication such as self-introduction, description of professional role, and solicitation of other professional perspectives. Incorporating these findings, a four-component intervention was designed with a goal of creating a culture of communication in which the fundamentals of collaboration become a routine part of interprofessional interactions during unstructured work periods on GIM wards. Registered with National Institutes of Health as NCT00466297.

  6. The Role of Bi3+ in Promoting and Stabilizing Iron Oxo Clusters in Strong Acid.

    PubMed

    Sadeghi, Omid; Amiri, Mehran; Reinheimer, Eric W; Nyman, May

    2018-05-22

    Metal oxo clusters and metal oxides assemble and precipitate from water in processes that depend on pH, temperature, and concentration. Other parameters that influence the structure, composition, and nuclearity of "molecular" and bulk metal oxides are poorly understood, and have thus not been exploited. Herein, we show that Bi3+ drives the formation of aqueous Fe3+ clusters, usurping the role of pH. We isolated and structurally characterized a Bi/Fe cluster, Fe3BiO2(CCl3COO)8(THF)(H2O)2, and demonstrated its conversion into an iron Keggin ion capped by six Bi3+ ions (Bi6Fe13). The reaction pathway was documented by X-ray scattering and mass spectrometry. Opposing the expected trend, increased cluster nuclearity required a pH decrease instead of a pH increase. We attribute this anomalous behavior of Bi/Fe(aq) solutions to Bi3+, which drives hydrolysis and condensation. Likewise, Bi3+ stabilizes metal oxo clusters and metal oxides in strongly acidic conditions, which is important in applications such as water oxidation for energy storage. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Synonym-Based Word Frequency Analysis to Support the Development and Presentation of a Public Health Quality Improvement Taxonomy.

    PubMed

    Pina, Jamie; Massoudi, Barbara L; Chester, Kelley; Koyanagi, Mark

    2018-06-07

    Researchers and analysts have not completely examined word frequency analysis as an approach to creating a public health quality improvement taxonomy. The objective was to develop a taxonomy of public health quality improvement concepts for an online exchange of quality improvement work. We analyzed documents, conducted an expert review, and employed a user-centered design along with a faceted search approach to make online entries searchable for users. To provide the most targeted facets to users, we used word frequency to analyze 334 published public health quality improvement documents to find the most common clusters of word meanings. We then reviewed the highest-weighted concepts and categorized their relationships to quality improvement details in our taxonomy. Next, we mapped meanings to items in our taxonomy and presented them in order of their weighted percentages in the data. Using these methods, we developed and sorted concepts in the faceted search presentation so that online exchange users could access relevant search criteria. We reviewed 50 of the top synonym clusters and identified 12 categories for our taxonomy data. The final categories were as follows: Summary; Planning and Execution Details; Health Impact; Training and Preparation; Information About the Community; Information About the Health Department; Results; Quality Improvement (QI) Staff; Information; Accreditation Details; Collaborations; and Contact Information of the Submitter. User feedback about the elements in the taxonomy and their presentation in our search environment has been positive. When relevant data are available, the word frequency analysis method may be useful in other taxonomy development efforts for public health.
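
    The synonym-collapsing step can be sketched as follows, under stated assumptions: the `SYNONYMS` table is a hypothetical hand-built map from surface words to concept labels (the abstract does not say how synonym clusters were formed), and `concept_frequencies` is an illustrative name. The function counts concept mentions across documents and reports each concept's weighted percentage, highest first, which is the ordering the abstract describes for the faceted search.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical synonym map: each surface word -> canonical concept label.
SYNONYMS: Dict[str, str] = {
    "training": "training", "education": "training", "workshop": "training",
    "result": "results", "results": "results", "outcome": "results", "outcomes": "results",
    "partner": "collaboration", "partners": "collaboration", "collaboration": "collaboration",
}


def concept_frequencies(documents: Iterable[str],
                        synonyms: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (concept, percentage of all concept mentions), most frequent first."""
    counts: Counter = Counter()
    for doc in documents:
        for word in re.findall(r"[a-z]+", doc.lower()):
            concept = synonyms.get(word)
            if concept is not None:  # words outside the map are ignored
                counts[concept] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(concept, 100.0 * n / total) for concept, n in counts.most_common()]
```

    Collapsing "education" and "workshop" into "training" before counting is what lets several low-frequency words add up to one highly weighted concept.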

  8. A roadmap of clustering algorithms: finding a match for a biomedical application.

    PubMed

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

  9. Mississippi Curriculum Framework for Computer Discovery (8th Grade). CIP: 00.0252.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for technology educators in Mississippi, outlines a modular instruction approach that allows eighth graders to experience various workplace technologies within four career cluster areas: agriculture/natural resources technology, business/marketing technology, health/human services technology, and…

  10. Industrial Education. Vocational Education Program Courses Standards.

    ERIC Educational Resources Information Center

    Florida State Dept. of Education, Tallahassee. Div. of Applied Tech., Adult, and Community Education.

    This document contains vocational education program course standards for exploratory courses, practical arts courses, and job preparatory programs offered at the secondary and postsecondary level as part of the industrial education component in Florida. Curriculum frameworks are provided for 144 programs/clusters; representative topics are as…

  11. MR-Tandem: parallel X!Tandem using Hadoop MapReduce on Amazon Web Services.

    PubMed

    Pratt, Brian; Howbert, J Jeffry; Tasman, Natalie I; Nilsson, Erik J

    2012-01-01

    MR-Tandem adapts the popular X!Tandem peptide search engine to work with Hadoop MapReduce for reliable parallel execution of large searches. MR-Tandem runs on any Hadoop cluster but offers special support for Amazon Web Services for creating inexpensive on-demand Hadoop clusters, enabling search volumes that might not otherwise be feasible with the compute resources a researcher has at hand. MR-Tandem is designed to drop in wherever X!Tandem is already in use and requires no modification to existing X!Tandem parameter files, and only minimal modification to X!Tandem-based workflows. MR-Tandem is implemented as a lightly modified X!Tandem C++ executable and a Python script that drives Hadoop clusters including Amazon Web Services (AWS) Elastic Map Reduce (EMR), using the modified X!Tandem program as a Hadoop Streaming mapper and reducer. The modified X!Tandem C++ source code is Artistic licensed, supports pluggable scoring, and is available as part of the Sashimi project at http://sashimi.svn.sourceforge.net/viewvc/sashimi/trunk/trans_proteomic_pipeline/extern/xtandem/. The MR-Tandem Python script is Apache licensed and available as part of the Insilicos Cloud Army project at http://ica.svn.sourceforge.net/viewvc/ica/trunk/mr-tandem/. Full documentation and a Windows installer that configures MR-Tandem, Python and all necessary packages are available at this same URL. brian.pratt@insilicos.com
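
    Hadoop Streaming runs an arbitrary executable as mapper or reducer, exchanging tab-separated key/value lines over standard input and output, and the framework sorts mapper output by key before the reducer reads it. The sketch below illustrates that contract with a token-counting task. It is a generic example, not MR-Tandem's mapper or reducer (which use the modified X!Tandem binary), and `run_locally` is a hypothetical helper that imitates the map, sort, and reduce phases in one process.

```python
import itertools
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'key<TAB>1' for every whitespace-separated token (illustrative task)."""
    for line in lines:
        for token in line.split():
            yield f"{token}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum values per key; relies on the input already being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce without a Hadoop cluster."""
    return list(reducer(sorted(mapper(lines))))
```

    In an actual streaming job, each function would be driven by a script that iterates over `sys.stdin` and prints every yielded line; the in-process `sorted` call stands in for Hadoop's shuffle and sort.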

  12. Google Classroom and Open Clusters: An Authentic Science Research Project for High School Students

    NASA Astrophysics Data System (ADS)

    Johnson, Chelen H.; Linahan, Marcella; Cuba, Allison Frances; Dickmann, Samantha Rose; Hogan, Eleanor B.; Karos, Demetra N.; Kozikowski, Kendall G.; Kozikowski, Lauren Paige; Nelson, Samantha Brooks; O'Hara, Kevin Thomas; Ropinski, Brandi Lucia; Scarpa, Gabriella; Garmany, Catharine D.

    2016-01-01

    STEM education is about offering unique opportunities to our students. For the past three years, students from two high schools (Breck School in Minneapolis, MN, and Carmel Catholic High School in Mundelein, IL) have collaborated on authentic astronomy research projects. This past year they surveyed archival data of open clusters to determine whether a clear turnoff point could be unequivocally identified. Age and distance to each open cluster were calculated. Additionally, students requested time on several telescopes to obtain original data to compare to the archival data. Students from each school worked in collaborative teams, sharing and verifying results through regular online hangouts and chats. Work papers were stored in a shared drive and on a student-designed Google site to facilitate dissemination of documents between the two schools.

  13. Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

    PubMed Central

    Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud

    2015-01-01

    This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering, that is, of partitioning a dataset into a suitable number of clusters without that number being given in advance. MOPSOSA combines the features of multi-objective particle swarm optimization (PSO) and Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is based on Euclidean distance, the second on point symmetry distance, and the last on short distance. A number of algorithms were compared with MOPSOSA in resolving clustering problems by determining the actual number of clusters and the optimal clustering. Computational experiments were carried out on fourteen artificial and five real-life datasets. PMID:26132309
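
    The abstract does not name its three validity indices. As an assumption, the sketch below uses the Davies-Bouldin index, a standard Euclidean-distance validity index for which lower is better, to show how such an index scores a candidate partition. MOPSOSA optimizes several indices jointly with PSO and simulated annealing, which this sketch does not attempt; `davies_bouldin` and `centroid` are illustrative names.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(pts: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0])))


def davies_bouldin(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Average over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    groups: Dict[Hashable, List[Point]] = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    if len(groups) < 2:
        raise ValueError("the index needs at least two clusters")
    cents = {label: centroid(g) for label, g in groups.items()}
    # Scatter: mean Euclidean distance of a cluster's points to its centroid.
    scatter = {label: sum(math.dist(p, cents[label]) for p in g) / len(g)
               for label, g in groups.items()}
    total = 0.0
    for i in groups:
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in groups if j != i)
    return total / len(groups)
```

    On the points (0, 0), (0, 1), (10, 0) and (10, 1), grouping by x-coordinate scores 0.1 while grouping by y-coordinate scores 10, so the index prefers the compact, well-separated partition. An automatic-clustering search would compare such scores across candidate partitions with different numbers of clusters.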

  14. An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.

    PubMed

    Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei

    2013-05-01

    Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.
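
    The DPMM lets the data decide how many clusters exist, and new clusters can be added as data arrive. As a swapped-in illustration rather than the authors' incremental DPMM or tDPMM, the sketch below implements a one-pass variant of DP-means, the hard-clustering small-variance limit of a Dirichlet process mixture: a point joins its nearest cluster unless its squared distance exceeds a penalty `lam`, in which case it opens a new cluster. The function name and the single-pass, no-revisiting design are assumptions made for brevity.

```python
import math
from typing import List, Sequence, Tuple


def online_dp_means(points: Sequence[Tuple[float, ...]],
                    lam: float) -> Tuple[List[int], List[Tuple[float, ...]]]:
    """One pass of DP-means: clusters are opened on demand, so k is never fixed.

    lam is the penalty for opening a cluster: a point whose squared distance to
    every existing centroid exceeds lam starts a new cluster.
    """
    centroids: List[Tuple[float, ...]] = []
    counts: List[int] = []
    labels: List[int] = []
    for p in points:
        best, best_d2 = -1, math.inf
        for k, c in enumerate(centroids):
            d2 = math.dist(p, c) ** 2
            if d2 < best_d2:
                best, best_d2 = k, d2
        if best < 0 or best_d2 > lam:
            centroids.append(tuple(float(x) for x in p))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            counts[best] += 1
            # Incremental mean update: earlier points never need to be revisited.
            centroids[best] = tuple(c + (x - c) / counts[best]
                                    for c, x in zip(centroids[best], p))
            labels.append(best)
    return labels, centroids
```

    The penalty plays the role the DP concentration parameter plays in the full model: with `lam=4.0` the points (0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5) form two clusters, while `lam=0.5` splits them into four.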

  15. Network-based spatial clustering technique for exploring features in regional industry

    NASA Astrophysics Data System (ADS)

    Chou, Tien-Yin; Huang, Pi-Hui; Yang, Lung-Shih; Lin, Wen-Tzu

    2008-10-01

    Past research on industrial clusters has mainly focused on a single or particular industry, and less on spatial industrial structure and the mutual relations between industries. Industrial clusters can generate three kinds of spillover effects: knowledge spillovers, labor market pooling, and input sharing. In addition, industrial clusters benefit industry development. A full understanding of the status and characteristics of a district's industrial clusters can help improve the competitive advantage of district industry. Research on industrial spatial clusters is therefore of great significance for setting industrial policy and promoting district economic development. In this study, an improved model, GeoSOM, that combines DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and SOM (Self-Organizing Map) was developed for analyzing industrial clusters. Unlike earlier distance-based algorithms for industrial clusters, the proposed GeoSOM model calculates spatial characteristics between firms with the DBSCAN algorithm and evaluates the similarity between firms with SOM clustering analysis. A demonstration data set, covering manufacturers around Taichung County in Taiwan, was analyzed to verify the practicability of the proposed model. The results indicate that GeoSOM is suitable for evaluating spatial industrial clusters.

  16. System of HPC content archiving

    NASA Astrophysics Data System (ADS)

    Bogdanov, A.; Ivashchenko, A.

    2017-12-01

    This work aims to develop a system that effectively solves the problem of storing and analyzing files containing text data, using modern software development tools, techniques and approaches. The main challenge identified at the problem-formulation stage, storing a large number of text documents, is to be resolved with functionality such as full-text search and clustering of documents by their contents. The main system features can be described in terms of a distributed multilevel architecture and the flexibility and interchangeability of components, achieved by encapsulating standard functionality in independent executable modules.
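
    Full-text search over stored documents is commonly built on an inverted index that maps each term to the documents containing it. The abstract does not describe the system's search component, so the sketch below is a generic, hypothetical illustration: a tokenizer, an index builder, and a conjunctive (AND) query. All names are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric terms; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: term -> set of ids of documents containing that term."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect posting sets
    return result
```

    Because each query only touches the posting sets of its own terms, lookup cost does not grow with the full text of every stored document; the same term sets could also serve as input features for content-based clustering.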

  17. Efficient Agent-Based Cluster Ensembles

    NASA Technical Reports Server (NTRS)

    Agogino, Adrian; Tumer, Kagan

    2006-01-01

    Numerous domains ranging from distributed data acquisition to knowledge reuse need to solve the cluster ensemble problem of combining multiple clusterings into a single unified clustering. Unfortunately, current non-agent-based cluster combining methods do not work in a distributed environment, are not robust to corrupted clusterings, and require centralized access to all original clusterings. Overcoming these issues will allow cluster ensembles to be used in fundamentally distributed and failure-prone domains such as data acquisition from satellite constellations, in addition to domains demanding confidentiality such as combining clusterings of user profiles. This paper proposes an efficient, distributed, agent-based clustering ensemble method that addresses these issues. In this approach each agent is assigned a small subset of the data and votes on which final cluster its data points should belong to. The final clustering is then evaluated by a global utility, computed in a distributed way. This clustering is also evaluated using an agent-specific utility that is shown to be easier for the agents to maximize. Results show that agents using the agent-specific utility can achieve better performance than traditional non-agent-based methods and are effective even when up to 50% of the agents fail.
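
    The voting idea can be illustrated with a centralized stand-in. The sketch below is not the agent-based method or its utilities: it is an assumed baseline in which each clustering's labels are greedily mapped onto a reference clustering's labels by largest overlap, after which every point takes its majority label. Because labels are aligned first, arbitrary relabelling does not matter, and a single corrupted clustering is outvoted by the others.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def align_to_reference(labels: Sequence[Hashable],
                       reference: Sequence[Hashable]) -> List[Hashable]:
    """Map each label to the reference label it overlaps most (greedy).

    Greedy mapping may send two labels to the same reference label (a merge);
    an optimal one-to-one alignment would use the Hungarian algorithm instead.
    """
    overlap: Dict[Hashable, Counter] = {}
    for lab, ref in zip(labels, reference):
        overlap.setdefault(lab, Counter())[ref] += 1
    mapping = {lab: counts.most_common(1)[0][0] for lab, counts in overlap.items()}
    return [mapping[lab] for lab in labels]


def majority_vote_ensemble(clusterings: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Consensus labels: align every clustering to the first, then vote per point."""
    reference = clusterings[0]
    aligned = [align_to_reference(c, reference) for c in clusterings]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]
```

    For example, the clusterings [0, 0, 1, 1], [1, 1, 0, 0] (the same partition with swapped labels) and [0, 0, 0, 1] (one misplaced point) combine to [0, 0, 1, 1].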

  18. Cannabis cultivation in Quebec: between space-time hotspots and coldspots.

    PubMed

    Chadillon-Farinacci, Véronique; Apparicio, Philippe; Morselli, Carlo

    2015-03-01

    Cannabis cultivation has become increasingly localized, whether soil-based or hydroponic growing methods are used. Characteristics of a given location, such as its climate and the equipment it requires, may influence general accessibility or attract different types of offenders based on potential profits. The location of crops, especially hydroponic crops, suggests a certain proximity to the consumer market via semi-urban and urban environments, while making it possible to avoid detection. This article examines the cannabis market through its cultivation. The stability of temporal and spatial clusters of cannabis cultivation, hotspots, and coldspots between 2001 and 2009 in the province of Quebec, Canada, is addressed. Studying the geography of crime is not a new endeavor, but coldspots are rarely documented in drug market research. Using arrests and general population data, as well as Kulldorff's scan statistics, results show that the temporal distribution of cannabis cultivation is highly seasonal for soil-based methods. Hydroponic production shows adaptation to its soil-based counterpart. Stable patterns are found for both spatial distributions. Hotspots for soil-based cultivation are found near several urban centers and the Ontario border. For hydroponic cannabis cultivation, a new hotspot suggests the emergence of an American demand for Quebec-grown cannabis between 2007 and 2009. Curiously, the region surrounding Montreal, the largest urban center in Quebec, is a recurrent and stable coldspot for both methods of cultivation. For all periods, spatial clusters are stronger for soil-based methods than in the hydroponic context. Temporal differences and spatial similarities between soil-based cultivation and hydroponic cultivation are discussed. The role of the metropolis is also addressed. Copyright © 2015 Elsevier B.V. All rights reserved.
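
    Under the Poisson model, Kulldorff's scan statistic scores every candidate zone by a log-likelihood ratio and keeps the maximum over all zones. The sketch below computes only that ratio for one zone, with `c` observed cases in the zone, `total` cases overall, and expected count E = total × (zone population / total population): LLR = c ln(c/E) + (total − c) ln((total − c)/(total − E)) when c > E for a hotspot (c < E for a coldspot), and 0 otherwise. The function name and the `high` flag are illustrative assumptions, and the Monte Carlo replications used to judge significance are omitted.

```python
import math


def poisson_llr(c: int, total: int, pop_zone: float, pop_total: float,
                high: bool = True) -> float:
    """Poisson log-likelihood ratio for one zone (hotspot if high, else coldspot)."""
    expected = total * pop_zone / pop_total  # E under the null of uniform risk
    # Only zones with an excess (hotspot) or deficit (coldspot) score above 0.
    if (high and c <= expected) or (not high and c >= expected):
        return 0.0
    inside = c * math.log(c / expected) if c > 0 else 0.0  # x ln x -> 0 as x -> 0
    outside_obs = total - c
    outside_exp = total - expected
    outside = outside_obs * math.log(outside_obs / outside_exp) if outside_obs > 0 else 0.0
    return inside + outside
```

    For example, 30 cases in a zone holding 10% of a population with 100 cases in total (E = 10) gives a positive ratio, whereas 5 cases in the same zone scores 0 as a hotspot but can score above 0 as a coldspot.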

  19. Pharmacy Technologist.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of pharmacy technologist, lists technical competencies and competency builders for 16 units pertinent to the health technologies cluster in general as well as those specific to the occupation of pharmacy technologist. The following skill areas…

  20. Human Services. Georgia Core Standards for Occupational Clusters.

    ERIC Educational Resources Information Center

    Georgia Univ., Athens. Dept. of Occupational Studies.

    This document lists core standards and occupational knowledge and skills that have been identified and validated by industry as necessary to all Georgia students in secondary-level human services occupations programs. First, foundation skills are grouped as follows: basic skills (reading, writing, arithmetic/mathematics, listening, speaking);…

  1. Technical/Engineering. Georgia Core Standards for Occupational Clusters.

    ERIC Educational Resources Information Center

    Georgia Univ., Athens. Dept. of Occupational Studies.

    This document lists core standards and occupational knowledge and skills that have been identified and validated by industry as necessary to all Georgia students in secondary-level technical/engineering programs. First, foundation skills are grouped as follows: basic skills (reading, writing, arithmetic/mathematics, listening, speaking); thinking…

  2. Health Care. Georgia Core Standards for Occupational Clusters.

    ERIC Educational Resources Information Center

    Georgia Univ., Athens. Dept. of Occupational Studies.

    This document lists core standards and occupational knowledge and skills that have been identified/validated by industry as necessary to all Georgia students in secondary-level health care occupations programs. First, foundation skills are grouped as follows: basic skills (reading, writing, arithmetic/mathematics, listening, speaking); thinking…

  3. Respiratory Care Therapist.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of respiratory care therapist, lists technical competencies and competency builders for 18 units pertinent to the health technologies cluster in general as well as those specific to the occupation of respiratory care therapist. The following…

  4. Radiographer.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of radiographer, lists technical competencies and competency builders for 18 units pertinent to the health technologies cluster in general as well as those specific to the occupation of radiographer. The following skill areas are covered in the…

  5. Histotechnologist.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of histotechnologist, lists technical competencies and competency builders for 13 units pertinent to the health technologies cluster in general as well as those specific to the areas of histology and phlebotomy. The following skill areas are…

  6. Emergency Medical Technician.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of emergency medical technician, lists technical competencies and competency builders for 18 units pertinent to the health technologies cluster in general and 4 units specific to the occupation of emergency medical technician. The following…

  7. Dental Laboratory Technician.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of dental laboratory technician, lists technical competencies and competency builders for 13 units pertinent to the health technologies cluster in general and 8 units specific to the occupation of dental laboratory technician. The following skill areas…

  8. Identifying Differences among Novice Database Users: Implications for Training Material Effectiveness.

    ERIC Educational Resources Information Center

    Antonucci, Yvonne Lederer; Wozny, Lucy Anne

    1996-01-01

    Identifies and describes sublevels of novices using a database management package, clustering those whose interaction is effective, partially effective, and totally ineffective. Among assistance documentation, functional tree diagrams (FTDs) were more beneficial to partially effective users than traditional reference material. The results have…

  9. Manufacturing, Marketing and Distribution, Business and Office Occupations: Grade 8. Cluster III.

    ERIC Educational Resources Information Center

    Calhoun, Olivia H.

    A curriculum guide for grade 8, the document is divided into eleven units: marketing and distribution; food manufacturing; data processing and automation; administration, management, and labor; secretarial and clerical services; office machines; equipment; metal manufacturing and processing; prefabrication and prepackaging; textile and clothing…

  10. Physical Therapist Assistant.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of physical therapist assistant, lists technical competencies and competency builders for 16 units pertinent to the health technologies cluster in general as well as those specific to the occupation of physical therapist assistant. The…

  11. Occupational Therapy Assistant.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of occupational therapy assistant, lists technical competencies and competency builders for 16 units pertinent to the health technologies cluster in general as well as those specific to the occupation of occupational therapy assistant. The…

  12. Medical Laboratory Technician.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of medical laboratory technician, lists technical competencies and competency builders for 18 units pertinent to the health technologies cluster in general and 8 units specific to the occupation of medical laboratory technician. The following…

  13. Insight into acid-base nucleation experiments by comparison of the chemical composition of positive, negative, and neutral clusters.

    PubMed

    Bianchi, Federico; Praplan, Arnaud P; Sarnela, Nina; Dommen, Josef; Kürten, Andreas; Ortega, Ismael K; Schobesberger, Siegfried; Junninen, Heikki; Simon, Mario; Tröstl, Jasmin; Jokinen, Tuija; Sipilä, Mikko; Adamov, Alexey; Amorim, Antonio; Almeida, Joao; Breitenlechner, Martin; Duplissy, Jonathan; Ehrhart, Sebastian; Flagan, Richard C; Franchin, Alessandro; Hakala, Jani; Hansel, Armin; Heinritzi, Martin; Kangasluoma, Juha; Keskinen, Helmi; Kim, Jaeseok; Kirkby, Jasper; Laaksonen, Ari; Lawler, Michael J; Lehtipalo, Katrianne; Leiminger, Markus; Makhmutov, Vladimir; Mathot, Serge; Onnela, Antti; Petäjä, Tuukka; Riccobono, Francesco; Rissanen, Matti P; Rondo, Linda; Tomé, António; Virtanen, Annele; Viisanen, Yrjö; Williamson, Christina; Wimmer, Daniela; Winkler, Paul M; Ye, Penglin; Curtius, Joachim; Kulmala, Markku; Worsnop, Douglas R; Donahue, Neil M; Baltensperger, Urs

    2014-12-02

    We investigated the nucleation of sulfuric acid together with two bases (ammonia and dimethylamine), at the CLOUD chamber at CERN. The chemical composition of positive, negative, and neutral clusters was studied using three Atmospheric Pressure interface-Time Of Flight (APi-TOF) mass spectrometers: two were operated in positive and negative mode to detect the chamber ions, while the third was equipped with a nitrate ion chemical ionization source allowing detection of neutral clusters. Taking into account the possible fragmentation that can happen during the charging of the ions or within the first stage of the mass spectrometer, the cluster formation proceeded via essentially one-to-one acid-base addition for all of the clusters, independent of the type of the base. For the positive clusters, the charge is carried by one excess protonated base, while for the negative clusters it is carried by a deprotonated acid; the same is true for the neutral clusters after these have been ionized. During the experiments involving sulfuric acid and dimethylamine, it was possible to study the appearance time for all the clusters (positive, negative, and neutral). It appeared that, after the formation of the clusters containing three molecules of sulfuric acid, the clusters grow at a similar speed, independent of their charge. The growth rate is then probably limited by the arrival rate of sulfuric acid or cluster-cluster collision.

  14. Understanding Lustre Internals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Feiyi; Oral, H Sarp; Shipman, Galen M

    2009-04-01

    Lustre was initiated and funded, almost a decade ago, by the U.S. Department of Energy (DoE) Office of Science and National Nuclear Security Administration laboratories to address the need for an open source, highly scalable, high-performance parallel filesystem on then-present and future supercomputing platforms. Throughout the last decade, it was deployed on numerous medium-to-large-scale supercomputing platforms and clusters, and it performed well and met the expectations of the Lustre user community. At the time of writing this document, according to the Top500 list, 15 of the top 30 supercomputers in the world use the Lustre filesystem. This report aims to present a streamlined overview of how Lustre works internally, in reasonable detail, including the relevant data structures, APIs, protocols and algorithms involved, for the Lustre version 1.6 source code base. More importantly, it tries to explain how the various components interconnect with each other and function as a system. Portions of this report are based on discussions with Oak Ridge National Laboratory Lustre Center of Excellence team members, and portions are based on our own understanding of how the code works. We, as the author team, bear all responsibility for any errors and omissions in this document. We can only hope it helps current and future Lustre users and Lustre code developers as much as it helped us understand the Lustre source code and its internal workings.

  15. Membership determination of open clusters based on a spectral clustering method

    NASA Astrophysics Data System (ADS)

    Gao, Xin-Hua

    2018-06-01

    We present a spectral clustering (SC) method aimed at segregating reliable members of open clusters in multi-dimensional space. The SC method is a non-parametric clustering technique that performs cluster division using eigenvectors of the similarity matrix; no prior knowledge of the clusters is required. This method is more flexible in dealing with multi-dimensional data compared to other methods of membership determination. We use this method to segregate the cluster members of five open clusters (Hyades, Coma Ber, Pleiades, Praesepe, and NGC 188) in five-dimensional space; fairly clean cluster members are obtained. We find that the SC method can capture a small number of cluster members (weak signal) from a large number of field stars (heavy noise). Based on these cluster members, we compute the mean proper motions and distances for the Hyades, Coma Ber, Pleiades, and Praesepe clusters, and our results are in general quite consistent with the results derived by other authors. The test results indicate that the SC method is highly suitable for segregating cluster members of open clusters based on high-precision multi-dimensional astrometric data such as Gaia data.
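    The abstract describes spectral clustering only at a high level: clusters are divided using eigenvectors of a similarity matrix. As a rough illustration, the sketch below follows one common recipe (Ng-Jordan-Weiss style): a Gaussian similarity matrix, its symmetric normalization, the top-k eigenvectors as an embedding, then k-means on the embedded rows. The Gaussian kernel, the bandwidth `sigma`, the fixed `k` and the k-means step are all assumptions, not Gao's method; the abstract stresses that no prior knowledge of the clusters is needed, whereas this sketch takes `k` as input, and real astrometric inputs would also need scaling so that positions, proper motions and parallaxes are comparable.

```python
import numpy as np


def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Cluster the rows of X into k groups (illustrative spectral sketch)."""
    X = np.asarray(X, dtype=float)
    # Gaussian similarity matrix; sigma is an assumed bandwidth.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization D^{-1/2} W D^{-1/2} (assumes no isolated points).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order: keep the k largest.
    _, vecs = np.linalg.eigh(A)
    U = vecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # k-means on the embedded rows, seeded by farthest-point selection.
    centers = [U[0]]
    for _ in range(1, k):
        gap = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(gap)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = np.array([U[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic groups in a 5-dimensional space (made-up data).
    X = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(5.0, 0.3, (15, 5))])
    print(spectral_clustering(X, k=2))
```

    Clustering in the eigenvector embedding rather than in the raw coordinates is what lets the method separate groups that are not compact blobs; for well-separated groups, as in the toy data, the embedded rows collapse onto a few nearly orthogonal directions.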

  16. Structure based alignment and clustering of proteins (STRALCP)

    DOEpatents

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.
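    The patent summary does not say how similarity values are turned into clusters. A minimal sketch, assuming that two structures are linked when their similarity meets a threshold (connected components, i.e. single linkage), that each local-similarity cluster is then split by a global-similarity threshold, and that the representative is the member with the highest mean global similarity to the rest of its cluster. The matrices, thresholds and function names are hypothetical, and the conserved-residue spans and "structural footprint" are not modeled.

```python
from typing import Iterable, List, Tuple


def threshold_clusters(items: Iterable[int], sim: List[List[float]], thr: float) -> List[List[int]]:
    """Connected components of the graph linking i, j when sim[i][j] >= thr."""
    remaining = set(items)
    clusters = []
    while remaining:
        seed = min(remaining)
        remaining.remove(seed)
        stack, component = [seed], [seed]
        while stack:
            i = stack.pop()
            linked = [j for j in remaining if sim[i][j] >= thr]
            for j in linked:
                remaining.remove(j)
                stack.append(j)
                component.append(j)
        clusters.append(sorted(component))
    return clusters


def representative(cluster: List[int], sim: List[List[float]]) -> int:
    """Member with the highest mean similarity to the other members."""
    if len(cluster) == 1:
        return cluster[0]
    return max(cluster, key=lambda i: sum(sim[i][j] for j in cluster if j != i) / (len(cluster) - 1))


def two_stage_clustering(local_sim, global_sim, local_thr, global_thr) -> List[Tuple[List[int], int]]:
    """Cluster by local similarity first, then refine each cluster by global similarity."""
    result = []
    for coarse in threshold_clusters(range(len(local_sim)), local_sim, local_thr):
        for fine in threshold_clusters(coarse, global_sim, global_thr):
            result.append((fine, representative(fine, global_sim)))
    return result


if __name__ == "__main__":
    # Hypothetical pairwise similarities for four structures.
    local_sim = [[1.0, 0.9, 0.8, 0.1], [0.9, 1.0, 0.85, 0.2],
                 [0.8, 0.85, 1.0, 0.1], [0.1, 0.2, 0.1, 1.0]]
    global_sim = [[1.0, 0.9, 0.3, 0.1], [0.9, 1.0, 0.4, 0.1],
                  [0.3, 0.4, 1.0, 0.1], [0.1, 0.1, 0.1, 1.0]]
    print(two_stage_clustering(local_sim, global_sim, 0.7, 0.7))
```

    In this toy input, structures 0 to 2 share local similarity but only 0 and 1 stay together once global similarity is considered, which mirrors the coarse-then-fine ordering described in the summary.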

  17. Vascular Morphogenesis in the Context of Inflammation: Self-Organization in a Fibrin-Based 3D Culture System.

    PubMed

    Rüger, Beate M; Buchacher, Tanja; Giurea, Alexander; Kubista, Bernd; Fischer, Michael B; Breuss, Johannes M

    2018-01-01

    Introduction: New vessel formation requires a continuous and tightly regulated interplay between endothelial cells and cells of the perivascular microenvironment, supported by mechano-physical and chemical cues from the extracellular matrix. Aim: Here we investigated the potential of small fragments of synovial tissue to form de novo vascular structures in the context of inflammation within three-dimensional (3D) fibrin-based matrices in vitro, and assessed the contribution of mesenchymal stromal cell (MSC)-immune cell cross-talk to neovascularization, considering paracrine signals, in a fibrin-based co-culture model. Material and Methods: Synovial tissue fragments from patients with rheumatoid arthritis (RA) and inflammatory osteoarthritis (OA) were cultivated within 3D fibrin matrices for up to 4 weeks. Cellular and structural re-arrangement of the initially acellular matrix was documented by phase contrast microscopy and characterized by confocal laser-scanning microscopy of topographically intact 3D cultures and by immunohistochemistry. MSC-peripheral blood mononuclear cell (PBMC) co-cultures in the 3D fibrin system specifically addressed the influence of perivascular cell interactions on neo-vessel formation in a pro-inflammatory microenvironment. Cytokine levels in the supernatants of cultured explant tissues and co-cultures were evaluated by the Bio-Plex cytokine assay and ELISA. Results: Vascular outgrowth from the embedded tissue into the fibrin matrix was preceded by leukocyte egress from the tissue fragments. Neo-vessels originating both from the embedded sample and from clusters locally formed by emigrated mononuclear cells were consistently associated with CD45+ leukocytes. MSC and PBMC in co-culture formed vasculogenic clusters. Clusters and the cells with endothelial phenotype emerging from them were surrounded by a collagen IV scaffold. No vascular structures were observed in control 3D monocultures of PBMC or MSC.
Paracrine signals released by cultured OA tissue fragments corresponded with elevated levels of granulocyte-colony stimulating factor, vascular endothelial growth factor and interleukin-6 secreted by MSC-PBMC co-cultures. Conclusion: Our results show that synovial tissue fragments with immune cell infiltrates have the potential to form new vessels in initially avascular 3D fibrin-based matrices. Cross-talk and cluster formation of MSC with immune cells within the 3D fibrin environment through self-organization and secretion of pro-angiogenic paracrine factors can support neo-vessel growth.

  18. How Do Social Capital and HIV/AIDS Outcomes Geographically Cluster and Which Sociocontextual Mechanisms Predict Differences Across Clusters?

    PubMed

    Ransome, Yusuf; Dean, Lorraine T; Crawford, Natalie D; Metzger, David S; Blank, Michael B; Nunn, Amy S

    2017-09-01

    Place of residence has been associated with HIV transmission risks. Social capital, defined as features of social organization that improve the efficiency of society by facilitating coordinated actions, often varies by neighborhood and is hypothesized to have protective effects on HIV care continuum outcomes. We examined whether the association between social capital and 2 HIV care continuum outcomes clustered geographically and whether sociocontextual mechanisms predict differences across clusters. Bivariate Local Moran's I evaluated geographical clustering in the association between social capital (participation in civic and social organizations, 2006, 2008, 2010) and [5-year (2007-2011) prevalence of late HIV diagnosis and linkage to HIV care] across Philadelphia, PA, census tracts (N = 378). Maps documented the clusters, and multinomial regression assessed which sociocontextual mechanisms (eg, racial composition) predict differences across clusters. We identified 4 significant clusters (high social capital-high HIV/AIDS, low social capital-low HIV/AIDS, low social capital-high HIV/AIDS, and high social capital-low HIV/AIDS). Moran's I between social capital and late HIV diagnosis was I = 0.19 (z = 9.54, P < 0.001), and between social capital and linkage to HIV care it was I = 0.06 (z = 3.274, P = 0.002). In multivariable analysis, median household income predicted differences across clusters, particularly where social capital was lowest and HIV burden highest, compared with clusters with high social capital and the lowest HIV burden. The association between social participation and HIV care continuum outcomes clusters geographically in Philadelphia, PA. HIV prevention interventions should account for this phenomenon. Reducing geographic disparities will require interventions tailored to each continuum step and that address socioeconomic factors such as neighborhood median income.
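    To make the bivariate local Moran's I concrete: one common definition standardizes both variables to z-scores and, for each area i, multiplies the z-score of the first variable by the spatially lagged (neighbor-averaged) z-score of the second, I_i = z_x,i · Σ_j w_ij z_y,j with row-standardized weights w_ij. The signs of the two factors give the four quadrants named in the abstract (high-high, low-low, low-high, high-low). The sketch below assumes a simple adjacency list for the neighbors; the permutation test that decides which areas are "significant" clusters is not included, and the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def bivariate_local_moran(x, y, neighbors):
    """Return (I_i, quadrant) for each area i.

    x: first variable per area (e.g. social capital).
    y: second variable per area (e.g. an HIV outcome).
    neighbors[i]: indices of areas adjacent to area i; weights are
    row-standardized, so each neighbor gets 1 / len(neighbors[i]).
    """
    zx, zy = standardize(x), standardize(y)
    results = []
    for i, nbrs in enumerate(neighbors):
        # Spatial lag of y: mean z-score of y over the neighbors of i.
        lag = sum(zy[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        quadrant = ("high" if zx[i] > 0 else "low") + "-" + ("high" if lag > 0 else "low")
        results.append((zx[i] * lag, quadrant))
    return results


if __name__ == "__main__":
    # Four hypothetical tracts in a row; each touches its immediate neighbors.
    print(bivariate_local_moran([1, 2, 3, 4], [1, 2, 3, 4], [[1], [0, 2], [1, 3], [2]]))
```

    A positive I_i marks an area whose value of x agrees in sign with the surrounding values of y (high-high or low-low); a negative I_i marks the mixed high-low and low-high patterns.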

  19. Single-cell heterogeneity in ductal carcinoma in situ of breast.

    PubMed

    Gerdes, Michael J; Gökmen-Polar, Yesim; Sui, Yunxia; Pang, Alberto Santamaria; LaPlante, Nicole; Harris, Adrian L; Tan, Puay-Hoon; Ginty, Fiona; Badve, Sunil S

    2018-03-01

    Heterogeneous patterns of mutations and RNA expression have been well documented in invasive cancers. However, technological challenges have limited the ability to study heterogeneity of protein expression. This is particularly true for pre-invasive lesions such as ductal carcinoma in situ of the breast. Cell-level heterogeneity in ductal carcinoma in situ was analyzed in a single 5 μm tissue section using a multiplexed immunofluorescence analysis of 11 disease-related markers (EGFR, HER2, HER4, S6, pmTOR, CD44v6, SLC7A5, CD10, CD4, CD8 and CD20, plus pan-cytokeratin, pan-cadherin, DAPI, and Na+K+ATPase for cell segmentation). Expression was quantified at the cell level using a single-cell segmentation algorithm. K-means clustering was used to determine co-expression patterns of epithelial cell markers and immune markers. We document for the first time the presence of epithelial cell heterogeneity within ducts, between ducts and between patients with ductal carcinoma in situ. There was moderate heterogeneity in the distribution of eight clusters within each duct (average Shannon index 0.76; range 0-1.61). Furthermore, within each patient, the average Shannon index across all ducts ranged from 0.33 to 1.02 (s.d. 0.09-0.38). As the distribution of clusters within ducts was uneven, the analysis of eight ducts might be sufficient to represent all the clusters, i.e., within- and between-duct heterogeneity. The pattern of epithelial cell clustering was associated with the presence and type of immune infiltrates, indicating a complex interaction between the epithelial tumor and the immune system in each patient. This analysis also provides the first evidence that simultaneous analysis of both the epithelial and immune/stromal components might be necessary to understand the complex milieu in ductal carcinoma in situ lesions.
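    The abstract summarizes per-duct heterogeneity as a Shannon index over K-means cluster assignments but does not give the formula or log base. A minimal sketch, assuming the usual H = -Σ p_i ln p_i with natural logarithms, where p_i is the fraction of a duct's cells assigned to cluster i; the function names and cell labels are illustrative.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon index H = -sum(p_i * ln p_i) over the cluster labels of one duct."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_shannon(ducts):
    """Average per-duct Shannon index, e.g. across all ducts of one patient."""
    return sum(shannon_index(d) for d in ducts) / len(ducts)


if __name__ == "__main__":
    duct_a = [2, 2, 2, 2]          # every cell in one cluster
    duct_b = [0, 1, 2, 3, 4]       # cells spread evenly over five clusters
    print(shannon_index(duct_a), shannon_index(duct_b), mean_shannon([duct_a, duct_b]))
```

    Under this definition H is 0 when every cell in a duct falls in a single cluster and reaches ln 8 ≈ 2.08 when cells are spread evenly over all eight clusters, so higher values mean more heterogeneous ducts.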

  20. Cluster randomized trial of a multilevel evidence-based quality improvement approach to tailoring VA Patient Aligned Care Teams to the needs of women Veterans.

    PubMed

    Yano, Elizabeth M; Darling, Jill E; Hamilton, Alison B; Canelo, Ismelda; Chuang, Emmeline; Meredith, Lisa S; Rubenstein, Lisa V

    2016-07-19

    The Veterans Health Administration (VA) has undertaken a major initiative to transform care through implementation of Patient Aligned Care Teams (PACTs). Based on the patient-centered medical home (PCMH) concept, PACT aims to improve access, continuity, coordination, and comprehensiveness using team-based care that is patient-driven and patient-centered. However, how VA should adapt PACT to meet the needs of special populations, such as women Veterans (WVs), was not considered in initial implementation guidance. WVs' numerical minority in VA healthcare settings (approximately 7-8% of users) creates logistical challenges to delivering gender-sensitive comprehensive care. The main goal of this study is to test an evidence-based quality improvement approach (EBQI) to tailoring PACT to meet the needs of WVs, incorporating comprehensive primary care services and gender-specific care in gender-sensitive environments, thereby accelerating achievement of PACT tenets for women (Women's Health (WH)-PACT). EBQI is a systematic approach to developing a multilevel research-clinical partnership that engages senior organizational leaders and local quality improvement (QI) teams in adapting and implementing new care models in the context of prior evidence and local practice conditions, with researchers providing technical support, formative feedback, and practice facilitation. In a 12-site cluster randomized trial, we will evaluate WH-PACT model achievement using patient, provider, staff, and practice surveys, in addition to analyses of secondary administrative and chart-based data. We will explore impacts of receipt of WH-PACT care on quality of chronic disease care and prevention, health status, patient satisfaction and experience of care, provider experience, utilization, and costs. 
Using mixed methods, we will assess pre-post practice contexts; document EBQI activities undertaken in participating facilities and their relationship to provider/staff and team actions/attitudes; document WH-PACT implementation; and examine barriers/facilitators to EBQI-supported WH-PACT implementation through a combination of semi-structured interviews and monthly formative progress narratives and administrative data. Lack of gender-sensitive comprehensive care has demonstrated consequences for the technical quality and ratings of care among WVs and may contribute to decisions to continue use or seek care elsewhere under the US Affordable Care Act. We hypothesize that tailoring PACT implementation through EBQI may improve the experience and quality of care at many levels. ClinicalTrials.gov, NCT02039856.

  1. Cluster Size Optimization in Sensor Networks with Decentralized Cluster-Based Protocols

    PubMed Central

    Amini, Navid; Vahdatpour, Alireza; Xu, Wenyao; Gerla, Mario; Sarrafzadeh, Majid

    2011-01-01

    Network lifetime and energy efficiency are viewed as the dominant considerations in designing cluster-based communication protocols for wireless sensor networks. This paper analytically derives the optimal cluster size that minimizes the total energy expenditure in such networks, where all sensors communicate data through their elected cluster heads to the base station in a decentralized fashion. LEACH, LEACH-Coverage, and DBS are the three cluster-based protocols investigated in this paper; none of them requires centralized support from any particular node. The analytical outcomes are given in the form of closed-form expressions for various widely used network configurations. Extensive simulations on different networks are used to confirm the expectations based on the analytical results. To obtain a thorough understanding of the results, the cluster number variability problem is identified and inspected from the energy consumption point of view. PMID:22267882
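    The abstract names LEACH but does not restate how cluster heads are elected. For context, the sketch below implements the standard LEACH election rule from Heinzelman et al.: in round r, a node that has not served as cluster head in the last 1/p rounds becomes one with probability T = p / (1 - p · (r mod 1/p)), where p is the desired fraction of cluster heads. It does not reproduce this paper's closed-form optimal cluster size; `p`, the node IDs and the `last_head_round` bookkeeping dict are illustrative assumptions, and 1/p is assumed to be a whole number of rounds.

```python
import random


def leach_threshold(p, r):
    """LEACH threshold T(n) for an eligible node in round r."""
    period = round(1 / p)  # assumes 1/p is an integer number of rounds
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, p, r, last_head_round, rng):
    """Elect cluster heads for round r.

    last_head_round maps node id -> last round it served as head; a node is
    eligible only if it has not been a head within the last 1/p rounds.
    """
    period = round(1 / p)
    threshold = leach_threshold(p, r)
    heads = []
    for n in node_ids:
        eligible = n not in last_head_round or r - last_head_round[n] >= period
        if eligible and rng.random() < threshold:
            heads.append(n)
            last_head_round[n] = r
    return heads


if __name__ == "__main__":
    rng = random.Random(42)
    last = {}
    for r in range(4):
        print(r, elect_cluster_heads(range(10), 0.25, r, last, rng))
```

    Because the threshold rises to 1 in the last round of each 1/p-round epoch, every node serves as cluster head exactly once per epoch, which spreads the energy cost of the head role evenly; the expected number of heads per round is what the cluster-size analysis tunes.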

  2. A Cluster-Based Dual-Adaptive Topology Control Approach in Wireless Sensor Networks.

    PubMed

    Gui, Jinsong; Zhou, Kai; Xiong, Naixue

    2016-09-25

    Multi-Input Multi-Output (MIMO) can improve wireless network performance. Sensors are usually single-antenna devices because of hardware complexity and cost, so several sensors are used to form a virtual MIMO array, which is a desirable approach to efficiently take advantage of MIMO gains. Also, in large Wireless Sensor Networks (WSNs), clustering, an effective topology control approach, can improve network scalability. The existing virtual MIMO-based clustering schemes either do not fully explore the benefits of MIMO or do not adaptively determine the clustering ranges. Also, the clustering mechanism needs to be further improved to extend the lifetime of the cluster structure. In this paper, we propose an improved clustering scheme for virtual MIMO-based topology construction (ICV-MIMO), which can adaptively determine not only the inter-cluster transmission modes but also the clustering ranges. Through a rational division of the cluster head function and optimization of the cluster head selection criteria and the information exchange process, the ICV-MIMO scheme effectively reduces network energy consumption and improves the lifetime of the cluster structure compared with the existing typical virtual MIMO-based scheme. Moreover, the message overhead and time complexity are still of the same order of magnitude.

  3. A Cluster-Based Dual-Adaptive Topology Control Approach in Wireless Sensor Networks

    PubMed Central

    Gui, Jinsong; Zhou, Kai; Xiong, Naixue

    2016-01-01

    Multi-Input Multi-Output (MIMO) can improve wireless network performance. Sensors are usually single-antenna devices because of hardware complexity and cost, so several sensors are used to form a virtual MIMO array, which is a desirable approach to efficiently take advantage of MIMO gains. Also, in large Wireless Sensor Networks (WSNs), clustering, an effective topology control approach, can improve network scalability. The existing virtual MIMO-based clustering schemes either do not fully explore the benefits of MIMO or do not adaptively determine the clustering ranges. Also, the clustering mechanism needs to be further improved to extend the lifetime of the cluster structure. In this paper, we propose an improved clustering scheme for virtual MIMO-based topology construction (ICV-MIMO), which can adaptively determine not only the inter-cluster transmission modes but also the clustering ranges. Through a rational division of the cluster head function and optimization of the cluster head selection criteria and the information exchange process, the ICV-MIMO scheme effectively reduces network energy consumption and improves the lifetime of the cluster structure compared with the existing typical virtual MIMO-based scheme. Moreover, the message overhead and time complexity are still of the same order of magnitude. PMID:27681731

  4. Internal Cluster Validation on Earthquake Data in the Province of Bengkulu

    NASA Astrophysics Data System (ADS)

    Rini, D. S.; Novianti, P.; Fransiska, H.

    2018-04-01

    K-means is an algorithm that partitions n objects into k clusters based on their attributes, where k < n. A drawback of the algorithm is that the k initial centers are chosen randomly before it runs, so different runs can produce different clusterings; if the random initialization is poor, the clustering is less than optimal. Cluster validation is a technique for determining the optimum clustering without prior information about the data. There are two types of cluster validation: internal cluster validation and external cluster validation. This study aims to examine and apply several internal cluster validation indices, including the Calinski-Harabasz (CH) index, Silhouette (S) index, Davies-Bouldin (DB) index, Dunn (D) index, and S-Dbw index, to earthquake data from Bengkulu Province. Based on internal cluster validation, the CH, S, and S-Dbw indices yield an optimum of k = 2, the DB index yields k = 6, and the D index yields k = 15. The optimum clustering (k = 6) based on the DB index gives good results for clustering earthquakes in Bengkulu Province.
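    The random-initialization issue and the use of an internal index to choose k can be illustrated with a small pure-Python sketch: Lloyd's k-means seeded from randomly sampled points, and the Davies-Bouldin index, which averages over clusters the worst ratio (s_i + s_j) / d(c_i, c_j) of within-cluster scatter to centroid separation (lower is better). Only the DB index is shown; the CH, Silhouette, Dunn and S-Dbw indices used in the study are not implemented, and the function names and toy points are illustrative assumptions rather than the earthquake data.

```python
import math
import random


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means; the initial centers are k points sampled at random."""
    centers = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to become empty.
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labeling (needs at least two clusters)."""
    ks = sorted(set(labels))
    cents, scatter = {}, {}
    for j in ks:
        members = [p for p, lab in zip(points, labels) if lab == j]
        cents[j] = tuple(sum(c) / len(members) for c in zip(*members))
        scatter[j] = sum(math.dist(p, cents[j]) for p in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in ks if j != i)
             for i in ks]
    return sum(worst) / len(ks)


def best_k_by_db(points, k_values, seed=0):
    """Run k-means for each k and return the k with the lowest DB index."""
    scores = {k: davies_bouldin(points, kmeans(points, k, random.Random(seed))) for k in k_values}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(best_k_by_db(pts, [2, 3]))
```

    Because the initial centers are sampled at random, changing `seed` can change both the clustering and the k selected, which is exactly why a validation index is needed to compare candidate solutions.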

  5. Prevalence and risk factors of seizure clusters in adult patients with epilepsy.

    PubMed

    Chen, Baibing; Choi, Hyunmi; Hirsch, Lawrence J; Katz, Austen; Legge, Alexander; Wong, Rebecca A; Jiang, Alfred; Kato, Kenneth; Buchsbaum, Richard; Detyniecki, Kamil

    2017-07-01

    In the current study, we explored the prevalence of physician-confirmed seizure clusters. We also investigated potential clinical factors associated with the occurrence of seizure clusters overall and by epilepsy type. We reviewed medical records of 4116 adult (≥16 years old) outpatients with epilepsy at our centers for documentation of seizure clusters. Variables including patient demographics, epilepsy details, medical and psychiatric history, AED history, and epilepsy risk factors were then tested against history of seizure clusters. Patients were then divided into focal epilepsy, idiopathic generalized epilepsy (IGE), or symptomatic generalized epilepsy (SGE), and the same analysis was run. Overall, seizure clusters were independently associated with earlier age of seizure onset, symptomatic generalized epilepsy (SGE), central nervous system (CNS) infection, cortical dysplasia, status epilepticus, absence of 1-year seizure freedom, and having failed 2 or more AEDs (P<0.0026). Patients with SGE (27.1%) were more likely to develop seizure clusters than patients with focal epilepsy (16.3%) and IGE (7.4%; all P<0.001). Analysis by epilepsy type showed that absence of 1-year seizure freedom since starting treatment at one of our centers was associated with seizure clustering in patients across all 3 epilepsy types. In patients with SGE, clusters were associated with perinatal/congenital brain injury. In patients with focal epilepsy, clusters were associated with younger age of seizure onset, complex partial seizures, cortical dysplasia, status epilepticus, CNS infection, and having failed 2 or more AEDs. In patients with IGE, clusters were associated with presence of an aura. Only 43.5% of patients with seizure clusters were prescribed rescue medications. Patients with intractable epilepsy are at a higher risk of developing seizure clusters. 
Factors such as having SGE, CNS infection, cortical dysplasia, status epilepticus or an early seizure onset, can also independently increase one's chance of having seizure clusters. Copyright © 2017. Published by Elsevier B.V.

  6. A Survey on the Taxonomy of Cluster-Based Routing Protocols for Homogeneous Wireless Sensor Networks

    PubMed Central

    Naeimi, Soroush; Ghafghazi, Hamidreza; Chow, Chee-Onn; Ishii, Hiroshi

    2012-01-01

    The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, cluster-based protocols should be compatible with these constraints in both the setup state and the steady data transmission state. With a focus on these constraints, we classify routing protocols according to their objectives and methods for addressing the shortcomings of the clustering process at each stage: cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories and point out the strengths and weaknesses of each protocol in detail. Furthermore, a taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Finally, based on the existing research, we summarize the issues and solutions regarding the attributes and characteristics of clustering approaches, and identify some open research areas in cluster-based routing protocols that can be pursued further. PMID:22969350

  7. A survey on the taxonomy of cluster-based routing protocols for homogeneous wireless sensor networks.

    PubMed

    Naeimi, Soroush; Ghafghazi, Hamidreza; Chow, Chee-Onn; Ishii, Hiroshi

    2012-01-01

    The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, cluster-based protocols should be compatible with these constraints in both the setup state and the steady data transmission state. With a focus on these constraints, we classify routing protocols according to their objectives and methods for addressing the shortcomings of the clustering process at each stage: cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories and point out the strengths and weaknesses of each protocol in detail. Furthermore, a taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Finally, based on the existing research, we summarize the issues and solutions regarding the attributes and characteristics of clustering approaches, and identify some open research areas in cluster-based routing protocols that can be pursued further.

  8. Feature extraction for document text using Latent Dirichlet Allocation

    NASA Astrophysics Data System (ADS)

    Prihatini, P. M.; Suryawan, I. K.; Mandia, IN

    2018-01-01

    Feature extraction is one of the stages of an information retrieval system; it is used to extract the unique feature values of a text document. Feature extraction can be done by several methods, one of which is Latent Dirichlet Allocation. However, studies of text feature extraction using the Latent Dirichlet Allocation method are rarely found for Indonesian text. Therefore, this research implements text feature extraction for Indonesian text. The research method consists of data acquisition, text pre-processing, initialization, topic sampling and evaluation. The evaluation compares the Precision, Recall and F-Measure values of Latent Dirichlet Allocation with those of Term Frequency Inverse Document Frequency KMeans, which is commonly used for feature extraction. The evaluation results show that the Precision, Recall and F-Measure values of the Latent Dirichlet Allocation method are higher than those of the Term Frequency Inverse Document Frequency KMeans method. This shows that the Latent Dirichlet Allocation method is able to extract features and cluster Indonesian text better than the Term Frequency Inverse Document Frequency KMeans method.
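    The abstract lists "initialization" and "topic sampling" as separate stages, which matches how LDA is commonly fitted with collapsed Gibbs sampling: every token first gets a random topic, then each token's topic is repeatedly resampled with probability proportional to (n_dt + α)(n_tw + β) / (n_t + Vβ). The sketch below is a minimal, unoptimized version of that sampler, not the authors' implementation; documents are assumed to be lists of integer word IDs, and `alpha`, `beta`, the iteration count and the function name are assumptions. The returned per-document topic proportions (`theta`) are the kind of feature vector that can then be compared or clustered.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, vocab_size)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                                 # total tokens per topic
    z = []                                              # current topic of each token
    # Initialization: give every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_doc)
    topics = range(n_topics)
    # Topic sampling: resample each token from its full conditional.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Smoothed per-document topic proportions serve as document features.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    return theta, nkw


if __name__ == "__main__":
    docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
    theta, _ = lda_gibbs(docs, n_topics=2, vocab_size=6, n_iter=50)
    for row in theta:
        print([round(x, 2) for x in row])
```

    Assigning each document to its highest-probability topic is one simple way to turn `theta` into clusters for a precision/recall comparison; the abstract does not say which rule the study used.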

  9. Inference from clustering with application to gene-expression microarrays.

    PubMed

    Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M

    2002-01-01

    There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to the process to which they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. 
Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
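
    The evaluation loop described above (model each process as a mean plus independent noise, generate points, cluster them, count misclustered points) can be sketched as follows. This is a minimal illustration, not the authors' toolbox: it assumes Gaussian noise with a common standard deviation `sigma`, uses plain k-means as the algorithm under test, and counts the error under the best one-to-one matching of clusters to generating processes, since cluster labels are arbitrary.

```python
import itertools
import math
import random


def generate(means, n_per_class, sigma, rng):
    """Draw n_per_class points from each process: mean vector plus i.i.d. Gaussian noise."""
    points, labels = [], []
    for label, mu in enumerate(means):
        for _ in range(n_per_class):
            points.append([m + rng.gauss(0.0, sigma) for m in mu])
            labels.append(label)
    return points, labels


def kmeans(points, k, rng, iters=50):
    """Lloyd's k-means with centroids initialized at randomly chosen sample points."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign


def clustering_error(true_labels, assign, k):
    """Number of points clustered incorrectly, minimized over all cluster-to-process matchings."""
    best = len(true_labels)
    for perm in itertools.permutations(range(k)):
        wrong = sum(1 for t, a in zip(true_labels, assign) if perm[a] != t)
        best = min(best, wrong)
    return best
```

    For two well-separated processes, e.g. `generate([[0.0, 0.0], [5.0, 5.0]], 50, 0.5, random.Random(0))`, the k-means error should be close to zero; increasing `sigma` makes the processes overlap and the error grow, which is the dependence on process variance the abstract describes. Trying every permutation is only practical for a small number of clusters.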

  10. Exploring Careers in Marketing and Distribution: A Guide for Teachers.

    ERIC Educational Resources Information Center

    Insko, Merle A.

    One of 11 guides intended for use at the junior high school level of career exploration, the document identifies job families within the marketing and distribution occupational cluster, identifies occupations within each family, and gives suggestions for possible classroom experiences, references, and evaluations, as well as supportive materials.…

  11. Teaching Gifted Children in Today's Preschool and Primary Classrooms: Identifying, Nurturing, and Challenging Children Ages 4-9

    ERIC Educational Resources Information Center

    Smutny, Joan Franklin; Walker, Sally Yahnke; Honeck, Ellen I.

    2016-01-01

    These proven, practical early childhood teaching strategies help teachers identify young gifted children, differentiate curriculum, assess and document students' development, and build partnerships with parents. Chapters focus on early identification, curriculum compacting, social studies, language arts, math and science, cluster grouping,…

  12. Illinois Occupational Skill Standards: Occupational Therapy Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended to serve as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in jobs in occupational therapy. Agency partners involved in this project include: the Illinois State Board of Education, Illinois Community College…

  13. Illinois Occupational Skill Standards: Agricultural Sales and Marketing Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended to serve as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in jobs in agricultural sales and marketing. Agency partners involved in this project include: the Illinois State Board of Education, Illinois Community…

  14. Environmental and Agricultural Sciences. Georgia Core Standards for Occupational Clusters.

    ERIC Educational Resources Information Center

    Georgia Univ., Athens. Dept. of Occupational Studies.

    This document lists core standards and occupational knowledge and skills that have been identified/validated by industry as necessary to all Georgia students in secondary-level environmental and agricultural sciences programs. First, foundation skills are grouped as follows: basic skills (reading, writing, arithmetic/mathematics, listening,…

  15. Business, Marketing, and Information Management. Georgia Core Standards for Occupational Clusters.

    ERIC Educational Resources Information Center

    Georgia Univ., Athens. Dept. of Occupational Studies.

    This document lists core standards and occupational knowledge and skills that have been identified and validated by industry as necessary to all Georgia students in business, marketing, and information management programs. First, foundation skills are grouped as follows: basic skills (reading, writing, arithmetic/mathematics, listening, speaking);…

  16. Mississippi Curriculum Framework for Technology Discovery (9th Grade). CIP: 00.0253.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for technology educators in Mississippi, outlines a technology discovery course in which a modular instruction approach allows ninth graders to experience various workplace technologies within four career cluster areas: agriculture/natural resources technology, business/marketing technology, health/human services…

  17. Task Lists for Agricultural Occupations, 1988: Cluster Matrices for Agricultural Occupations. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Pepple, Jerry

    This document contains four publications for agricultural occupations in Illinois. "Task Lists for Agricultural Occupations" provide lists of employability skills for the following: park aide; hand sprayer; gardener/groundskeeper; salesperson, parts, agricultural equipment; and dairy processing equipment operator. Each list contains skills…

  18. Illinois Occupational Skill Standards: HVAC/R Technician Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended to serve as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in jobs in the heating, ventilation, air conditioning, and refrigeration (HVAC/R) industry. Agency partners involved in this project include: the…

  19. Second Annual Career Guidance Institute: Final Report.

    ERIC Educational Resources Information Center

    Schenck, Norma Elaine

    The document reports on the organization and implementation plans for Indiana's Second Annual Career Guidance Institute and the sound/slide programs developed on six career cluster areas. An extensive evaluation analyzes the Institute in light of its objectives, offers insights gained on career opportunities, gives changes in attitude regarding…

  20. Registered Nurse (Associate Degree).

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of registered nurse (with an associate degree), lists technical competencies and competency builders for 19 units pertinent to the health technologies cluster in general and 5 units specific to the occupation of registered nurse. The following…

  1. Illinois Occupational Skill Standards: Plastics Molding Cluster.

    ERIC Educational Resources Information Center

    Illinois Occupational Skill Standards and Credentialing Council, Carbondale.

    This document, which is intended to serve as a guide for work force preparation program providers, details the Illinois occupational skill standards for programs preparing students for employment in jobs in the plastics molding industry. Agency partners involved in this project include: the Illinois State Board of Education, Illinois Community…

  2. Dental Hygienist.

    ERIC Educational Resources Information Center

    Ohio State Univ., Columbus. Center on Education and Training for Employment.

    This document, which is designed for use in developing a tech prep competency profile for the occupation of dental hygienist, lists technical competencies and competency builders for 13 units pertinent to the health technologies cluster in general and 9 units specific to the occupation of dental hygienist. The following skill areas are covered in…

  3. Canonical PSO Based K-Means Clustering Approach for Real Datasets.

    PubMed

    Dey, Lopamudra; Chakraborty, Sanjay

    2014-01-01

    "Clustering" the significance and application of this technique is spread over various fields. Clustering is an unsupervised process in data mining, that is why the proper evaluation of the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems and indices selection depends on the kind of available data. This paper first proposes Canonical PSO based K-means clustering algorithm and also analyses some important clustering indices (intercluster, intracluster) and then evaluates the effects of those indices on real-time air pollution database, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also defines which algorithm will be more desirable among all these algorithms to make proper compact clusters on this particular real life datasets. It actually deals with the behaviour of these clustering algorithms with respect to validation indexes and represents their results of evaluation in terms of mathematical and graphical forms.

  4. A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison

    PubMed Central

    Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.

    2017-01-01

    Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge-based potentials, and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach to cluster ranking based not only on one molecular descriptor (e.g., an energy function) but on a large number of descriptors integrated in a machine learning model, in which an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol first locally enriches clusters with additional poses; the clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near‐native from incorrect clusters, and of how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc. PMID:27935158

  5. "Pleiades Visions" for organ solo: A composition supported by documented research

    NASA Astrophysics Data System (ADS)

    Whitehouse, Matthew Robert

    Pleiades Visions is a three-movement work for organ solo inspired by indigenous music and mythology associated with the Pleiades (Seven Sisters) star cluster. Three cultural groups are represented in Pleiades Visions. The first movement, entitled "Uluru," draws from Australian Aboriginal music and mythology. The second movement, entitled "...life on other worlds," is based loosely on a Quechan (Yuman) Indian song. The concluding movement, entitled "Mauna Kea," is inspired by the opening lines of the Kumulipo, a creation chant of the Native Hawaiian culture. The source material for Pleiades Visions was identified through research incorporating techniques from the fields of cultural astronomy and ethnomusicology. This research represents a new line of inquiry for both fields. This document situates Pleiades Visions in the context of the organ literature, and suggests that Pleiades Visions might be the first organ work with a cultural astronomy inspiration. It also describes the research undergirding Pleiades Visions, demonstrates the manner in which that research informed the composition of the work, and addresses issues surrounding the use of indigenous source material in a culturally sensitive manner.

  6. A collective case study of nursing students with learning disabilities.

    PubMed

    Kolanko, Kathrine M

    2003-01-01

    This collective case study described the meaning of being a nursing student with a learning disability and examined how baccalaureate nursing students with learning disabilities experienced various aspects of the nursing program. It also examined how their disabilities and previous educational and personal experiences influenced the meaning that they gave to their educational experiences. Seven nursing students were interviewed, completed a demographic data form, and submitted various artifacts (test scores, evaluation reports, and curriculum-based material) for document analysis. The researcher used Stake's model for collective case study research and analysis (1). Data analysis revealed five themes: 1) struggle, 2) learning how to learn with LD, 3) issues concerning time, 4) social support, and 5) personal stories. Theme clusters and individual variations were identified for each theme. Document analysis revealed that participants had average to above average intellectual functioning with an ability-achievement discrepancy among standardized test scores. Participants noted that direct instruction, structure, consistency, clear directions, organization, and a positive instructor attitude assisted learning. Anxiety, social isolation from peers, and limited time to process and complete work were problems faced by the participants.

  7. The legacy of war: an epidemiological study of cluster weapon and land mine accidents in Quang Tri Province, Vietnam.

    PubMed

    Phung, Tran Kim; Le, Viet; Husum, Hans

    2012-07-01

    The study examines the epidemiology of cluster weapon and land mine accidents in Quang Tri Province since the end of the Vietnam War. The province is located just south of the demarcation line and was the province most affected during the war. In 2009, a cross-sectional household study was conducted in all nine districts of the province. During the study period of 1975-2009, 7,030 persons in the study area, or 1.1% of the provincial population, were involved in accidents caused by unexploded ordnance (UXO) or land mines. There were 2,620 fatalities and 4,410 accident survivors. The study documents that the main problem is cluster weapons and other unexploded ordnance; only 4.3% of casualties were caused by land mines. The legacy of the war affects poor people the most; the accident rate was highest among villagers living in mountainous areas, ethnic minorities, and low-income families. The most common activities leading to the accidents were farming (38.6%), collecting scrap metal (11.2%), and herding of cattle (8.3%). The study documents that the people of Quang Tri Province have, to this day, suffered heavily from the legacy of war. Mine risk education programs should account for these epidemiological findings when future accident prevention programs are designed to target high-risk areas and activities.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bogen, Paul Logasa; McKenzie, Amber T; Gillen, Rob

    Forensic document analysis has become an important aspect of investigation of many different kinds of crimes, from money laundering to fraud and from cybercrime to smuggling. The current workflow for analysts includes powerful tools, such as Palantir and Analyst's Notebook, for moving from evidence to actionable intelligence, and tools for finding documents among the millions of files on a hard disk, such as FTK. However, analysts often leave the process of sorting through collections of seized documents, to separate the noise from the actual evidence, to a highly labor-intensive manual effort. This paper presents the Redeye Analysis Workbench, a tool to help analysts move from manual sorting of a collection of documents to performing intelligent document triage over a digital library. We discuss the tools and techniques we build upon, along with an in-depth discussion of our tool and how it addresses two major use cases we observed analysts performing. Finally, we include a new layout algorithm for radial graphs that is used to visualize clusters of documents in our system.

  9. Sexual Orientation, Gender, and Environmental Injustice: Unequal Carcinogenic Air Pollution Risks in Greater Houston

    PubMed Central

    Collins, Timothy W.; Grineski, Sara E.; Morales, Danielle X.

    2017-01-01

    Disparate residential hazard exposures based on disadvantaged gender status (e.g., among female-headed households) have been documented in the distributive environmental justice literature, yet no published studies have examined whether disproportionate environmental risks exist based on minority sexual orientation. To address this gap, we use data from the US Census, American Community Survey and the Environmental Protection Agency at the 2010 census tract level to examine the spatial relationships between same-sex partner households and cumulative cancer risk from exposure to hazardous air pollutants (HAPs) emitted by all ambient emission sources in Greater Houston (Texas). Findings from generalized estimating equation analyses demonstrate that increased cancer risks from HAPs are significantly associated with neighborhoods having relatively high concentrations of resident same-sex partner households, adjusting for geographic clustering and variables known to influence risk (i.e., race, ethnicity, socioeconomic status, renter status, income inequality, and population density). However, HAP exposures are distributed differently for same-sex male versus same-sex female partner households. Neighborhoods with relatively high proportions of same-sex male partner households are associated with significantly greater exposure to cancer-causing HAPs while those with high proportions of same-sex female partner households are associated with less exposure. This study provides initial empirical documentation of a previously unstudied pattern, and infuses current theoretical understanding of environmental inequality formation with knowledge emanating from the sexualities and space literature. Practically, results suggest that other documented health risks experienced in gay neighborhoods may be compounded by disparate health risks associated with harmful exposures to air toxics. PMID:29098204

  10. Analysis of co-occurrence toponyms in web pages based on complex networks

    NASA Astrophysics Data System (ADS)

    Zhong, Xiang; Liu, Jiajun; Gao, Yong; Wu, Lun

    2017-01-01

    A large number of geographical toponyms exist in web pages and other documents, providing abundant geographical resources for GIS. It is very common for toponyms to co-occur in the same documents. To investigate these relations associated with geographic entities, a novel complex network model for co-occurrence toponyms is proposed. Then, 12 toponym co-occurrence networks are constructed from the toponym sets extracted from the People's Daily Paper documents of 2010. It is found that two toponyms have a high co-occurrence probability if they are at the same administrative level or if they possess a part-whole relationship. By applying complex network analysis methods to toponym co-occurrence networks, we find the following characteristics. (1) The navigation vertices of the co-occurrence networks can be found by degree centrality analysis. (2) The networks exhibit strong clustering, and it takes only a few steps to reach one vertex from another, implying that the networks are small-world graphs. (3) The degree distribution follows a power law with an exponent of 1.7, so the networks are scale-free. (4) The networks are disassortative and have similar assortative modes, with assortative exponents of approximately 0.18 and assortative indexes less than 0. (5) The frequency of toponym co-occurrence is weakly negatively correlated with geographic distance, but more strongly negatively correlated with administrative hierarchical distance. Considering the toponym frequencies and co-occurrence relationships, a novel method based on link analysis is presented to extract the core toponyms from web pages. This method is suitable and effective for geographical information retrieval.
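
    A minimal sketch of building a toponym co-occurrence network and computing degree centrality, the analysis used above to find navigation vertices. It assumes each document has already been reduced to a list of extracted toponyms and that an edge is weighted by the number of documents in which both toponyms appear; the function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs):
    """docs: list of toponym lists. Returns {(a, b): number of documents containing both}."""
    edges = Counter()
    for toponyms in docs:
        # Count each unordered pair once per document, with a canonical (sorted) key.
        for a, b in combinations(sorted(set(toponyms)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree: number of distinct neighbours divided by (n - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    return {v: len(nb) / (n - 1) for v, nb in neighbours.items()} if n > 1 else {}
```

    In this sketch a toponym that never co-occurs with another one does not become a vertex. The edge weights are kept so that weighted measures (or the co-occurrence frequency used in the link-analysis step) can be computed from the same structure.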

  11. Sexual Orientation, Gender, and Environmental Injustice: Unequal Carcinogenic Air Pollution Risks in Greater Houston.

    PubMed

    Collins, Timothy W; Grineski, Sara E; Morales, Danielle X

    2017-01-01

    Disparate residential hazard exposures based on disadvantaged gender status (e.g., among female-headed households) have been documented in the distributive environmental justice literature, yet no published studies have examined whether disproportionate environmental risks exist based on minority sexual orientation. To address this gap, we use data from the US Census, American Community Survey and the Environmental Protection Agency at the 2010 census tract level to examine the spatial relationships between same-sex partner households and cumulative cancer risk from exposure to hazardous air pollutants (HAPs) emitted by all ambient emission sources in Greater Houston (Texas). Findings from generalized estimating equation analyses demonstrate that increased cancer risks from HAPs are significantly associated with neighborhoods having relatively high concentrations of resident same-sex partner households, adjusting for geographic clustering and variables known to influence risk (i.e., race, ethnicity, socioeconomic status, renter status, income inequality, and population density). However, HAP exposures are distributed differently for same-sex male versus same-sex female partner households. Neighborhoods with relatively high proportions of same-sex male partner households are associated with significantly greater exposure to cancer-causing HAPs while those with high proportions of same-sex female partner households are associated with less exposure. This study provides initial empirical documentation of a previously unstudied pattern, and infuses current theoretical understanding of environmental inequality formation with knowledge emanating from the sexualities and space literature. Practically, results suggest that other documented health risks experienced in gay neighborhoods may be compounded by disparate health risks associated with harmful exposures to air toxics.

  12. Novel layered clustering-based approach for generating ensemble of classifiers.

    PubMed

    Rahman, Ashfaqur; Verma, Brijesh

    2011-05-01

    This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clusterings of a dataset at different layers by randomly initializing the clustering parameters, and trains a set of base classifiers on the patterns in different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering, together with diversity achieved through layering, leads to better classification results, as evidenced by the experimental results.
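
    The train/classify procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: each layer uses k-means with a different random initialization, and a 1-nearest-neighbour rule stands in as the base classifier of each cluster (a placeholder; the abstract does not fix the base classifier). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means; the random initialization differs from layer to layer."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [[sum(c) / len(g) for c in zip(*g)] if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids


def one_nn(train, x):
    """Placeholder base classifier: label of the nearest training pattern."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def train_layers(X, y, n_layers=3, k=2, seed=0):
    """Build one clustering per layer and keep each cluster's training patterns."""
    rng = random.Random(seed)
    data = list(zip(X, y))
    layers = []
    for _ in range(n_layers):
        cents = kmeans(X, k, rng)
        members = [[] for _ in range(k)]
        for x, label in data:
            members[nearest(x, cents)].append((x, label))
        # Fall back to all patterns if a cluster ends up empty.
        layers.append((cents, [m if m else data for m in members]))
    return layers


def predict(layers, x):
    """Route x to its cluster in every layer, classify there, fuse by majority vote."""
    decisions = [one_nn(members[nearest(x, cents)], x) for cents, members in layers]
    return Counter(decisions).most_common(1)[0][0]
```

    Using an odd number of layers avoids ties in the two-class majority vote; with more classes, `Counter.most_common` breaks ties by first occurrence, which is an arbitrary choice of this sketch.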

  13. An information theory analysis of spatial decisions in cognitive development

    PubMed Central

    Scott, Nicole M.; Sera, Maria D.; Georgopoulos, Apostolos P.

    2015-01-01

    Performance in a cognitive task can be considered as the outcome of a decision-making process operating across various knowledge domains or aspects of a single domain. Therefore, an analysis of these decisions in various tasks can shed light on the interplay and integration of these domains (or elements within a single domain) as they are associated with specific task characteristics. In this study, we applied an information theoretic approach to assess quantitatively the gain of knowledge across various elements of the cognitive domain of spatial, relational knowledge, as a function of development. Specifically, we examined changing spatial relational knowledge from ages 5 to 10 years. Our analyses consisted of a two-step process. First, we performed a hierarchical clustering analysis on the decisions made in 16 different tasks of spatial relational knowledge to determine which tasks were performed similarly at each age group as well as to discover how the tasks clustered together. We next used two measures of entropy to capture the gradual emergence of order in the development of relational knowledge. These measures of “cognitive entropy” were defined based on two independent aspects of chunking, namely (1) the number of clusters formed at each age group, and (2) the distribution of tasks across the clusters. We found that both measures of entropy decreased with age in a quadratic fashion and were positively and linearly correlated. The decrease in entropy and, therefore, gain of information during development was accompanied by improved performance. These results document, for the first time, the orderly and progressively structured “chunking” of decisions across the development of spatial relational reasoning and quantify this gain within a formal information-theoretic framework. PMID:25698915
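
    The abstract does not give the exact formulas for the two "cognitive entropy" measures. A minimal sketch under two assumptions: measure (2) is read as the Shannon entropy, in bits, of the proportions of tasks falling in each cluster, and measure (1) as log2 of the number of clusters formed, which is also the largest value measure (2) can take for that many clusters. Both function names are hypothetical.

```python
import math
from collections import Counter


def cluster_count_entropy(assignments):
    """Assumed measure (1): log2 of the number of distinct clusters formed."""
    return math.log2(len(set(assignments)))


def cluster_distribution_entropy(assignments):
    """Assumed measure (2): Shannon entropy (bits) of the task-to-cluster distribution."""
    n = len(assignments)
    return -sum((c / n) * math.log2(c / n) for c in Counter(assignments).values())
```

    For 16 tasks, 16 singleton clusters give 4 bits on both measures, two clusters of 8 tasks give 1 bit, and a single cluster gives 0; fewer and more concentrated clusters therefore mean lower entropy, the direction of change the study reports with age.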

  14. Strain rates, stress markers and earthquake clustering (Invited)

    NASA Astrophysics Data System (ADS)

    Fry, B.; Gerstenberger, M.; Abercrombie, R. E.; Reyners, M.; Eberhart-Phillips, D. M.

    2013-12-01

    The 2010-present Canterbury earthquakes comprise a well-recorded sequence in a relatively low strain-rate shallow crustal region. We present new scientific results to test the hypothesis that earthquake sequences in low strain-rate areas experience high stress drop events, low post-seismic relaxation, and accentuated seismic clustering. This hypothesis is based on a physical description of the aftershock process in which the spatial distribution of stress accumulation and stress transfer is controlled by fault strength and orientation. Following large crustal earthquakes, time-dependent forecasts are often developed by fitting parameters defined by Omori's aftershock decay law. In high strain-rate areas, simple forecast models utilizing a single p-value fit observed aftershock sequences well. In low strain-rate areas such as Canterbury, assumptions of simple Omori decay may not be sufficient to capture the clustering (sub-sequence) nature exhibited by the punctuated rise in activity following significant child events. In Canterbury, the moment release is more clustered than in more typical Omori sequences. The individual earthquakes in these clusters also exhibit somewhat higher stress drops than in the average crustal sequence in high strain-rate regions, suggesting the earthquakes occur on strong Andersonian-oriented faults, possibly juvenile or well-healed. We use the spectral ratio procedure outlined by Viegas et al. (2010) to determine corner frequencies and Madariaga stress-drop values for over 800 events in the sequence. Furthermore, we will discuss the relevance of tomographic results of Reyners and Eberhart-Phillips (2013) documenting post-seismic stress-driven fluid processes following the three largest events in the sequence, as well as anisotropic patterns in surface wave tomography (Fry et al., 2013). 
These tomographic studies are both compatible with the hypothesis, providing strong evidence for the presence of widespread and hydrated regional upper crustal cracking parallel to sub-parallel to the dominant transverse failure plane in the sequence. Joint interpretation of the three separate datasets provides a positive first attempt at testing our fundamental hypothesis.
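
    For reference, the aftershock decay law mentioned above is usually written in its modified (Omori-Utsu) form, where n(t) is the aftershock rate at time t after the mainshock, K a productivity constant, c a small time offset, and p the decay exponent (the "p-value" fitted in the simple forecast models); Omori's original law corresponds to p = 1.

```latex
n(t) = \frac{K}{(c + t)^{p}}
```

    A single (K, c, p) fit describes one smooth decay from one mainshock, which is why punctuated secondary sub-sequences triggered by significant child events, as described for Canterbury, are hard to capture with it.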

  15. Gender and Health Behavior Clustering among U.S. Young Adults

    PubMed Central

    Olson, Julie Skalamera; Hummer, Robert A.; Harris, Kathleen Mullan

    2016-01-01

    U.S. trends in population health suggest alarming disparities among young adults, who are less healthy across most measurable domains than their counterparts in other high-income countries; these international comparisons are particularly troubling for women. To deepen our understanding of gender disparities in health and underlying behavioral contributions, we document gender-specific clusters of health behavior among U.S. young adults using nationally representative data from the National Longitudinal Study of Adolescent to Adult Health. We find high levels of poor health behavior, but especially among men; 40 percent of men clustered into a group characterized by unhealthy behavior (e.g., poor diet, no exercise, substance use), compared to only 22 percent of women. Additionally, women tend to age out of unhealthy behaviors in young adulthood more than men. Further, we uncover gender differences in the extent to which sociodemographic position and adolescent contexts inform health behavior clustering. For example, college education was more protective for men, whereas marital status was equally protective across gender. Parental drinking mattered for health behavior clustering among men, whereas peer drinking mattered for clustering among women. We discuss these results in the context of declining female advantage in U.S. health and changing young adult social and health contexts. PMID:28287308

  16. Gender and Health Behavior Clustering among U.S. Young Adults.

    PubMed

    Olson, Julie Skalamera; Hummer, Robert A; Harris, Kathleen Mullan

    2017-01-01

    U.S. trends in population health suggest alarming disparities among young adults, who are less healthy across most measurable domains than their counterparts in other high-income countries; these international comparisons are particularly troubling for women. To deepen our understanding of gender disparities in health and underlying behavioral contributions, we document gender-specific clusters of health behavior among U.S. young adults using nationally representative data from the National Longitudinal Study of Adolescent to Adult Health. We find high levels of poor health behavior, especially among men: 40 percent of men clustered into a group characterized by unhealthy behavior (e.g., poor diet, no exercise, substance use), compared to only 22 percent of women. Additionally, women tend to age out of unhealthy behaviors in young adulthood more than men. Further, we uncover gender differences in the extent to which sociodemographic position and adolescent contexts inform health behavior clustering. For example, college education was more protective for men, whereas marital status was equally protective for both genders. Parental drinking mattered for health behavior clustering among men, whereas peer drinking mattered for clustering among women. We discuss these results in the context of declining female advantage in U.S. health and changing young adult social and health contexts.

  17. A cluster merging method for time series microarray with production values.

    PubMed

    Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio

    2014-09-01

    A challenging task in time-course microarray data analysis is to cluster genes meaningfully by combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method that accomplishes this goal, obtaining groups of highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created from individual temporal series (representing different biological replicates measured at the same time points) and to merge them by taking into account how frequently two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study focuses on a real-world time series microarray task that aims to find co-expressed genes related to the production and growth of a certain bacterium. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time, which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method produces meaningful gene groups that can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevance measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering obtained by applying the same shape-based algorithm to the mean value of the time series.
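    The merging step, which counts how often two genes land in the same group across the per-replicate clusterings, can be sketched as a co-association count followed by linking genes whose co-clustering frequency reaches a threshold. This is a minimal illustration under assumptions not taken from the abstract: each clustering is a plain list of labels, and the threshold rule, the union-find merge and the size-based ordering are illustrative choices rather than the authors' exact procedure.

```python
from itertools import combinations


def co_association(clusterings):
    """Fraction of clusterings in which each pair of genes shares a cluster.

    `clusterings` is a list of label lists, one per replicate time series;
    position i in every list refers to the same gene.
    """
    n_genes = len(clusterings[0])
    freq = {}
    for i, j in combinations(range(n_genes), 2):
        together = sum(1 for labels in clusterings if labels[i] == labels[j])
        freq[(i, j)] = together / len(clusterings)
    return freq


def merge_clusters(clusterings, threshold=1.0):
    """Merge genes whose co-clustering frequency reaches `threshold`.

    Groups are the connected components of the graph that has an edge for
    every gene pair at or above the threshold (found with union-find).
    """
    n_genes = len(clusterings[0])
    parent = list(range(n_genes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), f in co_association(clusterings).items():
        if f >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for g in range(n_genes):
        groups.setdefault(find(g), []).append(g)
    # Largest groups first; equal sizes keep first-seen order.
    return sorted(groups.values(), key=len, reverse=True)


# Three replicate clusterings of six genes (labels are arbitrary per replicate).
reps = [
    [0, 0, 0, 1, 1, 2],
    [1, 1, 1, 0, 0, 0],
    [2, 2, 2, 0, 1, 2],
]
print(merge_clusters(reps, threshold=2 / 3))  # [[0, 1, 2], [3, 4], [5]]
```

    Genes 3 and 4 co-cluster in two of the three replicates, so they merge at a threshold of 2/3 but stay apart at 1.0; groups formed at a high threshold therefore reflect stronger agreement across the individual time series.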

  18. Level statistics of words: Finding keywords in literary texts and symbolic sequences

    NASA Astrophysics Data System (ADS)

    Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V.; Oliver, J. L.

    2009-03-01

    Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach that automatically extracts keywords from literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract), whereas irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method also works for generic symbolic sequences (continuous texts without spaces), which suggests its general applicability.
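    The idea that relevant words self-attract while irrelevant words scatter randomly can be illustrated with the spread of the gaps between successive occurrences of a word. This is a sketch, not the authors' full procedure: it assumes the statistic sigma = std(gaps) / mean(gaps), rescaled by sqrt(1 - p) with p the word's relative frequency, so that randomly placed words (geometric gaps) score close to 1.

```python
import math


def spacing_sigma(tokens, word):
    """Normalized spread of the gaps between consecutive occurrences of `word`.

    sigma = std(gaps) / mean(gaps), divided by sqrt(1 - p) where p is the
    word's relative frequency (assumed normalization). Randomly scattered
    words score near 1, clustered words above 1, evenly spaced words near 0.
    Returns None when the word occurs fewer than three times.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    if len(positions) < 3:
        return None
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    p = len(positions) / len(tokens)
    return (math.sqrt(var) / mean) / math.sqrt(1 - p)


# A toy 100-token "text": one evenly spaced word and one bursty word.
tokens = ["w"] * 100
for i in range(0, 100, 10):
    tokens[i] = "even"
for i in (1, 2, 3, 51, 52, 53):
    tokens[i] = "burst"
print(spacing_sigma(tokens, "even"))   # 0.0
print(spacing_sigma(tokens, "burst"))  # well above 1: a keyword candidate
```

    Ranking words by this score needs only the document itself, which matches the abstract's point that no reference corpus is required.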

  19. Double Cluster Heads Model for Secure and Accurate Data Fusion in Wireless Sensor Networks

    PubMed Central

    Fu, Jun-Song; Liu, Yun

    2015-01-01

    Secure and accurate data fusion is an important issue in wireless sensor networks (WSNs) and has been extensively researched in the literature. In this paper, by combining clustering techniques, reputation and trust systems, and data fusion algorithms, we propose a novel cluster-based data fusion model called the Double Cluster Heads Model (DCHM) for secure and accurate data fusion in WSNs. Unlike traditional clustering models in WSNs, DCHM selects two cluster heads for each cluster after clustering, based on the reputation and trust system, and the two heads perform data fusion independently of each other. The results are then sent to the base station, where the dissimilarity coefficient is computed. If the dissimilarity coefficient of the two data fusion results exceeds the threshold preset by the users, both cluster heads are added to a blacklist and the sensor nodes in the cluster must re-elect cluster heads. Meanwhile, feedback is sent from the base station to the reputation and trust system, which helps identify and remove compromised sensor nodes in time. Through a series of extensive simulations, we found that DCHM performed very well in terms of data fusion security and accuracy. PMID:25608211
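    The base-station check can be sketched as follows. The abstract does not define the fusion rule or the dissimilarity coefficient, so the mean-based fusion, the relative-difference coefficient and the dictionary interface below are hypothetical stand-ins.

```python
def fuse(readings):
    """Hypothetical fusion rule: a cluster head averages its members' readings."""
    return sum(readings) / len(readings)


def dissimilarity(a, b):
    """Hypothetical dissimilarity coefficient: relative absolute difference."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0 else abs(a - b) / scale


def base_station_check(results, threshold, blacklist):
    """Compare the two heads' independent fusion results for one cluster.

    `results` maps each of the cluster's two head IDs to the value it reported.
    If the results disagree by more than `threshold`, both heads are added to
    `blacklist` and True is returned, signalling that the cluster must re-elect.
    """
    (head_a, value_a), (head_b, value_b) = results.items()
    if dissimilarity(value_a, value_b) > threshold:
        blacklist.update((head_a, head_b))
        return True
    return False


readings = [20.1, 19.9, 20.0]
blacklist = set()
honest = fuse(readings)
print(base_station_check({"h1": honest, "h2": honest}, 0.1, blacklist))  # False
print(base_station_check({"h1": honest, "h2": 35.0}, 0.1, blacklist))    # True
print(sorted(blacklist))                                                 # ['h1', 'h2']
```

    In the full model the base station would also feed this outcome back to the reputation and trust system; that step is omitted here.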

  20. Cost/Performance Ratio Achieved by Using a Commodity-Based Cluster

    NASA Technical Reports Server (NTRS)

    Lopez, Isaac

    2001-01-01

    Researchers at the NASA Glenn Research Center acquired a commodity cluster based on Intel Corporation processors to compare its performance with that of a traditional UNIX cluster in executing aeropropulsion applications. Because the cost difference between the clusters was significant, a cost/performance ratio was calculated. After executing a propulsion application on both clusters, the researchers demonstrated a cost/performance ratio of 9.4 in favor of the Intel-based cluster. The researchers use the Aeroshark cluster as one of the primary testbeds for developing NPSS parallel application codes and system software. The Aeroshark cluster provides 64 Intel Pentium II 400-MHz processors, housed in 32 nodes. Recently, APNASA, a code developed by a government/industry team for the design and analysis of turbomachinery systems, was used for a simulation on Glenn's Aeroshark cluster.

  1. Nontobacco substance use, sexual abuse, HIV, and sexually transmitted infection among street children in Kolkata, India.

    PubMed

    Bal, Baishali; Mitra, Rupa; Mallick, Aiyel H; Chakraborti, Sekhar; Sarkar, Kamalesh

    2010-08-01

    A community-based cross-sectional study of 554 street children in Kolkata assessed nontobacco substance use and sexual abuse, along with human immunodeficiency virus (HIV)/sexually transmitted infections (STIs), during 2007. It used a conventional cluster sampling technique for hard-to-reach populations, a field-tested questionnaire, and the collection of blood samples for HIV and syphilis serology testing as a composite indicator of STIs. The reported prevalence of nontobacco substance use was 30%; 9% reported having been sexually abused. Several factors (age, lack of contact with family, being an orphan, staying overnight in public places, etc.) were found to be associated with substance use and sexual abuse. The seroprevalence of HIV was 1% and that of STIs was 4%. This 1% HIV seroprevalence among street children is a matter of concern, and community-based interventions are needed for them. The study's limitations are noted.

  2. A neural network ActiveX based integrated image processing environment.

    PubMed

    Ciuca, I; Jitaru, E; Alaicescu, M; Moisil, I

    2000-01-01

    The paper outlines an integrated image processing environment that uses neural network ActiveX technology for object recognition and classification. The environment is Windows-based and menu-driven and encapsulates a Multiple-Document Interface (MDI). Object (shape) parameter extraction focuses on features that are invariant to translation, rotation and scale transformations. The neural network models, which can be incorporated into the environment as ActiveX components, allow both clustering and classification of objects in the analysed image. Mapping neural networks perform an input sensitivity analysis on the extracted feature measurements, which facilitates the removal of irrelevant features and improves generalisation. The program has been used to evaluate the dimensions of hydrocephalus in a study of ventricular system modifications that calculated the Evans index and the angle of the frontal horns.
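    As one concrete example of a translation-, rotation- and scale-invariant shape feature, the sketch below computes the first Hu moment invariant of a binary shape. The abstract does not say which invariants the environment extracts, so Hu moments are an assumed, swapped-in choice used only to illustrate the idea.

```python
import math


def hu_phi1(pixels):
    """First Hu moment invariant, eta20 + eta02, of a binary shape.

    `pixels` holds the (x, y) coordinates of foreground pixels. Central
    moments remove translation, dividing by m00**2 removes scale, and the
    sum eta20 + eta02 does not change under rotation.
    """
    pts = list(pixels)
    m00 = len(pts)  # area of the shape in pixels
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    # eta_pq = mu_pq / m00 ** (1 + (p + q) / 2); for p + q = 2 the exponent is 2.
    return (mu20 + mu02) / m00 ** 2


shape = [(0, 0), (0, 1), (0, 2), (1, 0)]             # small L-shaped object
moved = [(x + 5, y + 7) for x, y in shape]            # translated
turned = [(-y, x) for x, y in shape]                  # rotated by 90 degrees
print(math.isclose(hu_phi1(shape), hu_phi1(moved)))   # True
print(math.isclose(hu_phi1(shape), hu_phi1(turned)))  # True
```

    On a pixel grid, scale invariance holds only approximately, because resampling a shape at a different size changes its discrete moments slightly.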

  3. BioCluster: tool for identification and clustering of Enterobacteriaceae based on biochemical data.

    PubMed

    Abdullah, Ahmed; Sabbir Alam, S M; Sultana, Munawar; Hossain, M Anwar

    2015-06-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved on the basis of biochemical properties. Traditional practice involves manually comparing each biochemical property of the unknown sample with known reference samples and inferring its identity from the known sample with the most similar pattern. This process is labor-intensive, time-consuming, error-prone, and subjective. Automating the sorting and similarity calculation would therefore be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms: traditional hierarchical clustering (HC) and Improved Hierarchical Clustering (IHC), a modified algorithm developed specifically for clustering and identifying Enterobacteriaceae species. IHC takes into account the variability in the results of 1-47 biochemical tests within the Enterobacteriaceae family. The tool also provides options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster clusters and identifies enterobacterial species from biochemical test data with high accuracy. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
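    The manual step that the tool automates, identifying an unknown isolate by the reference it most resembles, can be sketched with a simple matching coefficient over positive/negative test results. This is a baseline for illustration only: it is not the HC or IHC algorithm, and the string encoding of profiles and the reference names are hypothetical.

```python
def simple_matching(a, b):
    """Fraction of biochemical tests giving the same result (+ or -) in both profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same tests")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(unknown, references):
    """Return (name, similarity) for the reference profile most similar to `unknown`."""
    best = max(references, key=lambda name: simple_matching(unknown, references[name]))
    return best, simple_matching(unknown, references[best])


# Hypothetical five-test profiles; real panels cover many more tests.
refs = {"reference_A": "++-++", "reference_B": "--+-+"}
print(identify("++-+-", refs))  # ('reference_A', 0.8)
```

    A plain match count treats every test as equally informative; IHC's handling of variable test results is exactly what such a baseline lacks.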

  4. Canonical PSO Based K-Means Clustering Approach for Real Datasets

    PubMed Central

    Dey, Lopamudra; Chakraborty, Sanjay

    2014-01-01

    Clustering is a significant technique whose applications span many fields. Because clustering is an unsupervised process in data mining, properly evaluating the results and measuring the compactness and separability of the clusters are important issues. The procedure for evaluating the results of a clustering algorithm is known as a cluster validity measure. Different indices suit different types of problems, and the choice of index depends on the kind of data available. This paper first proposes a Canonical PSO-based K-means clustering algorithm, analyses some important clustering indices (intercluster and intracluster), and then evaluates the effects of those indices on real-time air pollution, wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO-based K-means, simple PSO-based K-means, DBSCAN, and hierarchical clustering algorithms. The paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment, identifying which algorithm is most desirable for forming compact clusters on these particular real-life datasets. In short, it examines the behaviour of these clustering algorithms with respect to the validation indices and presents the evaluation results in mathematical and graphical form. PMID:27355083
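    A PSO-seeded K-means of this kind can be sketched as follows. Several assumptions are not taken from the paper: "canonical" is read as the constriction-factor PSO of Clerc and Kennedy (chi = 0.7298, c1 = c2 = 2.05); each particle encodes k two-dimensional centroids; fitness is the within-cluster sum of squared errors (an intracluster measure); and a few standard K-means iterations refine the best particle.

```python
import random

# Constriction-factor PSO constants, assumed to be what "canonical PSO" denotes.
CHI, C1, C2 = 0.7298, 2.05, 2.05


def nearest(p, cents):
    """Index of the centroid closest to point p."""
    return min(range(len(cents)),
               key=lambda j: (p[0] - cents[j][0]) ** 2 + (p[1] - cents[j][1]) ** 2)


def unflatten(flat):
    """[x0, y0, x1, y1, ...] -> [(x0, y0), (x1, y1), ...]."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def sse(cents, points):
    """Within-cluster sum of squared distances to the nearest centroid."""
    total = 0.0
    for p in points:
        c = cents[nearest(p, cents)]
        total += (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return total


def pso_kmeans(points, k, n_particles=20, iters=100, lloyd_iters=10, seed=0):
    """PSO searches over centroid sets; K-means (Lloyd) steps then refine gbest."""
    rng = random.Random(seed)
    # Each particle starts from k distinct data points used as centroids.
    swarm = [[c for p in rng.sample(points, k) for c in p] for _ in range(n_particles)]
    vel = [[0.0] * (2 * k) for _ in swarm]
    pbest = [x[:] for x in swarm]
    pbest_f = [sse(unflatten(x), points) for x in swarm]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i, x in enumerate(swarm):
            for d in range(2 * k):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = CHI * (vel[i][d]
                                   + C1 * r1 * (pbest[i][d] - x[d])
                                   + C2 * r2 * (gbest[d] - x[d]))
                x[d] += vel[i][d]
            f = sse(unflatten(x), points)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[:], f
                if f < gbest_f:
                    gbest, gbest_f = x[:], f

    # Standard K-means refinement starting from the best particle.
    cents = unflatten(gbest)
    for _ in range(lloyd_iters):
        groups = [[] for _ in cents]
        for p in points:
            groups[nearest(p, cents)].append(p)
        cents = [(sum(q[0] for q in grp) / len(grp), sum(q[1] for q in grp) / len(grp))
                 if grp else c for grp, c in zip(groups, cents)]
    labels = [nearest(p, cents) for p in points]
    return cents, labels, sse(cents, points)


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
cents, labels, err = pso_kmeans(pts, k=2)
print(labels, err)
```

    The PSO stage explores centroid sets globally, which reduces K-means' sensitivity to a poor starting point, while the final Lloyd steps make the result a proper K-means solution.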

  5. Finding gene clusters for a replicated time course study

    PubMed Central

    2014-01-01

    Background: Finding genes that share similar expression patterns across samples is an important question frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means and hierarchical clustering cluster genes directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes the specific design of the microarray study into account and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters in studies with complicated designs, such as replicated time course studies. Findings: In this paper, we applied the clustering of regression models method to data from a time course study of yeast with two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with those of K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns changed in YOX1 mutant yeast compared with wild type yeast. Conclusions: The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
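    To show the flavour of clustering genes by how expression relates to a covariate rather than by the raw measurements, the sketch below fits one least-squares line of expression against time per gene, pooling the replicates, and groups genes by the fitted slope. It is a deliberately simplified two-step stand-in, not the clustering of regression models method itself, which builds the regression into a model-based clustering; the function names and the two-cluster setting are assumptions.

```python
def fit_line(times, values):
    """Least-squares intercept and slope of `values` regressed on `times`."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    slope = sxy / sxx
    return mv - slope * mt, slope


def cluster_by_trend(genes, times, iters=20):
    """Two-cluster 1-D k-means on each gene's fitted time slope.

    `genes` maps gene name -> list of replicate series measured at `times`;
    replicates are pooled into one regression per gene.
    """
    slopes = {}
    for name, replicates in genes.items():
        ts = [t for _ in replicates for t in times]
        vs = [v for rep in replicates for v in rep]
        slopes[name] = fit_line(ts, vs)[1]
    centers = [min(slopes.values()), max(slopes.values())]
    for _ in range(iters):
        groups = [[], []]
        for name, s in slopes.items():
            closer_to_first = abs(s - centers[0]) <= abs(s - centers[1])
            groups[0 if closer_to_first else 1].append(name)
        centers = [sum(slopes[g] for g in grp) / len(grp) if grp else c
                   for grp, c in zip(groups, centers)]
    return groups


times = [0, 1, 2, 3]
genes = {
    "g1": [[1, 2, 3, 4], [1.1, 2.1, 2.9, 4.0]],  # rising, two replicates
    "g2": [[0, 1, 2, 3], [0, 1.2, 2, 3.1]],      # rising
    "g3": [[4, 3, 2, 1], [4, 3.1, 2, 0.9]],      # falling
    "g4": [[5, 4, 3, 2], [5, 4, 3, 2]],          # falling
}
print(cluster_by_trend(genes, times))  # [['g3', 'g4'], ['g1', 'g2']]
```

    A gene whose expression rises and then falls would get a slope near zero here; a richer design matrix (for example, one indicator per time point) would be needed to capture such shapes.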

  6. Light-dependent activation of phosphoenolpyruvate carboxylase by reversible phosphorylation in cluster roots of white lupin plants: diurnal control in response to photosynthate supply

    PubMed Central

    Feil, Regina; Lunn, John E.; Plaxton, William C.

    2016-01-01

    Background and Aims: Phosphoenolpyruvate carboxylase (PEPC) is a tightly regulated enzyme that controls carbohydrate partitioning to organic acid anions (malate, citrate) excreted in copious amounts by cluster roots of inorganic phosphate (Pi)-deprived white lupin plants. Excreted malate and citrate solubilize otherwise inaccessible sources of mineralized soil Pi for plant uptake. The aim of this study was to test the hypotheses that (1) PEPC is post-translationally activated by reversible phosphorylation in cluster roots of illuminated white lupin plants, and (2) light-dependent phosphorylation of cluster root PEPC is associated with elevated intracellular levels of sucrose and its signalling metabolite, trehalose-6-phosphate. Methods: White lupin plants were cultivated hydroponically at low Pi levels (≤1 µM) and subjected to various light/dark pretreatments. Cluster root PEPC activity and in vivo phosphorylation status were analysed to assess the enzyme’s diurnal, post-translational control in response to light and dark. Levels of various metabolites, including sucrose and trehalose-6-phosphate, were also quantified in cluster root extracts using enzymatic and spectrometric methods. Key Results: During the daytime, cluster root PEPC was activated by phosphorylation at its conserved N-terminal seryl residue. Darkness triggered a progressive reduction in PEPC phosphorylation to undetectable levels, and this was correlated with 75–80 % decreases in the concentrations of sucrose and trehalose-6-phosphate. Conclusions: Reversible, light-dependent regulatory PEPC phosphorylation occurs in cluster roots of Pi-deprived white lupin plants. This likely facilitates the well-documented light- and sucrose-dependent exudation of Pi-solubilizing organic acid anions by the cluster roots. PEPC’s in vivo phosphorylation status appears to be modulated by sucrose translocated from CO2-fixing leaves into the non-photosynthetic cluster roots. PMID:27063365

  7. EXPLORING FUNCTIONAL CONNECTIVITY IN FMRI VIA CLUSTERING.

    PubMed

    Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina

    2009-04-01

    In this paper, we investigate the use of data-driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.

  8. Model-based clustering for RNA-seq data.

    PubMed

    Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P

    2014-01-15

    RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods for studying global gene expression. However, robust statistical tools for analyzing these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and is hence an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization (EM) algorithm and two stochastic versions of the EM algorithm are described. In addition, a likelihood-based initialization strategy is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method that generates a tree structure, allowing visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods, such as the K-means algorithm and hierarchical clustering methods, that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
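    The EM idea can be sketched for a mixture of independent Poisson distributions over gene count profiles (one count per treatment). This is a simplified illustration under stated assumptions: it omits library-size normalization, the stochastic EM variants and the likelihood-based initialization described in the abstract, and the caller supplies the initial cluster means.

```python
import math


def poisson_logpmf(x, lam):
    """log P(X = x) for a Poisson distribution with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def poisson_mixture_em(profiles, init_means, iters=50):
    """EM for a mixture of independent Poisson models over count profiles.

    `profiles` is a list of genes, each a list of counts (one per treatment);
    `init_means` gives one starting mean vector per cluster.
    Returns (mixing weights, cluster mean vectors, hard labels).
    """
    k, t = len(init_means), len(profiles[0])
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability that each gene belongs to each cluster,
        # computed in log space and shifted by the maximum to avoid underflow.
        resp = []
        for x in profiles:
            logs = [math.log(weights[j])
                    + sum(poisson_logpmf(x[d], means[j][d]) for d in range(t))
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: weighted re-estimation; small floors keep the logs finite.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(profiles)
            means[j] = [max(sum(r[j] * x[d] for r, x in zip(resp, profiles)) / nj, 1e-9)
                        for d in range(t)]
    # Hard labels from the last E-step's responsibilities.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, labels


counts = [[5, 50], [6, 55], [4, 48], [30, 30], [28, 33], [31, 29]]
weights, means, labels = poisson_mixture_em(counts, [[5, 50], [30, 30]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

    The Poisson likelihood is one reason such models suit count data better than K-means, which implicitly assumes equal-variance Gaussian clusters.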

  9. Coalition Game-Based Secure and Effective Clustering Communication in Vehicular Cyber-Physical System (VCPS)

    PubMed Central

    Huo, Yan; Dong, Wei; Qian, Jin; Jing, Tao

    2017-01-01

    In this paper, we address the low efficiency of cluster-based communication in the crossroad scenario of the Vehicular Cyber-Physical System (VCPS), which is caused by overloading of the cluster head when many vehicles require transmission bandwidth. After formulating the issue as a coalition formation game, we propose a coalition-based clustering strategy that can converge to a Nash-stable partition to complete the cluster formation process. In the proposed strategy, the coalition utility is formulated from the relative velocity, relative position and bandwidth availability ratio of the vehicles in the cluster. Using this utility, each vehicle acts as a node that decides whether to switch to a new coalition or stay in its current one. This allows full use of the bandwidth provided by the cluster head while maintaining clustering stability. However, selfish nodes that seek to benefit from the network may appear during cluster formation. This behavior may degrade communication quality and even destroy the cluster. Thus, we also present a reputation-based incentive and penalty mechanism to stop selfish nodes from entering clusters. Numerical simulation results show that our strategy, CG-SECC, achieves a better tradeoff between the stability and efficiency of clustering communication. In addition, a case study demonstrates that the proposed incentive and penalty mechanism can play an important role in discovering and removing malicious nodes. PMID:28264469
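    The switch rule that drives the coalition formation game toward a Nash-stable partition can be sketched as below. The utility is a hypothetical stand-in that combines the three ingredients named in the abstract (bandwidth availability ratio, relative velocity, relative position) with made-up weights, and the reputation-based incentive and penalty mechanism is omitted.

```python
def utility(vehicle, head, size, alpha=0.05, beta=0.005):
    """Hypothetical coalition utility for one vehicle.

    Bandwidth availability ratio (head capacity / members, capped at 1)
    minus penalties for velocity and position differences from the head.
    """
    share = min(1.0, head["capacity"] / size)
    return (share
            - alpha * abs(vehicle["vel"] - head["vel"])
            - beta * abs(vehicle["pos"] - head["pos"]))


def form_coalitions(vehicles, heads, assignment, max_rounds=100):
    """Apply switch operations until no vehicle gains by moving (Nash-stable).

    `assignment` maps vehicle id -> head id and is updated in place.
    Returns True if a stable partition was reached within `max_rounds`.
    """
    for _ in range(max_rounds):
        switched = False
        for vid, veh in vehicles.items():
            sizes = {h: 0 for h in heads}
            for h in assignment.values():
                sizes[h] += 1
            current = assignment[vid]
            best, best_u = current, utility(veh, heads[current], sizes[current])
            for h, head in heads.items():
                if h != current:
                    u = utility(veh, head, sizes[h] + 1)  # size after joining
                    if u > best_u:
                        best, best_u = h, u
            if best != current:
                assignment[vid] = best
                switched = True
        if not switched:
            return True  # no vehicle wants to move: Nash-stable
    return False


heads = {"A": {"pos": 0, "vel": 20, "capacity": 2},
         "B": {"pos": 100, "vel": 20, "capacity": 2}}
vehicles = {"v4": {"pos": 90, "vel": 20}, "v3": {"pos": 80, "vel": 20},
            "v2": {"pos": 20, "vel": 20}, "v1": {"pos": 10, "vel": 20}}
assignment = {v: "A" for v in vehicles}  # everyone starts on head A
print(form_coalitions(vehicles, heads, assignment), assignment)
```

    Starting with every vehicle on head A, the two distant vehicles move to head B, after which no vehicle can raise its utility by switching. Convergence is not guaranteed for an arbitrary utility, which is why the sketch caps the number of rounds.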

  10. Cluster Inter-Spacecraft Communications

    NASA Technical Reports Server (NTRS)

    Cox, Brian

    2008-01-01

    A document describes a radio communication system being developed for exchanging data and sharing data-processing capabilities among spacecraft flying in formation. The system would establish a high-speed, low-latency, deterministic loop communication path connecting all the spacecraft in a cluster. The system would be a wireless version of a ring bus that complies with the Institute of Electrical and Electronics Engineers (IEEE) standard 1393 (which pertains to a spaceborne fiber-optic data bus enhancement to the IEEE standard developed at NASA's Jet Propulsion Laboratory). Every spacecraft in the cluster would be equipped with a ring-bus radio transceiver. The identity of a spacecraft would be established upon connection to the ring bus, and the spacecraft could occupy any position in the ring communication sequence. If a spacecraft failed, the ring bus would reconfigure itself to bypass it. Similarly, the ring bus would reconfigure itself to accommodate a spacecraft newly added to the cluster or newly enabled or re-enabled. Thus, the ring bus would be scalable and robust. Reliability could be increased by launching spare spacecraft into the cluster, to be activated if other spacecraft fail.
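    The reconfiguration behaviour can be illustrated with a toy model of the ring order. The class and method names are hypothetical, and the sketch models only the logical forwarding sequence, not the radio link, timing, or IEEE 1393 framing.

```python
class RingBus:
    """Toy model of a self-healing ring: each node forwards to its successor."""

    def __init__(self, nodes):
        self.order = list(nodes)

    def successor(self, node):
        i = self.order.index(node)
        return self.order[(i + 1) % len(self.order)]

    def fail(self, node):
        # Bypass: the failed node's predecessor now forwards to its successor.
        self.order.remove(node)

    def join(self, node, after):
        # A new or re-enabled spacecraft may take any position in the sequence.
        self.order.insert(self.order.index(after) + 1, node)

    def path(self, start):
        """One full trip around the ring beginning at `start`."""
        out, node = [start], self.successor(start)
        while node != start:
            out.append(node)
            node = self.successor(node)
        return out


ring = RingBus(["sc1", "sc2", "sc3", "sc4"])
ring.fail("sc2")               # sc1 now forwards directly to sc3
ring.join("sc5", after="sc4")  # a spare is activated after sc4
print(ring.path("sc1"))        # ['sc1', 'sc3', 'sc4', 'sc5']
```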

  11. Banging Galaxy Clusters: High Fidelity X-ray Temperature and Radio Maps to Probe the Physics of Merging Clusters

    NASA Astrophysics Data System (ADS)

    Burns, Jack O.; Hallman, Eric J.; Alden, Brian; Datta, Abhirup; Rapetti, David

    2017-06-01

    We present early results from an X-ray/radio study of a sample of merging galaxy clusters. Using a novel X-ray pipeline, we have generated high-fidelity temperature maps from existing long-integration Chandra data for a set of clusters including Abell 115, A520, and MACSJ0717.5+3745. Our pipeline, written in Python and running on Pleiades, the NASA ARC high-performance supercomputer, generates temperature maps with minimal user interaction. This code will be released in beta to the community on GitHub, with full documentation, later this year. We have identified a population of observable shocks in the X-ray data that allow us to characterize the merging activity. In addition, we have compared the X-ray emission and properties to radio data from observations with the JVLA and GMRT. Each of these merging clusters contains radio relics and/or radio halos. These data products illuminate the merger process and how the energy of the merger is dissipated into thermal and non-thermal forms. This research was supported by NASA ADAP grant NNX15AE17G.

  12. Lexical frequency and voice assimilation in complex words in Dutch

    NASA Astrophysics Data System (ADS)

    Ernestus, Mirjam; Lahey, Mybeth; Verhees, Femke; Baayen, Harald

    2004-05-01

    Words with higher token frequencies tend to have more reduced acoustic realizations than lower frequency words (e.g., Hay, 2000; Bybee, 2001; Jurafsky et al., 2001). This study documents frequency effects for regressive voice assimilation (obstruents are voiced before voiced plosives) in Dutch morphologically complex words in the subcorpus of read-aloud novels in the corpus of spoken Dutch (Oostdijk et al., 2002). As expected, the initial obstruent of the cluster tends to be absent more often as lexical frequency increases. More importantly, as frequency increases, the duration of vocal-fold vibration in the cluster decreases, and the duration of the bursts in the cluster increases, after partialing out cluster duration. This suggests that there is less voicing for higher-frequency words. In fact, phonetic transcriptions show regressive voice assimilation for only half of the words and progressive voice assimilation for one third. Interestingly, the progressive voice assimilation observed for higher-frequency complex words renders these complex words more similar to monomorphemic words: Dutch monomorphemic words typically contain voiceless obstruent clusters (Zonneveld, 1983). Such high-frequency complex words may therefore be less easily parsed into their constituent morphemes (cf. Hay, 2000), favoring whole word lexical access (Bertram et al., 2000).

  13. Egocentric daily activity recognition via multitask clustering.

    PubMed

    Yan, Yan; Ricci, Elisa; Liu, Gaowen; Sebe, Nicu

    2015-10-01

    Recognizing human activities from videos is a fundamental research problem in computer vision. Recently, there has been growing interest in analyzing human behavior from data collected with wearable cameras. First-person cameras continuously record several hours of their wearers' lives. To cope with this vast amount of unlabeled and heterogeneous data, novel algorithmic solutions are required. In this paper, we propose a multitask clustering framework for analyzing activities of daily living from visual data gathered by wearable cameras. Our intuition is that, even if the data are not annotated, it is possible to exploit the fact that the tasks of recognizing the everyday activities of multiple individuals are related, since people typically perform the same actions in similar environments (e.g., people working in an office often read and write documents). In our framework, rather than clustering data from different users separately, we look for clustering partitions that are coherent among related tasks. In particular, we introduce two novel multitask clustering algorithms derived from a common optimization problem. Our experimental evaluation, conducted both on synthetic data and on publicly available first-person vision data sets, shows that the proposed approach outperforms several single-task and multitask learning methods.

  14. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    NASA Astrophysics Data System (ADS)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis and has recently attracted much research interest. Many link prediction methods using various techniques have been proposed. Clustering information plays an important role in solving the link prediction problem, and in the previous literature the node clustering coefficient appears in many link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish a node's different clustering abilities with respect to different node pairs. In this paper, we shift our focus from nodes to links and propose the concept of the asymmetric link clustering (ALC) coefficient. Further, we improve three node-clustering-based link prediction methods using the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node-clustering-based methods, achieving especially remarkable improvements on food web, hamster friendship and Internet networks. In addition, compared with other methods, ALC-based methods perform very stably in both globalized and personalized top-L link prediction tasks.
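    For reference, a node-clustering-based predictor of the kind the paper improves upon can be sketched by scoring each unlinked pair with the sum of the clustering coefficients of its common neighbours and keeping the top L pairs. This illustrates the node-level baseline only; the ALC coefficient is not defined in the abstract and is not implemented here, and the graph representation (a dict of neighbour sets) is an assumption.

```python
from itertools import combinations


def clustering_coefficient(adj, z):
    """Local clustering coefficient: fraction of neighbour pairs of z that are linked."""
    nbrs = adj[z]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def top_l_links(adj, L):
    """Rank unlinked pairs by the summed clustering coefficients of their common neighbours."""
    cc = {z: clustering_coefficient(adj, z) for z in adj}
    scores = []
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already linked
        score = sum(cc[z] for z in adj[x] & adj[y])
        if score > 0:
            scores.append((score, x, y))
    scores.sort(key=lambda s: (-s[0], s[1], s[2]))  # highest score first, then by name
    return [(x, y) for _, x, y in scores[:L]]


adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
print(top_l_links(adj, 2))  # [('a', 'd'), ('b', 'e')]
```

    Here node d has the same coefficient (1/3) whichever pair it serves as a common neighbour for; that single per-node value is the limitation the abstract describes and the motivation for a link-level coefficient.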

  15. A mass vaccination campaign targeting adults and children to prevent typhoid fever in Hechi; Expanding the use of Vi polysaccharide vaccine in Southeast China: A cluster-randomized trial

    PubMed Central

    Yang, Jin; Acosta, Camilo J; Si, Guo-ai; Zeng, Jun; Li, Cui-yun; Liang, Da-bin; Ochiai, R Leon; Page, Anne-Laure; Danovaro-Holliday, M Carolina; Zhang, Jie; Zhou, Bao-de; Liao, He-zhuang; Wang, Ming-liu; Tan, Dong-mei; Tang, Zhen-zhu; Gong, Jian; Park, Jin-Kyung; Ali, Mohammad; Ivanoff, Bernard; Liang, Gui-chen; Yang, Hong-hui; Pang, Tikki; Xu, Zhi-yi; Donner, Allan; Galindo, Claudia M; Dong, Bai-qing; Clemens, John D

    2005-01-01

    Background: One of the goals of this study was to learn the coverage, safety and logistics of a mass vaccination campaign against typhoid fever in children and adults using locally produced typhoid Vi polysaccharide (PS) and group A meningococcal PS vaccines in southern China. Methods: The vaccination campaign targeted 118,588 persons aged 5 to 60 years in Hechi, Guangxi Province, in 2003. The study area was divided into 107 geographic clusters, which were randomly allocated to receive one of the single-dose parenteral vaccines. All aspects of vaccination logistics, feasibility and safety were documented and systematically recorded, and the results for logistics, feasibility and safety are reported. Results: The campaign lasted 5 weeks and the overall vaccination coverage was 78%. On average, the 30 vaccine teams gave immunizations on 23 days. Vaccination rates were higher in those aged ≤ 15 years (90%) than in adolescents and young adults (70%). Planned mop-up activities increased the coverage by 17%. The overall vaccine wastage was 11%. The cold chain was maintained and documented. Of all vaccinees, 66 individuals reported adverse events, of which fever (21%), malaise (19%) and local redness (19%) were the major symptoms; no life-threatening event occurred. Three needle-sharp events were reported. Conclusion: The mass immunization proved feasible and safe, and vaccine coverage was high. Emphasis should be placed on injection safety measures, community involvement and the incorporation of mop-up strategies into any vaccination campaign. School-based and all-age Vi mass immunization programs are potentially important public health strategies for preventing typhoid fever in high-risk populations in southern China. PMID:15904514

  16. MetaABC--an integrated metagenomics platform for data adjustment, binning and clustering.

    PubMed

    Su, Chien-Hao; Hsu, Ming-Tsung; Wang, Tse-Yi; Chiang, Sufeng; Cheng, Jen-Hao; Weng, Francis C; Kao, Cheng-Yan; Wang, Daryi; Tsai, Huai-Kuang

    2011-08-15

    MetaABC is a metagenomic platform that integrates several binning tools coupled with methods for removing artifacts, analyzing unassigned reads and controlling sampling biases. It allows users to arrive at a better interpretation via a series of distinct combinations of analysis tools. After execution, MetaABC provides outputs in various visual formats, such as tables, pie and bar charts, and clustering result diagrams. MetaABC source code and documentation are available at http://bits2.iis.sinica.edu.tw/MetaABC/ CONTACT: dywang@gate.sinica.edu.tw; hktsai@iis.sinica.edu.tw Supplementary data are available at Bioinformatics online.

  17. MR-Tandem: parallel X!Tandem using Hadoop MapReduce on Amazon Web Services

    PubMed Central

    Pratt, Brian; Howbert, J. Jeffry; Tasman, Natalie I.; Nilsson, Erik J.

    2012-01-01

    Summary: MR-Tandem adapts the popular X!Tandem peptide search engine to work with Hadoop MapReduce for reliable parallel execution of large searches. MR-Tandem runs on any Hadoop cluster but offers special support for Amazon Web Services for creating inexpensive on-demand Hadoop clusters, enabling search volumes that might not otherwise be feasible with the compute resources a researcher has at hand. MR-Tandem is designed to drop in wherever X!Tandem is already in use; it requires no modification to existing X!Tandem parameter files and only minimal modification to X!Tandem-based workflows. Availability and implementation: MR-Tandem is implemented as a lightly modified X!Tandem C++ executable and a Python script that drives Hadoop clusters, including Amazon Web Services (AWS) Elastic Map Reduce (EMR), using the modified X!Tandem program as a Hadoop Streaming mapper and reducer. The modified X!Tandem C++ source code is Artistic licensed, supports pluggable scoring, and is available as part of the Sashimi project at http://sashimi.svn.sourceforge.net/viewvc/sashimi/trunk/trans_proteomic_pipeline/extern/xtandem/. The MR-Tandem Python script is Apache licensed and available as part of the Insilicos Cloud Army project at http://ica.svn.sourceforge.net/viewvc/ica/trunk/mr-tandem/. Full documentation and a Windows installer that configures MR-Tandem, Python and all necessary packages are available at this same URL. Contact: brian.pratt@insilicos.com PMID:22072385

  18. GDPC: Gravitation-based Density Peaks Clustering algorithm

    NASA Astrophysics Data System (ADS)

    Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin

    2018-07-01

    The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach that was published in Science in 2014. DPC can discover clusters of varying sizes and densities, but it has limitations in detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. We apply our method to several UCI and synthetic data sets and report comparative clustering performance using the F-Measure and 2-dimensional visualization. We also compare our method with other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC, presenting F-Measure scores and clustering accuracies of GDPC and these algorithms on different data sets. We show that GDPC performs better in: (1) clearly detecting the number of clusters; (2) efficiently aggregating clusters of varying sizes and densities; and (3) accurately identifying anomalies.
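    A minimal sketch of the original DPC decision graph that GDPC builds on, not of GDPC itself (the gravitation-based decision graph is not described in enough detail here to reproduce). It assumes a cutoff kernel for the local density rho, Euclidean distance, and a user-supplied cluster count k; choosing centers by the product rho * delta is one common heuristic, and the function name `dpc` is hypothetical.

```python
import math


def dpc(points, d_c, k):
    """Density Peaks Clustering with a cutoff kernel (original DPC, not GDPC).

    Returns (labels, rho, delta) for the given points, cutoff distance d_c,
    and number of clusters k.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: local density = number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Rank points by density (ties broken by index) so "denser" is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest higher-ranked point, which becomes i's parent.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
        else:
            j = min(order[:rank], key=lambda m: dist[i][m])
            delta[i] = dist[i][j]
            parent[i] = j

    # Centers: the densest point plus the k-1 others with the largest rho*delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    others = sorted((i for i in range(n) if i != order[0]), key=lambda i: -gamma[i])
    centers = [order[0]] + others[: k - 1]

    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # Walk from dense to sparse; each non-center inherits its parent's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

    With two tight, well-separated groups of four points and d_c = 0.5, the two largest rho * delta values come from one point in each group, and every other point inherits the label of its nearest denser neighbor.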

  19. Language Policy and Orthographic Harmonization across Linguistic, Ethnic and National Boundaries in Southern Africa

    ERIC Educational Resources Information Center

    Banda, Felix

    2016-01-01

    Drawing on online and daily newspapers, speakers' language and writing practices, official government documents and prescribed spelling systems in Southern Africa, the paper explores the challenges and possibilities of orthographic reforms allowing for mobility across language clusters, ethnicity, regional and national borders. I argue that this…

  20. Development of advanced acreage estimation methods

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr. (Principal Investigator)

    1982-01-01

    The development of an accurate and efficient algorithm for analyzing the structure of MSS data, the application of the Akaike information criterion to mixture models, and a research plan to delineate some of the technical issues and associated tasks in the area of rice scene radiation characterization are discussed. The AMOEBA clustering algorithm is refined and documented.
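    The report does not give its formulation for MSS data; as a hedged illustration, the Akaike information criterion for a fitted mixture is AIC = 2p - 2 ln L, where p counts free parameters and L is the maximized likelihood. The sketch below assumes 1-D Gaussian components with already-fitted parameters (fitting, e.g. by EM, is not shown), and the helper names are hypothetical.

```python
import math


def gaussian_mixture_loglik(xs, weights, means, variances):
    """Log-likelihood of 1-D data under a Gaussian mixture with given parameters."""
    total = 0.0
    for x in xs:
        density = sum(
            w * math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
            for w, mu, var in zip(weights, means, variances)
        )
        total += math.log(density)
    return total


def mixture_param_count(m, d=1):
    """Free parameters of an m-component, d-dimensional full-covariance Gaussian mixture."""
    means = m * d
    covariances = m * d * (d + 1) // 2
    weights = m - 1  # mixing weights must sum to 1
    return means + covariances + weights


def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller is better."""
    return 2 * n_params - 2 * log_likelihood
```

    When comparing candidate mixtures with different component counts m, the one with the lowest aic(loglik, mixture_param_count(m, d)) is preferred; the parameter penalty keeps AIC from always favoring more components.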

  1. Using Drawings of the Brain Cell to Exhibit Expertise in Neuroscience: Exploring the Boundaries of Experimental Culture

    ERIC Educational Resources Information Center

    Hay, David B.; Williams, Darren; Stahl, Daniel; Wingate, Richard J.

    2013-01-01

    This paper explores the research perspective of neuroscience by documenting the brain cell (neuron) drawings of undergraduates, trainee scientists, and leading neuroscience researchers in a single research-intensive university. Qualitative analysis, drawing-sorting exercises, and hierarchical cluster analysis are used to answer two related…

  2. Mississippi Curriculum Framework for Automotive Technology Programs (CIP: 47.0604--Automotive Mechanic/Tech.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the automotive technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  3. The Just Community School: The Theory and the Cambridge Cluster School Experiment.

    ERIC Educational Resources Information Center

    Kohlberg, Lawrence; And Others

    The background, evaluation process, theories, and practical aspects of the Just Community High School in Cambridge, Massachusetts, are presented. The document is organized into four sections. Section 1 briefly discusses the components of a Just School: participatory democracy with teachers and students having equal rights, emphasis on conflict…

  4. Reading Fluency Instruction for Students at Risk for Reading Failure

    ERIC Educational Resources Information Center

    Ring, Jeremiah J.; Barefoot, Lexie C.; Avrit, Karen J.; Brown, Sasha A.; Black, Jeffrey L.

    2013-01-01

    The important role of reading fluency in the comprehension and motivation of readers is well documented. Two reading rate intervention programs were compared in a cluster-randomized clinical trial of students who were considered at-risk for reading failure. One program focused instruction at the word level; the second program focused instruction…

  5. Campylobacter coli Outbreak in Men Who Have Sex with Men, Quebec, Canada, 2010–2011

    PubMed Central

    Helferty, Melissa; Sylvestre, Jean-Loup; Allard, Robert; Pilon, Pierre A.; Poisson, Michel; Bekal, Sadjia

    2013-01-01

    During September 2010–November 2011, a cluster of erythromycin-susceptible, tetracycline- and ciprofloxacin-resistant Campylobacter coli pulsovar 1 infections was documented, involving 10 case-patients, in Montreal, Quebec, Canada. The findings suggested sexual transmission of an enteric infection among men who have sex with men. PMID:23647786

  6. Mississippi Curriculum Framework for Automotive Machinist (Program CIP: 47.0690--Auto Machinist). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the automotive machinist programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  7. "I Felt Like Such a Freshman": First-Year Students Crossing the Library Threshold

    ERIC Educational Resources Information Center

    Dempsey, Paula R.; Jagman, Heather

    2016-01-01

    Qualitative analysis of reflective essays by first-year students in an academic skills course documented outcomes related to the Association of College and Research Libraries Framework for Information Literacy for Higher Education. Student narratives showed how novices encounter the clusters of concepts described in the Framework as…

  8. A genetic graph-based approach for partitional clustering.

    PubMed

    Menéndez, Héctor D; Barrero, David F; Camacho, David

    2014-05-01

    Clustering is one of the most versatile tools for data analysis. In recent years, clustering that seeks the continuity of data (as opposed to classical centroid-based approaches) has attracted increasing research interest. It is a challenging problem of considerable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cuts: it first builds a similarity graph using a distance measure and then studies the graph's spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the clusters. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach, using a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases the robustness of SC and performs competitively with classical clustering methods, at least on the synthetic and real datasets used in the experiments.
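    To make the graph-cut description concrete, here is a minimal sketch of plain two-way spectral clustering (not GGC): build a Gaussian similarity graph, find the Fiedler vector of the unnormalized graph Laplacian, and split the points by its sign. The kernel width sigma is an assumed parameter standing in for the metric parameter that SC is said to be sensitive to; power iteration is used only to keep the example dependency-free, and the function name is hypothetical.

```python
import math


def spectral_bipartition(points, sigma=1.0, iters=500):
    """Two-way spectral clustering: split points by the sign of the Fiedler vector."""
    n = len(points)
    # Similarity graph: Gaussian kernel of Euclidean distance, no self-loops.
    w = [
        [0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma**2))
         for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in w]

    # Power iteration on M = c*I - L, where L = D - W is the unnormalized Laplacian.
    # c = 2 * max degree bounds L's largest eigenvalue, so M is positive
    # semidefinite and its dominant eigenvectors are L's smallest ones.
    c = 2 * max(deg)
    v = [i - (n - 1) / 2 for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        lv = [deg[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [c * v[i] - lv[i] for i in range(n)]
        # Remove the constant vector (eigenvalue 0 of L) to converge to the Fiedler vector.
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return [0 if x < 0 else 1 for x in v]
```

    Shrinking sigma far below the within-group spacing disconnects the graph and makes the cut unreliable, which illustrates the parameter sensitivity that motivates GGC.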

  9. Conveyor Performance based on Motor DC 12 Volt Eg-530ad-2f using K-Means Clustering

    NASA Astrophysics Data System (ADS)

    Arifin, Zaenal; Artini, Sri DP; Much Ibnu Subroto, Imam

    2017-04-01

    To produce goods in industry, a controlled tool that improves production is required. Separation has become part of the production process; it is carried out according to certain criteria to obtain an optimum result. Knowing the performance characteristics of a controlled tool used for separation makes it possible to obtain optimum results. Cluster analysis is a popular method for dividing data into smaller segments: it partitions a group of objects into k groups whose members are homogeneous, or similar, according to certain criteria. The work in this paper uses the K-Means method to cluster the loading performance of a conveyor driven by a 12 volt DC motor eg-530-2f. This technique gives complete clustering data for a prototype conveyor, driven by a DC motor, that separates goods by height. The parameters involved are voltage, current and travel time. These parameters yield two clusters: an optimal cluster with center 10.50 V, 0.3 A, 10.58 s, and a non-optimal cluster with center 10.88 V, 0.28 A, 40.43 s.
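    A minimal sketch of Lloyd's K-Means on three-feature vectors (voltage in V, current in A, travel time in s), matching the parameters listed above. Any readings fed to it here are hypothetical toy values, not the paper's data, and initial centers are passed in explicitly for reproducibility. Because the features are not standardized, travel time dominates the Euclidean distance; scaling each feature first may be preferable.

```python
import math


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's K-Means: alternate nearest-center assignment and mean update."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

    For example, with the toy readings (10.2, 0.31, 11.0), (10.6, 0.29, 12.0), (10.9, 0.27, 39.0) and (10.7, 0.29, 41.0), initialized at the first and third readings, the two short-travel-time readings form one cluster and the two long ones the other.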

  10. Text, photo, and line extraction in scanned documents

    NASA Astrophysics Data System (ADS)

    Erkilinc, M. Sezer; Jaber, Mustafa; Saber, Eli; Bauer, Peter; Depalov, Dejan

    2012-07-01

    We propose a page layout analysis algorithm to classify a scanned document into different regions such as text, photo, or strong lines. The proposed scheme consists of five modules. The first module performs several image preprocessing techniques such as image scaling, filtering, color space conversion, and gamma correction to enhance the scanned image quality and reduce the computation time in later stages. Text detection is applied in the second module wherein wavelet transform and run-length encoding are employed to generate and validate text regions, respectively. The third module uses a Markov random field based block-wise segmentation that employs a basis vector projection technique with maximum a posteriori probability optimization to detect photo regions. In the fourth module, methods for edge detection, edge linking, line-segment fitting, and Hough transform are utilized to detect strong edges and lines. In the last module, the resultant text, photo, and edge maps are combined to generate a page layout map using K-Means clustering. The proposed algorithm has been tested on several hundred documents that contain simple and complex page layout structures and contents such as articles, magazines, business cards, dictionaries, and newsletters, and compared against state-of-the-art page-segmentation techniques with benchmark performance. The results indicate that our methodology achieves an average of ~89% classification accuracy in text, photo, and background regions.
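    The abstract names run-length encoding as the text-validation tool without giving the rule. A minimal sketch of the encoding step for one binarized scanline follows; `short_foreground_runs` is a hypothetical cue (text rows tend to contain many short foreground runs), not the authors' validation criterion.

```python
def run_lengths(row):
    """Run-length encode a binary scanline into (value, length) pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1] = (pixel, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((pixel, 1))  # start a new run
    return runs


def short_foreground_runs(row, max_len):
    """Hypothetical text cue: count foreground (1) runs no longer than max_len pixels."""
    return sum(1 for value, length in run_lengths(row) if value == 1 and length <= max_len)
```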

  11. A user credit assessment model based on clustering ensemble for broadband network new media service supervision

    NASA Astrophysics Data System (ADS)

    Liu, Fang; Cao, San-xing; Lu, Rui

    2012-04-01

    This paper proposes a user credit assessment model based on clustering ensemble, aiming to solve the problem of users illegally spreading pirated and pornographic media content on user-self-service-oriented broadband network new media platforms. The idea is to assess new media users' credit by establishing an index system based on user credit behaviors; illegal users can then be identified from the credit assessment results, curbing the bad videos and audio transmitted on the network. The proposed clustering ensemble model integrates the advantages of swarm intelligence clustering, which is suited to user credit behavior analysis, and K-means clustering, which can eliminate the scattered users left in the result of swarm intelligence clustering, thereby classifying all users' credit automatically. Verification experiments were performed on the standard credit application dataset in the UCI machine learning repository, and the statistical results of a comparison with a single swarm intelligence clustering model indicate that the clustering ensemble model distinguishes creditworthiness better, especially in finding the user clusters with the best and worst credit, which will help operators take incentive or punitive measures accurately. In addition, compared with a logistic regression based model under the same conditions, the clustering ensemble model is more robust and has better prediction accuracy.

  12. K2: A NEW METHOD FOR THE DETECTION OF GALAXY CLUSTERS BASED ON CANADA-FRANCE-HAWAII TELESCOPE LEGACY SURVEY MULTICOLOR IMAGES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thanjavur, Karun; Willis, Jon; Crampton, David, E-mail: karun@uvic.c

    2009-11-20

    We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially, K2 was applied to the two-color (gri) 161 deg² images of the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-W) data. Our simulations show that the false detection rate for these data, at our selected threshold, is only ~1%, and that the cluster catalogs are ~80% complete up to a redshift of z = 0.6 for Fornax-like and richer clusters and to z ≈ 0.3 for poorer clusters. Based on the g-, r-, and i-band photometric catalogs of the Terapix T05 release, 35 clusters/deg² are detected, with 1-2 Fornax-like or richer clusters every 2 deg². Catalogs containing data for 6144 galaxy clusters have been prepared, of which 239 are rich clusters. These clusters, especially the latter, are being searched for gravitational lenses, one of our chief motivations for cluster detection in CFHTLS. The K2 method can easily be extended to use additional color information and thus improve overall cluster detection to higher redshifts. The complete set of K2 cluster catalogs, along with the supplementary catalogs for the member galaxies, is available on request from the authors.

  13. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

    NASA Astrophysics Data System (ADS)

    Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

    2017-05-01

    Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. We then propose a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height. Any cluster group that exceeds the stopping rule is classified as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting outliers. The proposed methods were found to perform well and to be applicable to the circular regression model.
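    The stopping rule relies on two standard circular statistics: the mean direction atan2(sum of sines, sum of cosines) and the circular standard deviation sqrt(-2 ln R_bar), where R_bar is the mean resultant length. A minimal sketch with angles in radians follows; how the paper combines them into a threshold on tree heights is not given in the abstract, so no threshold is assumed, and the function names are hypothetical.

```python
import math


def _resultant(angles):
    """Sums of sines and cosines of angles given in radians."""
    return sum(math.sin(a) for a in angles), sum(math.cos(a) for a in angles)


def circular_mean(angles):
    """Mean direction in radians, in [-pi, pi]."""
    s, c = _resultant(angles)
    return math.atan2(s, c)


def circular_sd(angles):
    """Circular standard deviation sqrt(-2 ln R_bar), R_bar = mean resultant length."""
    s, c = _resultant(angles)
    r_bar = min(math.hypot(s, c) / len(angles), 1.0)  # clamp rounding error above 1
    if r_bar == 0.0:
        return math.inf  # directions cancel out: dispersion is unbounded
    return math.sqrt(-2.0 * math.log(r_bar))
```

    Unlike the arithmetic mean, the mean direction of 350° and 10° is 0°, not 180°, which is why circular statistics are needed for the tree heights of a circular model.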

  14. Energy Aware Cluster-Based Routing in Flying Ad-Hoc Networks.

    PubMed

    Aadil, Farhan; Raza, Ali; Khan, Muhammad Fahad; Maqsood, Muazzam; Mehmood, Irfan; Rho, Seungmin

    2018-05-03

    Flying ad-hoc networks (FANETs) are a very active research area with many military and civil applications. Limited battery energy and the high mobility of micro unmanned aerial vehicles (UAVs) cause their two main problems: short flight time and inefficient routing. In this paper, we try to address both problems by means of efficient clustering. First, we adjust the transmission power of the UAVs by anticipating their operational requirements. An optimal transmission range yields a minimum packet loss ratio (PLR) and better link quality, which ultimately saves the energy consumed during communication. Second, we use a variant of the K-Means Density clustering algorithm to select cluster heads. Optimal cluster heads extend cluster lifetime and reduce routing overhead. The proposed model outperforms state-of-the-art artificial intelligence techniques such as Ant Colony Optimization-based and Grey Wolf Optimization-based clustering algorithms. The performance of the proposed algorithm is evaluated in terms of number of clusters, cluster building time, cluster lifetime and energy consumption.

  15. An Energy Centric Cluster-Based Routing Protocol for Wireless Sensor Networks.

    PubMed

    Hosen, A S M Sanwar; Cho, Gi Hwan

    2018-05-11

    Clustering is an effective way to prolong the lifetime of a wireless sensor network (WSN). The common approach is to elect cluster heads to take on routing and control duties, and to periodically rotate the cluster-head role to distribute energy consumption among nodes. However, a significant amount of energy is dissipated by control message overhead, which shortens network lifetime. This paper proposes an energy-centric cluster-based routing mechanism for WSNs. Nodes with higher ranks are elected as cluster heads, where a node's rank is defined by its residual energy and its average distance from the member nodes. Besides aggregating and forwarding data, a cluster head acts as a caretaker for the cluster-head election in the next round: rank information is piggybacked on the local data sent during intra-cluster communication. This reduces the number of control messages needed for cluster-head election and cluster formation. Simulation results show that the proposed protocol reduces energy consumption among nodes and significantly improves network lifetime.

  16. An Energy Centric Cluster-Based Routing Protocol for Wireless Sensor Networks

    PubMed Central

    Hosen, A. S. M. Sanwar; Cho, Gi Hwan

    2018-01-01

    Clustering is an effective way to prolong the lifetime of a wireless sensor network (WSN). The common approach is to elect cluster heads to take on routing and control duties, and to periodically rotate the cluster-head role to distribute energy consumption among nodes. However, a significant amount of energy is dissipated by control message overhead, which shortens network lifetime. This paper proposes an energy-centric cluster-based routing mechanism for WSNs. Nodes with higher ranks are elected as cluster heads, where a node's rank is defined by its residual energy and its average distance from the member nodes. Besides aggregating and forwarding data, a cluster head acts as a caretaker for the cluster-head election in the next round: rank information is piggybacked on the local data sent during intra-cluster communication. This reduces the number of control messages needed for cluster-head election and cluster formation. Simulation results show that the proposed protocol reduces energy consumption among nodes and significantly improves network lifetime. PMID:29751663
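    The abstract states only that a node's rank depends on residual energy and average distance to the member nodes. The sketch below assumes a hypothetical weighted score, w_energy times normalized energy minus w_dist times normalized average distance, and elects the top-ranked node; the weights and function names are illustrative, not the protocol's definition. In the protocol, the current head would gather these inputs from the ranks piggybacked on intra-cluster data.

```python
import math


def node_ranks(positions, energies, w_energy=0.5, w_dist=0.5):
    """Hypothetical rank: reward residual energy, penalize average distance to others."""
    n = len(positions)
    avg_dist = []
    for i in range(n):
        ds = [math.dist(positions[i], positions[j]) for j in range(n) if j != i]
        avg_dist.append(sum(ds) / len(ds) if ds else 0.0)
    # Normalize both terms to [0, 1] so the weights are comparable.
    e_max = max(energies) or 1.0
    d_max = max(avg_dist) or 1.0
    return [w_energy * energies[i] / e_max - w_dist * avg_dist[i] / d_max for i in range(n)]


def elect_cluster_head(positions, energies, **weights):
    """Index of the highest-ranked node (ties go to the lowest index)."""
    ranks = node_ranks(positions, energies, **weights)
    return max(range(len(ranks)), key=ranks.__getitem__)
```

    With three nodes on a line and equal energy, the central node wins because it is closest on average to the others; if its energy drops far enough, an edge node takes over, which is the rotation effect the protocol relies on.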

  17. Moving Object Localization Based on UHF RFID Phase and Laser Clustering

    PubMed Central

    Fu, Yulu; Wang, Changlong; Liang, Gaoli; Zhang, Hua; Ur Rehman, Shafiq

    2018-01-01

    RFID (Radio Frequency Identification) offers a way to identify objects without contact. However, its positioning accuracy is limited, since RFID provides neither distance nor bearing information about the tag. This paper proposes a new approach for localizing a moving object using a particle filter that incorporates RFID phase and laser-based clustering of 2D laser range data. First, we calculate the phase-based velocity of the moving object from the RFID phase difference. Meanwhile, we separate the laser range data into clusters and compute the distance-based velocity and moving direction of each cluster. We then compute and analyze the similarity between the two velocities and select the K clusters with the best similarity scores. We predict the particles according to the velocity and moving direction of the laser clusters. Finally, we update the particle weights based on the K clusters to localize the moving object. The feasibility of this approach is validated on a Scitos G5 service robot, and the results show a localization accuracy of up to 0.25 m. PMID:29522458
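    For a backscatter tag, the reported phase is commonly modeled as 4*pi*d/lambda (mod 2*pi), so a phase change over dt gives a radial velocity of lambda * dphi / (4*pi*dt). A minimal sketch under that assumption follows; phase sign conventions differ between readers, the tag is assumed to move less than lambda/4 between reads, and ranking laser clusters by absolute velocity difference is a hypothetical stand-in for the paper's similarity score.

```python
import math


def wrap_to_pi(angle):
    """Map an angle difference to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def phase_radial_velocity(phase1, phase2, dt, wavelength):
    """Radial tag velocity from two phase readings taken dt seconds apart.

    Assumes phase = 4*pi*d/wavelength (mod 2*pi), i.e. phase grows with range,
    and that the tag moves less than wavelength/4 between readings.
    """
    dphi = wrap_to_pi(phase2 - phase1)
    return wavelength * dphi / (4 * math.pi * dt)


def best_k_clusters(phase_velocity, cluster_velocities, k):
    """Indices of the k laser clusters whose velocity is closest to the RFID estimate."""
    order = sorted(range(len(cluster_velocities)),
                   key=lambda i: abs(cluster_velocities[i] - phase_velocity))
    return order[:k]
```

    Wrapping the phase difference matters because readers report phase modulo 2*pi: a jump from 6.2 rad to 0.1 rad is a small positive change, not a large negative one.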

  18. Coalition Game-Based Secure and Effective Clustering Communication in Vehicular Cyber-Physical System (VCPS).

    PubMed

    Huo, Yan; Dong, Wei; Qian, Jin; Jing, Tao

    2017-02-27

    In this paper, we address the low efficiency of cluster-based communication in the crossroad scenario of the Vehicular Cyber-Physical System (VCPS), which is caused by overloading of the cluster head due to a large number of transmission bandwidth requirements. After formulating the issue as a coalition formation game, we propose a coalition-based clustering strategy that converges to a Nash-stable partition to complete the cluster formation process. In the proposed strategy, the coalition utility is formulated from the relative velocity, relative position and bandwidth availability ratio of the vehicles in the cluster. Using the coalition utility, each vehicle acts as a node that decides whether to switch to a new coalition or stay in its current one. On this basis, we can make full use of the bandwidth provided by the cluster head while meeting the requirement of clustering stability. Nevertheless, selfish nodes may exist during cluster formation, seeking to benefit from the network. This behavior may degrade communication quality and even destroy the cluster. Thus, we also present a reputation-based incentive and penalty mechanism to stop selfish nodes from entering clusters. Numerical simulation results show that our strategy, CG-SECC, achieves a better tradeoff between the stability and efficiency of clustering communication. In addition, a case study demonstrates that the proposed incentive and penalty mechanism can play an important role in discovering and removing malicious nodes.
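    The switch-or-stay rule can be sketched as a simple hedonic coalition game: each vehicle moves to another coalition only if its utility there is strictly higher, and the process stops when no vehicle moves, which is a Nash-stable partition. The utility below is hypothetical and uses relative velocity only; the paper's utility also includes relative position and bandwidth availability ratio, the reputation mechanism is not modeled, and the function names are illustrative.

```python
def utility(i, members, velocity):
    """Hypothetical utility: negative mean relative speed to the other members."""
    others = [j for j in members if j != i]
    if not others:
        return float("-inf")  # assumption: a vehicle never prefers to be alone
    return -sum(abs(velocity[i] - velocity[j]) for j in others) / len(others)


def form_coalitions(velocity, labels, n_coalitions, max_rounds=100):
    """Let each vehicle switch to a coalition it strictly prefers until none moves."""
    labels = list(labels)
    for _ in range(max_rounds):
        switched = False
        for i in range(len(velocity)):
            members = {k: [j for j, lab in enumerate(labels) if lab == k]
                       for k in range(n_coalitions)}
            best_k = labels[i]
            best_u = utility(i, members[best_k], velocity)
            for k in range(n_coalitions):
                if k != labels[i]:
                    u = utility(i, members[k] + [i], velocity)
                    if u > best_u:  # switch only for a strict improvement
                        best_k, best_u = k, u
            if best_k != labels[i]:
                labels[i] = best_k
                switched = True
        if not switched:
            break  # no vehicle gains by moving alone: Nash-stable partition
    # If max_rounds is exhausted, the returned partition may not be stable.
    return labels
```

    For instance, with speeds [10, 11, 12, 40, 41, 42] and an alternating initial assignment over two coalitions, the slow and fast vehicles end up in separate coalitions after one round, and the second round confirms that no vehicle wants to move.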

  19. Managing distance and covariate information with point-based clustering.

    PubMed

    Whigham, Peter A; de Graaf, Brandon; Srivastava, Rashmi; Glue, Paul

    2016-09-01

    Geographic perspectives of disease and the human condition often involve point-based observations and questions of clustering or dispersion within a spatial context. These problems involve a finite set of point observations and are constrained by a larger, but finite, set of locations where the observations could occur. Developing a rigorous method for pattern analysis in this context requires handling spatial covariates, a method for constrained finite spatial clustering, and addressing bias in geographic distance measures. An approach based on Ripley's K and applied to the problem of clustering of deliberate self-harm (DSH) is presented. A point-based Monte Carlo simulation of Ripley's K, accounting for socio-economic deprivation and sources of distance measurement bias, was developed to estimate clustering of DSH at a range of spatial scales. A rotated Minkowski L1 distance metric allowed variation in physical distance and clustering to be assessed. Self-harm data were derived from an audit of 2 years' emergency hospital presentations (n = 136) in a New Zealand town (population ~50,000). The study area was defined by residential (housing) land parcels representing a finite set of possible point addresses. Area-based deprivation was spatially correlated. Accounting for deprivation and distance bias showed evidence for clustering of DSH at spatial scales up to 500 m with a one-sided 95% CI, suggesting that social contagion may be present for this urban cohort. Many problems involve finite locations in geographic space that require estimates of distance-based clustering at many scales. A Monte Carlo approach to Ripley's K, incorporating covariates and models for distance bias, is crucial when assessing health-related clustering. The case study showed that social network structure defined at the neighbourhood level may account for aspects of neighbourhood clustering of DSH. Accounting for covariate measures that exhibit spatial clustering, such as deprivation, is crucial when assessing point-based clustering.
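    A minimal sketch of the ingredients described above: a naive Ripley's K estimate (no edge correction), a rotated L1 distance, and a one-sided Monte Carlo p-value in which the null draws the same number of points without replacement from a finite set of candidate locations, such as residential parcels. Deprivation and the distance-bias models are not included, and all names here are hypothetical.

```python
import math
import random


def ripley_k(points, r, area, metric=math.dist):
    """Naive Ripley's K at scale r (no edge correction; needs at least 2 points)."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and metric(points[i], points[j]) <= r)
    return area * pairs / (n * (n - 1))


def rotated_l1(angle):
    """Minkowski L1 (city-block) distance measured in axes rotated by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)

    def dist(p, q):
        dx, dy = p[0] - q[0], p[1] - q[1]
        return abs(c * dx - s * dy) + abs(s * dx + c * dy)

    return dist


def mc_p_value(observed, candidates, r, area, metric=math.dist, n_sim=99, seed=0):
    """One-sided Monte Carlo p-value for clustering at scale r.

    Null model: len(observed) points drawn without replacement from `candidates`.
    """
    rng = random.Random(seed)
    n = len(observed)
    k_obs = ripley_k(observed, r, area, metric)
    hits = sum(1 for _ in range(n_sim)
               if ripley_k(rng.sample(candidates, n), r, area, metric) >= k_obs)
    return (1 + hits) / (1 + n_sim)
```

    Restricting the null to the candidate list, rather than to uniform points in a polygon, is what makes the test respect the finite set of possible addresses.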

  20. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess agreement between two raters. Motivated by a simulation-based cluster bootstrap method for calculating the variance of the kappa statistic for clustered physician-patient dichotomous data, we investigate its special correlation structure and develop a new, simple and efficient data generation algorithm. For clustered physician-patient dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator that ignores the dependence within a cluster is generally inappropriate, whereas the variance estimators from the new proposal, the bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). The new proposal and the sampling-based delta method provide convenient tools for efficient computation and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal performs acceptably even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research dataset and two simulated clustered physician-patient dichotomous datasets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
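    For two raters and dichotomous ratings, Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal rate. The sketch below pairs it with the plain cluster bootstrap mentioned above, resampling whole clusters (e.g. physicians) with replacement; it is not the authors' semi-parametric delta-method estimator, and the function names are hypothetical.

```python
import math
import random


def cohen_kappa(pairs):
    """Cohen's kappa for two raters' dichotomous (0/1) ratings; nan if undefined."""
    n = len(pairs)
    p_obs = sum(1 for a, b in pairs if a == b) / n  # observed agreement
    p1 = sum(a for a, _ in pairs) / n  # rater 1 "yes" rate
    p2 = sum(b for _, b in pairs) / n  # rater 2 "yes" rate
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    if p_exp == 1:
        return math.nan  # both raters constant and identical: kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)


def cluster_bootstrap_var(clusters, n_boot=1000, seed=0):
    """Variance of kappa by resampling whole clusters with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resample = [pair for cl in rng.choices(clusters, k=len(clusters)) for pair in cl]
        kappa = cohen_kappa(resample)
        if not math.isnan(kappa):  # skip degenerate resamples
            stats.append(kappa)
    mean = sum(stats) / len(stats)
    return sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)
```

    Resampling clusters rather than individual pairs keeps the within-physician dependence in every bootstrap sample, which is exactly what the naive variance estimator ignores.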

  1. Clinical interpretation of the Spinal Cord Injury Functional Index (SCI-FI).

    PubMed

    Fyffe, Denise; Kalpakjian, Claire Z; Slavin, Mary; Kisala, Pamela; Ni, Pengsheng; Kirshblum, Steven C; Tulsky, David S; Jette, Alan M

    2016-09-01

    To provide validation of functional ability levels for the Spinal Cord Injury - Functional Index (SCI-FI). Cross-sectional. Inpatient rehabilitation hospital and community settings. A sample of 855 individuals with traumatic spinal cord injury enrolled in 6 rehabilitation centers participating in the National Spinal Cord Injury Model Systems Network. Not Applicable. Spinal Cord Injury-Functional Index (SCI-FI). Cluster analyses identified three distinct groups that represent low, mid-range and high SCI-FI functional ability levels. Comparison of clusters on personal and other injury characteristics suggested some significant differences between groups. These results strongly support the use of SCI-FI functional ability levels to document the perceived functional abilities of persons with SCI. Results of the cluster analysis suggest that the SCI-FI functional ability levels capture function by injury characteristics. Clinical implications regarding tracking functional activity trajectories during follow-up visits are discussed.

  2. Slab tears and intermediate-depth seismicity

    USGS Publications Warehouse

    Meighan, Hallie E.; ten Brink, Uri S.; Pulliam, Jay

    2013-01-01

    Active tectonic regions where plate boundaries transition from subduction to strike slip can take several forms, such as triple junctions, acute, and obtuse corners. Well-documented slab tears that are associated with high rates of intermediate-depth seismicity are considered here: Gibraltar arc, the southern and northern ends of the Lesser Antilles arc, and the northern end of Tonga trench. Seismicity at each of these locations occurs, at times, in the form of swarms or clusters, and various authors have proposed that each marks an active locus of tear propagation. The swarms and clusters start at the top of the slab below the asthenospheric wedge and extend 30–60 km vertically downward within the slab. We propose that these swarms and clusters are generated by fluid-related embrittlement of mantle rocks. Focal mechanisms of these swarms generally fit the shear motion that is thought to be associated with the tearing process.

  3. Linear-array-based photoacoustic tomography for label-free high-throughput detection and quantification of circulating melanoma tumor cell clusters

    NASA Astrophysics Data System (ADS)

    Hai, Pengfei; Zhou, Yong; Zhang, Ruiying; Ma, Jun; Li, Yang; Wang, Lihong V.

    2017-03-01

    Circulating tumor cell (CTC) clusters arise from multicellular grouping in the primary tumor and elevate metastatic potential 23- to 50-fold compared with single CTCs. High-throughput detection and quantification of CTC clusters is critical for understanding the tumor metastasis process and improving cancer therapy. In this work, we report a linear-array-based photoacoustic tomography (LA-PAT) system capable of label-free high-throughput CTC cluster detection and quantification in vivo. LA-PAT detects CTC clusters and quantifies the number of cells in them based on the contrast-to-noise ratios (CNRs) of photoacoustic signals. The feasibility of LA-PAT was first demonstrated by imaging CTC clusters ex vivo. LA-PAT detected CTC clusters in the blood-filled microtubes and computed the number of cells in the clusters. The size distribution of the CTC clusters measured by LA-PAT agreed well with that obtained by optical microscopy. We demonstrated the ability of LA-PAT to detect and quantify CTC clusters in vivo by imaging injected CTC clusters in rat tail veins. LA-PAT detected CTC clusters immediately after injection as well as while they were circulating in the rats' bloodstreams. Similarly, the numbers of cells in the clusters were computed based on the CNRs of the photoacoustic signals. The data showed that larger CTC clusters disappear faster than smaller ones. The results demonstrate the potential of LA-PAT as a promising tool for both preclinical tumor metastasis studies and clinical cancer therapy evaluation.

  4. Detecting grizzly bear use of ungulate carcasses using global positioning system telemetry and activity data

    USGS Publications Warehouse

    Ebinger, Michael R.; Haroldson, Mark A.; van Manen, Frank T.; Costello, Cecily M.; Bjornlie, Daniel D.; Thompson, Daniel J.; Gunther, Kerry A.; Fortin, Jennifer K.; Teisberg, Justin E.; Pils, Shannon R; White, P J; Cain, Steven L.; Cross, Paul C.

    2016-01-01

    Global positioning system (GPS) wildlife collars have revolutionized wildlife research. Studies of predation by free-ranging carnivores have particularly benefited from the application of location clustering algorithms to determine when and where predation events occur. These studies have changed our understanding of large carnivore behavior, but the gains have concentrated on obligate carnivores. Facultative carnivores, such as grizzly/brown bears (Ursus arctos), exhibit a variety of behaviors that can lead to the formation of GPS clusters. We combined clustering techniques with field site investigations of grizzly bear GPS locations (n = 732 site investigations; 2004–2011) to produce 174 GPS clusters where documented behavior was partitioned into five classes (large-biomass carcass, small-biomass carcass, old carcass, non-carcass activity, and resting). We used multinomial logistic regression to predict the probability of clusters belonging to each class. Two cross-validation methods—leaving out individual clusters, or leaving out individual bears—showed that correct prediction of bear visitation to large-biomass carcasses was 78–88%, whereas the false-positive rate was 18–24%. As a case study, we applied our predictive model to a GPS data set of 266 bear-years in the Greater Yellowstone Ecosystem (2002–2011) and examined trends in carcass visitation during fall hyperphagia (September–October). We identified 1997 spatial GPS clusters, of which 347 were predicted to be large-biomass carcasses. We used the clustered data to develop a carcass visitation index, which varied annually, but more than doubled during the study period. Our study demonstrates the effectiveness and utility of identifying GPS clusters associated with carcass visitation by a facultative carnivore.

  5. Differentiation of human-induced pluripotent stem cells into insulin-producing clusters.

    PubMed

    Shaer, Anahita; Azarpira, Negar; Vahdati, Akbar; Karimi, Mohammad Hosein; Shariati, Mehrdad

    2015-02-01

    In type 1 diabetes mellitus, beta cells are mostly destroyed, while in type 2 diabetes mellitus, beta cells are reduced by 40% to 60%. We hope that stem cells can soon be used in diabetes therapy via pancreatic beta cell replacement. Induced pluripotent stem cells are a kind of stem cell derived from an adult somatic cell by "stimulating" certain genes. These induced pluripotent stem cells may be a promising source for cell therapy. This study sought to produce isletlike clusters of insulin-producing cells from induced pluripotent stem cells. A human-induced pluripotent stem cell line was induced into isletlike clusters via a 4-step protocol, by adding insulin, transferrin, and selenium (ITS), N2, B27, fibroblast growth factor, and nicotinamide. During differentiation, expression of pancreatic β-cell genes was evaluated by reverse transcriptase-polymerase chain reaction, and the morphologic changes of induced pluripotent stem cells toward isletlike clusters were observed by light microscopy. Dithizone staining was used to stain these isletlike clusters. Insulin produced by these clusters was evaluated by radio immunosorbent assay, and the secretion capacity was analyzed with a glucose challenge test. Differentiation was evaluated by analyzing the morphology, dithizone staining, real-time quantitative polymerase chain reaction, and immunocytochemistry. Gene expression of insulin, glucagon, PDX1, NGN3, PAX4, PAX6, NKX6.1, KIR6.2, and GLUT2 was documented by real-time quantitative polymerase chain reaction. Dithizone-stained cellular clusters were observed after 23 days. The isletlike clusters produced significant amounts of insulin and could increase insulin secretion after a glucose challenge test. This work provides a model for studying the differentiation of human-induced pluripotent stem cells into insulin-producing cells.

  6. Detecting grizzly bear use of ungulate carcasses using global positioning system telemetry and activity data.

    PubMed

    Ebinger, Michael R; Haroldson, Mark A; van Manen, Frank T; Costello, Cecily M; Bjornlie, Daniel D; Thompson, Daniel J; Gunther, Kerry A; Fortin, Jennifer K; Teisberg, Justin E; Pils, Shannon R; White, P J; Cain, Steven L; Cross, Paul C

    2016-07-01

    Global positioning system (GPS) wildlife collars have revolutionized wildlife research. Studies of predation by free-ranging carnivores have particularly benefited from the application of location clustering algorithms to determine when and where predation events occur. These studies have changed our understanding of large carnivore behavior, but the gains have concentrated on obligate carnivores. Facultative carnivores, such as grizzly/brown bears (Ursus arctos), exhibit a variety of behaviors that can lead to the formation of GPS clusters. We combined clustering techniques with field site investigations of grizzly bear GPS locations (n = 732 site investigations; 2004-2011) to produce 174 GPS clusters where documented behavior was partitioned into five classes (large-biomass carcass, small-biomass carcass, old carcass, non-carcass activity, and resting). We used multinomial logistic regression to predict the probability of clusters belonging to each class. Two cross-validation methods (leaving out individual clusters, or leaving out individual bears) showed that correct prediction of bear visitation to large-biomass carcasses was 78-88%, whereas the false-positive rate was 18-24%. As a case study, we applied our predictive model to a GPS data set of 266 bear-years in the Greater Yellowstone Ecosystem (2002-2011) and examined trends in carcass visitation during fall hyperphagia (September-October). We identified 1997 spatial GPS clusters, of which 347 were predicted to be large-biomass carcasses. We used the clustered data to develop a carcass visitation index, which varied annually, but more than doubled during the study period. Our study demonstrates the effectiveness and utility of identifying GPS clusters associated with carcass visitation by a facultative carnivore.

  7. A report documenting the completion of the Los Alamos National Laboratory portion of the ASC level II milestone "Visualization on the supercomputing platform"

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahrens, James P; Patchett, John M; Lo, Li-Ta

    2011-01-24

    This report provides documentation for the completion of the Los Alamos portion of the ASC Level II 'Visualization on the Supercomputing Platform' milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratory and Los Alamos National Laboratory. The milestone text is shown in Figure 1 with the Los Alamos portions highlighted in boldfaced text. Visualization and analysis of petascale data are limited by several factors, which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) performance of interactive rendering, which is the most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processors (GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk, we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next-generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the performance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. In conclusion, we improved CPU-based rendering performance by a factor of 2-10 on our tests. In addition, we evaluated CPU- and GPU-based rendering performance.
We encourage production visualization experts to consider using CPU-based rendering solutions when appropriate. For example, on remote supercomputers, CPU-based rendering can offer a means of viewing data without having to offload the data or geometry onto a GPU-based visualization system. In terms of the comparative performance of the CPU and GPU, we believe that further optimizations of both CPU- and GPU-based rendering are possible. The simulation community is currently confronting this reality as they work to port their simulations to different hardware architectures. What is interesting about CPU rendering of massive datasets is that for the past two decades GPU performance has significantly outperformed CPU-based systems. Based on our advancements, evaluations and explorations, we believe that CPU-based rendering has returned as one viable option for the visualization of massive datasets.

  8. Characteristics of mid-level clouds over West Africa

    NASA Astrophysics Data System (ADS)

    Bourgeois, Elsa; Bouniol, Dominique; Couvreux, Fleur; Guichard, Françoise; Marsham, John; Garcia-Carreras, Luis; Birch, Cathryn; Parker, Doug

    2017-04-01

    Clouds have a major impact on the distribution of water and energy fluxes within the atmosphere. They also represent one of the main sources of uncertainty in global climate models because cloud processes are difficult to parametrize. However, in West Africa, the cloud type, occurrence and radiative effects have not been extensively documented. This region is characterized by a strong seasonality, with precipitation occurring in the Sahel from June to September (monsoon season). This period also coincides with the annual maximum of cloud cover. Taking advantage of the one-year ARM Mobile Facility (AMF) deployment in 2006 in Niamey (Niger), Bouniol et al. (2012) documented the distinct cloud types and showed the frequent occurrence of mid-level clouds (around 6 km height) and their substantial impact on the surface short-wave and long-wave radiative fluxes. Furthermore, in a process-oriented evaluation of climate models, Roehrig et al. (2013) showed that these mid-level clouds are poorly represented in numerical models. The aim of this work is to document the macro- and microphysical properties of mid-level clouds and the environment in which such clouds occur across West Africa. To document those clouds, we make extensive use of observations from lidar and cloud radar, either deployed at ground-based sites (Niamey and Bordj Badji Mokhtar (Sahara)) or on board the A-Train constellation (CloudSat/CALIPSO). These datasets reveal the temporal and spatial occurrence of those clouds. They are found throughout the year, with a predominance around the monsoon season, and are preferentially observed in the southern and western parts of West Africa, which could be linked to the dynamics of the Saharan heat low. Those clouds are usually quite thin (most of them are less than 1000 m deep). A clustering method applied to these data allows us to identify three different types of clouds: one with low bases, one with high bases and another with large thicknesses.
The first two cloud families are associated with potential temperature inversions at the top of the clouds. Complementary observations such as radiosondes and radiation measurements allow us to determine the thermodynamic stratification in which they occur as well as their radiative properties.

  9. Discovering shared segments on the migration route of the bar-headed goose by time-based plane-sweeping trajectory clustering

    USGS Publications Warehouse

    Luo, Ze; Baoping, Yan; Takekawa, John Y.; Prosser, Diann J.

    2012-01-01

    We propose a new method to help ornithologists and ecologists discover shared segments on the migratory pathway of the bar-headed geese by time-based plane-sweeping trajectory clustering. We present a density-based, time-parameterized line segment clustering algorithm, which extends comparable traditional clustering algorithms along the temporal and spatial dimensions. We present a time-based plane-sweeping trajectory clustering algorithm to reveal the dynamic evolution of spatial-temporal object clusters and discover common motion patterns of bar-headed geese in the process of migration. Experiments are performed on GPS-based satellite telemetry data from bar-headed geese, and results demonstrate that our algorithms can correctly discover shared segments of the bar-headed geese migratory pathway. We also present findings on the migratory behavior of bar-headed geese determined from this new analytical approach.
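    The abstract does not give the segment distance used by the authors. The Python sketch below illustrates only the general idea of density-based clustering over time-stamped line segments, using plain DBSCAN. It assumes a deliberately simple, hypothetical space-time distance: the Euclidean distance between segment midpoints plus TIME_WEIGHT times the gap between their mid-times. The names Segment, seg_distance, and dbscan_segments are illustrative and do not come from the paper.

```python
import math
from typing import List, NamedTuple, Optional

class Segment(NamedTuple):
    """A trajectory segment with start/end positions and times (hypothetical representation)."""
    x1: float
    y1: float
    t1: float
    x2: float
    y2: float
    t2: float

TIME_WEIGHT = 1.0  # hypothetical: distance units charged per unit of time difference

def seg_distance(a: Segment, b: Segment) -> float:
    """Hypothetical space-time distance: midpoint distance plus weighted mid-time gap."""
    ax, ay = (a.x1 + a.x2) / 2, (a.y1 + a.y2) / 2
    bx, by = (b.x1 + b.x2) / 2, (b.y1 + b.y2) / 2
    at, bt = (a.t1 + a.t2) / 2, (b.t1 + b.t2) / 2
    return math.hypot(ax - bx, ay - by) + TIME_WEIGHT * abs(at - bt)

def dbscan_segments(segs: List[Segment], eps: float, min_pts: int) -> List[int]:
    """DBSCAN over segments; returns one label per segment (-1 = noise)."""
    labels: List[Optional[int]] = [None] * len(segs)

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(segs)) if seg_distance(segs[i], segs[j]) <= eps]

    cluster_id = 0
    for i in range(len(segs)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core segment: keep expanding
                queue.extend(j_nbrs)
        cluster_id += 1
    return labels
```

    Because time enters the distance, two segments along the same stretch of route end up in different clusters when they are flown far apart in time. This mirrors the motivation for time parameterization, although the paper's actual metric is likely richer.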

  10. Training-based descreening.

    PubMed

    Siddiqui, Hasib; Bouman, Charles A

    2007-03-01

    Conventional halftoning methods employed in electrophotographic printers tend to produce Moiré artifacts when used for printing images scanned from printed material, such as books and magazines. We present a novel approach for descreening color scanned documents aimed at providing an efficient solution to the Moiré problem in practical imaging devices, including copiers and multifunction printers. The algorithm works by combining two nonlinear image-processing techniques, resolution synthesis-based denoising (RSD), and modified smallest univalue segment assimilating nucleus (SUSAN) filtering. The RSD predictor is based on a stochastic image model whose parameters are optimized beforehand in a separate training procedure. Using the optimized parameters, RSD classifies the local window around the current pixel in the scanned image and applies filters optimized for the selected classes. The output of the RSD predictor is treated as a first-order estimate to the descreened image. The modified SUSAN filter uses the output of RSD for performing an edge-preserving smoothing on the raw scanned data and produces the final output of the descreening algorithm. Our method does not require any knowledge of the screening method, such as the screen frequency or dither matrix coefficients, that produced the printed original. The proposed scheme not only suppresses the Moiré artifacts, but, in addition, can be trained with intrinsic sharpening for deblurring scanned documents. Finally, once optimized for a periodic clustered-dot halftoning method, the same algorithm can be used to inverse halftone scanned images containing stochastic error diffusion halftone noise.
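    The final stage described above smooths the raw scan while taking edge information from the RSD output. The NumPy sketch below is a simplified SUSAN-style filter built on that idea. Each output pixel is a weighted mean of its neighbors in the raw image. The weights combine a spatial Gaussian with the SUSAN brightness term exp(-((g_n - g_c)/t)^2), computed on a separate guide image that stands in for the RSD estimate. The authors' specific modifications to SUSAN are not described in the abstract. The function name and the default values of t, sigma, and radius are assumptions.

```python
import numpy as np

def susan_guided_smooth(raw, guide, t=20.0, sigma=1.5, radius=2):
    """Edge-preserving smoothing of `raw`, with brightness similarity taken from `guide`.

    Simplified SUSAN-style filter (illustrative only): weights are a spatial
    Gaussian times exp(-((g_n - g_c) / t) ** 2), computed on the guide image.
    """
    raw = np.asarray(raw, dtype=float)
    guide = np.asarray(guide, dtype=float)
    h, w = raw.shape
    pad_raw = np.pad(raw, radius, mode="reflect")
    pad_guide = np.pad(guide, radius, mode="reflect")
    num = np.zeros_like(raw)
    den = np.zeros_like(raw)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # SUSAN smoothing excludes the center pixel itself
            ys, xs = radius + dy, radius + dx
            g_n = pad_guide[ys:ys + h, xs:xs + w]  # guide values of this neighbor
            r_n = pad_raw[ys:ys + h, xs:xs + w]    # raw values of this neighbor
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
            brightness = np.exp(-((g_n - guide) / t) ** 2)
            wgt = spatial * brightness
            num += wgt * r_n
            den += wgt
    return num / np.maximum(den, 1e-300)  # guard against an all-zero weight sum
```

    Neighbors that lie across an edge in the guide get almost no weight, so the edge survives. Within flat regions the filter reduces to a Gaussian blur that suppresses halftone noise.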

  11. Improved Ant Colony Clustering Algorithm and Its Performance Study

    PubMed Central

    Gao, Wei

    2016-01-01

    Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
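    The abstract does not describe the pick-up and drop rules of the abstraction variant. The Python sketch below shows the classic rules from the Deneubourg and Lumer-Faieta line of ant clustering that such variants usually build on. It is not the paper's data-combination mechanism. The constants K1, K2, ALPHA, and AREA are hypothetical values, and items are assumed to be numeric attribute tuples.

```python
import math

# Hypothetical constants; the paper's parameter values are not given in the abstract.
K1 = 0.1     # pick-up threshold constant
K2 = 0.15    # drop threshold constant
ALPHA = 0.5  # dissimilarity scale
AREA = 9     # neighborhood size s*s (e.g. a 3x3 patch of grid cells)

def local_density(item, neighbors):
    """Lumer-Faieta neighborhood similarity f(item), clipped below at 0."""
    if not neighbors:
        return 0.0
    total = sum(1.0 - math.dist(item, n) / ALPHA for n in neighbors)
    return max(0.0, total / AREA)

def p_pick(f):
    """An unladen ant tends to pick up an item surrounded by dissimilar items."""
    return (K1 / (K1 + f)) ** 2

def p_drop(f):
    """A laden ant tends to drop its item among similar items."""
    return (f / (K2 + f)) ** 2
```

    A full simulation would move ants randomly over a grid and call these functions at every step. That loop is omitted here because it is exactly the part that efficiency-oriented variants tend to change.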

  12. Clustering self-organizing maps (SOM) method for human papillomavirus (HPV) DNA as the main cause of cervical cancer disease

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Aldila, D.; Fatimah, Arimbi, M. D.

    2017-07-01

    One of the most widely used clustering methods, owing to its robustness, is the Self-Organizing Map (SOM) method. This paper discusses the application of the SOM method to the DNA of the human papillomavirus (HPV), which is the main cause of cervical cancer, the most dangerous cancer in developing countries. We use the DNA of 18 HPV types, based on the newest complete genomes. Using the open-source program R, the clustering process separates the 18 HPV types into two clusters: two types in the first cluster and 16 in the second. The 18 HPV types were then analyzed with respect to the malignancy of the virus (how difficult it is to cure). The two HPV types in the first cluster can be classified as tame HPV, while the 16 in the second cluster are classified as vicious HPV.
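    The authors worked in R, and the abstract does not say how genomes were turned into numeric vectors. The NumPy sketch below assumes each HPV type is already a fixed-length feature vector (for example, k-mer frequencies; this is a hypothetical choice). It trains a small one-dimensional SOM with n_nodes units and assigns each sample to its best-matching unit, which yields two clusters when n_nodes is 2. The function names, the linear initialization, and the decay schedules are illustrative assumptions.

```python
import numpy as np

def train_som(data, n_nodes=2, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map and return its node weight vectors."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Linear initialization: nodes spread evenly between the data's min and max corners.
    weights = np.linspace(data.min(axis=0), data.max(axis=0), n_nodes)
    grid = np.arange(n_nodes)  # node positions on the 1-D map
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 1e-3)  # neighborhood width shrinks
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))  # neighborhood kernel
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign(data, weights):
    """Cluster label of each sample = index of its best-matching unit."""
    data = np.asarray(data, dtype=float)
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```

    With only two units, the map's topology plays a minor role, and the result behaves much like online k-means with k = 2. Larger maps are where SOM's neighborhood structure matters.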

  13. How do components of evidence-based psychological treatment cluster in practice? A survey and cluster analysis.

    PubMed

    Gifford, Elizabeth V; Tavakoli, Sara; Weingardt, Kenneth R; Finney, John W; Pierson, Heather M; Rosen, Craig S; Hagedorn, Hildi J; Cook, Joan M; Curran, Geoff M

    2012-01-01

    Evidence-based psychological treatments (EBPTs) are clusters of interventions, but it is unclear how providers actually implement these clusters in practice. A disaggregated measure of EBPTs was developed to characterize clinicians' component-level evidence-based practices and to examine relationships among these practices. Survey items captured components of evidence-based treatments based on treatment integrity measures. The Web-based survey was conducted with 75 U.S. Department of Veterans Affairs (VA) substance use disorder (SUD) practitioners and 149 non-VA community-based SUD practitioners. Clinicians' self-designated treatment orientations were positively related to their endorsement of those EBPT components; however, clinicians used components from a variety of EBPTs. Hierarchical cluster analysis indicated that clinicians combined and organized interventions from cognitive-behavioral therapy, the community reinforcement approach, motivational interviewing, structured family and couples therapy, 12-step facilitation, and contingency management into clusters including empathy and support, treatment engagement and activation, abstinence initiation, and recovery maintenance. Understanding how clinicians use EBPT components may lead to improved evidence-based practice dissemination and implementation. Published by Elsevier Inc.
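    The abstract does not report the distance measure or linkage used in the hierarchical cluster analysis. The Python sketch below shows one common way to group survey items. Each treatment component is represented as a 0/1 vector of which clinicians endorse it, Jaccard distance compares components, and average-linkage agglomeration merges them until n_clusters groups remain. Both the distance and the linkage are assumptions, and the component names in the usage data are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two binary endorsement vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - inter / union if union else 0.0

def average_linkage(items, n_clusters):
    """Agglomerative clustering with average linkage until n_clusters remain.

    `items` maps a component name to its 0/1 vector over clinicians.
    """
    clusters = [[name] for name in items]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(jaccard_distance(items[a], items[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (i < j, so deleting j keeps i valid).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```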

  14. Using clustering and a modified classification algorithm for automatic text summarization

    NASA Astrophysics Data System (ADS)

    Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar

    2013-01-01

    In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text instead. First, we cluster the document sentences to exploit the diversity of topics; then we apply a learning algorithm (here, Naive Bayes) to each cluster, treating it as a class. After obtaining the classification model, we calculate the score of each sentence in each class using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences, and the first ones are extracted as the output summary. We conducted experiments using a corpus of scientific papers and compared our results to another summarization system called UNIS. We also examined the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method is interesting and performs well, and that adding new features (which is simple with this method) can improve the summary's accuracy.
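    The pipeline above (cluster sentences, treat each cluster as a class, train Naive Bayes, score and rank sentences) can be sketched in plain Python. Several details are not given in the abstract and are assumptions here.

    - Clustering is a greedy single pass over bag-of-words cosine similarity with a threshold.
    - The classifier is a multinomial Naive Bayes with Laplace smoothing.
    - A sentence's score is its best length-normalized log joint probability over all classes.

    The scoring rule in particular is a hypothetical stand-in for the paper's scoring model.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_sentences(bags, threshold):
    """Greedy single pass: a sentence joins the first cluster whose centroid is similar enough."""
    clusters = []  # list of (centroid Counter, member indices)
    for i, bag in enumerate(bags):
        for centroid, members in clusters:
            if cosine(bag, centroid) >= threshold:
                centroid.update(bag)
                members.append(i)
                break
        else:
            clusters.append((Counter(bag), [i]))
    return [members for _, members in clusters]

def summarize(sentences, n=2, threshold=0.2):
    bags = [Counter(tokenize(s)) for s in sentences]
    vocab = {w for bag in bags for w in bag}
    # Multinomial Naive Bayes with Laplace smoothing; each cluster is one class.
    models = []
    for members in cluster_sentences(bags, threshold):
        counts = Counter()
        for i in members:
            counts.update(bags[i])
        prior = math.log(len(members) / len(sentences))
        models.append((prior, counts, sum(counts.values())))

    def score(bag):  # hypothetical scoring rule: best normalized log joint over classes
        n_tok = sum(bag.values()) or 1
        return max(prior + sum(c * math.log((counts[w] + 1) / (total + len(vocab)))
                               for w, c in bag.items()) / n_tok
                   for prior, counts, total in models)

    ranked = sorted(range(len(sentences)), key=lambda i: score(bags[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]  # keep original sentence order
```

    Raising the threshold produces more, smaller clusters, and therefore more classes for the classifier. This is the kind of threshold tuning whose effect the paper reports examining.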

  15. An improved initialization center k-means clustering algorithm based on distance and density

    NASA Astrophysics Data System (ADS)

    Duan, Yanling; Liu, Qun; Xia, Shuyin

    2018-04-01

    To address the problem that k-means selects its initial cluster centers at random, so that the clustering results are influenced by outlier samples and are unstable across multiple runs, a center-initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average distance is used to represent sample density, and the data samples with larger distance and higher density are selected as the initial cluster centers to optimize the clustering results. A clustering evaluation method based on distance and density is then designed to verify the feasibility and practicality of the algorithm. Experimental results on UCI data sets show that the algorithm is reasonably stable and practical.
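    A minimal NumPy sketch of this kind of initialization follows. The abstract does not give the distance weights or the exact rule for combining density and distance, so the sketch makes three assumptions.

    - Density is the reciprocal of the plain (unweighted) mean distance to all other samples.
    - Samples below the mean density are treated as likely outliers and excluded as candidates.
    - The first center is the densest sample; each further center is the candidate farthest from the centers already chosen.

    Standard Lloyd iterations then follow. The function names are hypothetical.

```python
import numpy as np

def density_distance_init(X, k):
    """Pick k initial centers that are dense and mutually far apart (illustrative rule)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    mean_d = d.sum(axis=1) / (n - 1)            # unweighted mean distance (assumption)
    density = 1.0 / np.maximum(mean_d, 1e-12)   # reciprocal of mean distance
    candidates = np.flatnonzero(density >= density.mean())  # drop low-density points
    centers = [int(np.argmax(density))]         # densest sample first
    while len(centers) < k:
        # Farthest candidate from its nearest already-chosen center.
        min_d = d[np.ix_(candidates, centers)].min(axis=1)
        centers.append(int(candidates[np.argmax(min_d)]))
    return X[centers]

def kmeans(X, k, iters=100):
    """Lloyd's algorithm started from the density/distance centers."""
    X = np.asarray(X, dtype=float)
    centers = density_distance_init(X, k)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

    The initialization is deterministic, so repeated runs give identical results. This addresses the instability described above, but the outlier filter is only a heuristic.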

  16. Energy Efficient Medium Access Control Protocol for Clustered Wireless Sensor Networks with Adaptive Cross-Layer Scheduling.

    PubMed

    Sefuba, Maria; Walingo, Tom; Takawira, Fambirai

    2015-09-18

    This paper presents an Energy Efficient Medium Access Control (MAC) protocol for clustered wireless sensor networks that aims to improve energy efficiency and delay performance. The proposed protocol employs adaptive cross-layer intra-cluster scheduling and inter-cluster relay selection diversity. The scheduling is based on the available data packets and the remaining energy level of the source node (SN). This helps to minimize idle listening on nodes without data to transmit, as well as to reduce control packet overhead. The relay selection diversity is carried out between clusters, by the cluster head (CH), and the base station (BS). The diversity helps to improve network reliability and prolong the network lifetime. Relay selection is determined based on the communication distance, the remaining energy and the channel quality indicator (CQI) for the relay cluster head (RCH). An analytical framework for energy consumption and transmission delay for the proposed MAC protocol is presented in this work. The performance of the proposed MAC protocol is evaluated based on transmission delay, energy consumption, and network lifetime. The results obtained indicate that the proposed MAC protocol performs better than traditional cluster-based MAC protocols.
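    The abstract lists the three relay-selection criteria (communication distance, remaining energy, and CQI) but not how they are combined. The Python sketch below assumes a simple, hypothetical weighted score. Shorter distance, more remaining energy, and better CQI each raise the score, and candidates that are out of range or depleted are excluded. The weights, the CQI range 0..max_cqi, and all names are illustrative assumptions, not the protocol's specification.

```python
from dataclasses import dataclass

@dataclass
class RelayCandidate:
    ch_id: str
    distance_m: float  # distance from the current cluster head
    energy_j: float    # remaining energy in joules
    cqi: int           # channel quality indicator, assumed to lie in 0..max_cqi

W_DIST, W_ENERGY, W_CQI = 0.3, 0.4, 0.3  # hypothetical weights (sum to 1)

def select_relay(candidates, max_range_m, max_energy_j, max_cqi=15):
    """Return the highest-scoring relay cluster head, or None if none is usable."""
    def score(c):
        if c.distance_m > max_range_m or c.energy_j <= 0:
            return float("-inf")  # out of range or battery depleted
        return (W_DIST * (1 - c.distance_m / max_range_m)
                + W_ENERGY * (c.energy_j / max_energy_j)
                + W_CQI * (c.cqi / max_cqi))
    if not candidates:
        return None
    best = max(candidates, key=score)
    return best if score(best) != float("-inf") else None
```

    Weighting remaining energy most heavily, as assumed here, spreads relaying load away from depleted cluster heads. This is the mechanism by which relay diversity can extend network lifetime.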

  17. Energy Efficient Medium Access Control Protocol for Clustered Wireless Sensor Networks with Adaptive Cross-Layer Scheduling

    PubMed Central

    Sefuba, Maria; Walingo, Tom; Takawira, Fambirai

    2015-01-01

    This paper presents an Energy Efficient Medium Access Control (MAC) protocol for clustered wireless sensor networks that aims to improve energy efficiency and delay performance. The proposed protocol employs adaptive cross-layer intra-cluster scheduling and inter-cluster relay selection diversity. The scheduling is based on the available data packets and the remaining energy level of the source node (SN). This helps to minimize idle listening on nodes without data to transmit, as well as to reduce control packet overhead. The relay selection diversity is carried out between clusters, by the cluster head (CH), and the base station (BS). The diversity helps to improve network reliability and prolong the network lifetime. Relay selection is determined based on the communication distance, the remaining energy and the channel quality indicator (CQI) for the relay cluster head (RCH). An analytical framework for energy consumption and transmission delay for the proposed MAC protocol is presented in this work. The performance of the proposed MAC protocol is evaluated based on transmission delay, energy consumption, and network lifetime. The results obtained indicate that the proposed MAC protocol performs better than traditional cluster-based MAC protocols. PMID:26393608

  18. [First ciguatera outbreak in Germany in 2012].

    PubMed

    Friedemann, Miriam

    2016-12-01

    In November 2012, 23 cases of ciguatera with typical combinations of gastrointestinal and neurological symptoms occurred in Germany after consumption of imported tropical fish (Lutjanus spp.). A questionnaire was used to gather information on the disease course and fish consumption. All patients suffered from pathognomonic cold allodynia. Aside from two severe courses of illness, all other cases showed symptoms of moderate intensity. During a three-year follow-up, seven patients reported prolonged paresthesia for more than one year. Two of them reported further neuropathies over almost three years. This is the first time that long-term persistence of symptoms has been documented in detail. Outbreak cases were allocated to eight clusters in seven German cities. A further cluster was prevented by the successful recall of ciguatoxic fish. Three clusters were confirmed by the detection of ciguatoxin in samples of suspicious and recalled fish. An extrapolation based on the ciguatoxic samples indicated that twenty cases of ciguatera were prevented. Further cases unknown to the authorities should be assumed. During the outbreak investigations, inadvertently mislabelled fish species and fishing capture areas were observed on import- and retail-level documents. The ascertainment of cases and the outbreak investigations proved to be difficult due to inconsistent case reports to poisons centers, local health and veterinary authorities. In Germany, many physicians are unaware of the disease pattern of ciguatera and the risks caused by tropical fish. The occurrence of further outbreaks during the following years emphasizes the increasing significance of ciguatera in Germany.

  19. Ectoparasitism by Chigger Mite Larvae (Acari: Trombiculidae) in a Wintering Population of Catharus ustulatus (Turdidae) in Southeastern Peru.

    PubMed

    Servat, Grace Patricia; Cruz, Roxana; Vitorino, Joyce; Deichmann, Jessica

    2018-02-27

    We document chigger mite (Acari: Trombiculidae) ectoparasitic infestation (prevalence and intensity) on a population of Catharus ustulatus (Turdidae) wintering at a site in southeastern Peru undergoing development for natural gas exploration (PAD A). We compare prevalence (i.e., the proportion of individuals infested by chigger mites) and intensity (i.e., the average number of larvae and larval clusters in infested individuals) at the forest edge (< 100 m) and interior (> 100 m) from PAD A, as variation in biotic (e.g., vegetation cover) and abiotic (e.g., relative humidity and temperature) factors is expected to influence chigger mite abundance. Chigger mite prevalence was 100%: all C. ustulatus captured were infested regardless of distance. The range of variation in larvae (2-72 larvae/individual) and cluster intensity (1-4 clusters/individual) did not differ between edge and interior (P > 0.05), despite differences in herbaceous vegetation cover (UM-W = 180, n = 30, 31; P < 0.01). Ectoparasitic prevalence and intensity in long-distance migratory birds might add risks to an already hazardous journey, as ectoparasitic variation and other selective pressures experienced by individuals at each locality may not only be a cause of within-site mortality but, by affecting the physical condition of birds, may carry over to subsequent sites, affecting reproductive success and survival. Documenting ectoparasitism at any phase of the life cycle of migrants could improve understanding of population declines of migratory birds.

  20. Automated Production of Movies on a Cluster of Computers

    NASA Technical Reports Server (NTRS)

    Nail, Jasper; Le, Duong; Nail, William L.; Nail, William

    2008-01-01

    A method of accelerating and facilitating production of video and film motion-picture products, and software and generic designs of computer hardware to implement the method, are undergoing development. The method provides for automation of most of the tedious and repetitive tasks involved in editing and otherwise processing raw digitized imagery into final motion-picture products. The method was conceived to satisfy requirements, in industrial and scientific testing, for rapid processing of multiple streams of simultaneously captured raw video imagery into documentation in the form of edited video imagery and video-derived data products for technical review and analysis. In the production of such video technical documentation, unlike in production of motion-picture products for entertainment, (1) it is often necessary to produce multiple video-derived data products, (2) there are usually no second chances to repeat acquisition of raw imagery, (3) it is often desired to produce final products within minutes rather than hours, days, or months, and (4) consistency and quality, rather than aesthetics, are the primary criteria for judging the products. In the present method, the workflow has both serial and parallel aspects: processing can begin before all the raw imagery has been acquired, each video stream can be subjected to different stages of processing simultaneously on different computers that may be grouped into one or more cluster(s), and the final product may consist of multiple video streams. Results of processing on different computers are shared, so that workers can collaborate effectively.

  1. Al7CX (X=Li-Cs) clusters: Stability and the prospect for cluster materials

    NASA Astrophysics Data System (ADS)

    Ashman, C.; Khanna, S. N.; Pederson, M. R.; Kortus, J.

    2000-12-01

    Al7C clusters, recently found to have a high electron affinity and exceptional stability, are shown to form ionic molecules when combined with alkali-metal atoms. Our studies, based on an ab initio gradient-corrected density-functional scheme, show that Al7CX (X=Li-Cs) clusters have a very low electron affinity and a high ionization potential. When combined, the two- and four-atom composite clusters of Al7CLi units leave the Al7C clusters almost intact. Preliminary studies indicate that Al7CLi may be suitable for forming cluster-based materials.

  2. Structuring communication relationships for interprofessional teamwork (SCRIPT): a cluster randomized controlled trial

    PubMed Central

    Zwarenstein, Merrick; Reeves, Scott; Russell, Ann; Kenaszchuk, Chris; Conn, Lesley Gotlib; Miller, Karen-Lee; Lingard, Lorelei; Thorpe, Kevin E

    2007-01-01

    Background Despite a burgeoning interest in using interprofessional approaches to promote effective collaboration in health care, systematic reviews find scant evidence of benefit. This protocol describes the first cluster randomized controlled trial (RCT) to design and evaluate an intervention intended to improve interprofessional collaborative communication and patient-centred care. Objectives The objective is to evaluate the effects of a four-component, hospital-based staff communication protocol designed to promote collaborative communication between healthcare professionals and enhance patient-centred care. Methods The study is a multi-centre mixed-methods cluster randomized controlled trial involving twenty clinical teaching teams (CTTs) in general internal medicine (GIM) divisions of five Toronto tertiary-care hospitals. CTTs will be randomly assigned either to receive an intervention designed to improve interprofessional collaborative communication, or to continue usual communication practices. Non-participant naturalistic observation, shadowing, and semi-structured, qualitative interviews were conducted to explore existing patterns of interprofessional collaboration in the CTTs, and to support intervention development. Interviews and shadowing will continue during intervention delivery in order to document interactions between the intervention settings and adopters, and changes in interprofessional communication. The primary outcome is the rate of unplanned hospital readmission. Secondary outcomes are length of stay (LOS); adherence to evidence-based prescription drug therapy; patients' satisfaction with care; self-report surveys of CTT staff perceptions of interprofessional collaboration; and frequency of calls to paging devices. Outcomes will be compared on an intention-to-treat basis using adjustment methods appropriate for data from a cluster randomized design. 
Discussion Pre-intervention qualitative analysis revealed that a substantial amount of interprofessional interaction lacks key core elements of collaborative communication such as self-introduction, description of professional role, and solicitation of other professional perspectives. Incorporating these findings, a four-component intervention was designed with a goal of creating a culture of communication in which the fundamentals of collaboration become a routine part of interprofessional interactions during unstructured work periods on GIM wards. Trial registration Registered with National Institutes of Health as NCT00466297. PMID:17877830

  3. Mechanistic Insight into the Nitrosylation of the [4Fe−4S] Cluster of WhiB-like Proteins

    PubMed Central

    2010-01-01

    The reactivity of protein bound iron−sulfur clusters with nitric oxide (NO) is well documented, but little is known about the actual mechanism of cluster nitrosylation. Here, we report studies of members of the Wbl family of [4Fe−4S]-containing proteins, which play key roles in regulating developmental processes in actinomycetes, including Streptomyces and Mycobacteria, and have been shown to be NO responsive. Streptomyces coelicolor WhiD and Mycobacterium tuberculosis WhiB1 react extremely rapidly with NO in a multiphasic reaction involving, remarkably, 8 NO molecules per [4Fe−4S] cluster. The reaction is 10⁴-fold faster than that observed with O₂ and is by far the most rapid iron−sulfur cluster nitrosylation reaction reported to date. An overall stoichiometry of [Fe₄S₄(Cys)₄]²⁻ + 8NO → 2[Feᴵ₂(NO)₄(Cys)₂]⁰ + S²⁻ + 3S⁰ has been established by determination of the sulfur products and their oxidation states. Kinetic analysis leads to a four-step mechanism that accounts for the observed NO dependence. DFT calculations suggest the possibility that the nitrosylation product is a novel cluster [Feᴵ₄(NO)₈(Cys)₄]⁰ derived by dimerization of a pair of Roussin’s red ester (RRE) complexes. PMID:21182249
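    The reported overall stoichiometry can be checked with simple bookkeeping. The short Python script below counts iron, sulfur, NO units, cysteine ligands, and net charge on each side of the reaction. It treats NO and Cys as intact units because the equation conserves both. It does not check oxidation-state (electron) balance, which the abstract reports was established experimentally.

```python
from collections import Counter

def species(counts, charge=0, coeff=1):
    """Atom/unit counts and total charge for `coeff` copies of one species."""
    return Counter({k: v * coeff for k, v in counts.items()}), charge * coeff

# [Fe4S4(Cys)4]2- + 8 NO
lhs = [species({"Fe": 4, "S": 4, "Cys": 4}, charge=-2),
       species({"NO": 1}, coeff=8)]
# 2 [Fe2(NO)4(Cys)2]0 + S2- + 3 S0
rhs = [species({"Fe": 2, "NO": 4, "Cys": 2}, charge=0, coeff=2),
       species({"S": 1}, charge=-2),
       species({"S": 1}, charge=0, coeff=3)]

def total(side):
    """Sum unit counts and charges over all species on one side."""
    units, charge = Counter(), 0
    for counts, q in side:
        units += counts
        charge += q
    return units, charge

print(total(lhs) == total(rhs))  # True: Fe, S, NO, Cys and charge all balance
```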

  4. Development of Metal Cluster-Based Energetic Materials at NSWC-IHD

    DTIC Science & Technology

    2011-01-01

    reactivity of $\mathrm{Ni}_x\mathrm{Al}_y^+$ clusters with nitromethane was investigated using a gas-phase molecular beam system. Results indicate that nitromethane is highly...clusters make up the subunit of a molecular metal-based energetic material. The reactivity of $\mathrm{Ni}_x\mathrm{Al}_y^+$ clusters with nitromethane was investigated using...a gas-phase molecular beam system. Results indicate that nitromethane is highly reactive toward the $\mathrm{Ni}_x\mathrm{Al}_y^+$ clusters and suggest it would not make

  5. An ensemble framework for clustering protein-protein interaction networks.

    PubMed

    Asur, Sitaram; Ucar, Duygu; Parthasarathy, Srinivasan

    2007-07-01

    Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. The presence of biologically relevant functional modules in these networks has been theorized by many researchers. However, the application of traditional clustering algorithms for extracting these modules has not been successful, largely due to the presence of noisy false positive interactions as well as specific topological challenges in the network. In this article, we propose an ensemble clustering framework to address this problem. For base clustering, we introduce two topology-based distance metrics to counteract the effects of noise. We develop a PCA-based consensus clustering technique, designed to reduce the dimensionality of the consensus problem and yield informative clusters. We also develop a soft consensus clustering variant to assign multifaceted proteins to multiple functional groups. We conduct an empirical evaluation of different consensus techniques using topology-based, information theoretic and domain-specific validation metrics and show that our approaches can provide significant benefits over other state-of-the-art approaches. Our analysis of the consensus clusters obtained demonstrates that ensemble clustering can (a) produce improved biologically significant functional groupings; and (b) facilitate soft clustering by discovering multiple functional associations for proteins. Supplementary data are available at Bioinformatics online.
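To make the idea of consensus clustering concrete, here is a minimal sketch of a generic co-association (evidence-accumulation) consensus. This is a deliberately simpler, different technique from the PCA-based and soft consensus methods the abstract describes: it counts how often each pair of items shares a cluster across base clusterings and joins pairs that co-occur in more than a threshold fraction of them. The function names, the 0.5 threshold, and the toy partitions are assumptions for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions that place items i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Hard consensus: connected components of pairs co-clustered > threshold."""
    m = co_association(partitions)
    n = len(m)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical base clusterings of six proteins (labels are arbitrary).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
print(consensus(partitions))  # [0, 0, 0, 1, 1, 1]
```

Pairs that agree in two of three base clusterings (co-association 2/3) are merged, while the noisy split in the second partition is outvoted.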

  6. Data depth based clustering analysis

    DOE PAGES

    Jeong, Myeong-Hun; Cai, Yaping; Sullivan, Clair J.; ...

    2016-01-01

    This paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has enormous potential to uncover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most are not affine invariant, so they must be run with different parameters after a data set is rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they lack a global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), detects coherent clusters after the data sets are affine transformed, without changing a parameter. It is also robust to noise, because data depth measures the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter is varied. An experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the robustness to noise of DBSCAN or HDBSCAN. Its robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
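Why data depth can deliver affine invariance is easiest to see with a concrete depth function. The abstract does not say which depth DBCA uses, so the sketch below assumes Mahalanobis depth, D(x) = 1 / (1 + (x − μ)ᵀ Σ⁻¹ (x − μ)), one standard affine-invariant choice. It checks that every point's depth is unchanged after an arbitrary invertible linear map plus translation; the data, matrix `A`, and offset `b` are made up for illustration.

```python
import numpy as np


def mahalanobis_depth(points, data):
    """Mahalanobis depth of each row of `points` relative to the sample `data`.

    Values lie in (0, 1]; higher means more central.
    """
    mu = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = points - mu
    # Row-wise quadratic form diff_i^T * cov_inv * diff_i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))

# Arbitrary affine transform: rotate/scale/shear with A, then translate by b.
A = np.array([[3.0, 1.0], [0.5, 2.0]])   # any invertible matrix
b = np.array([10.0, -4.0])
transformed = data @ A.T + b

depth_before = mahalanobis_depth(data, data)
depth_after = mahalanobis_depth(transformed, transformed)
print(np.allclose(depth_before, depth_after))  # True
```

A Euclidean-distance criterion would change under the same transform, which is the sensitivity to rotation and scaling that the abstract attributes to most existing algorithms.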

  7. Synthesis and characterization of transition metal clusters: From the isolation of ligand-stabilized solid fragments to the tuning of magnetic anisotropy and host-guest selectivity, and, Approaches to science teaching: Development of an observation instrument with a measurement model based on item response theory

    NASA Astrophysics Data System (ADS)

    Hee, Allan George

    Part I. The work presented herein describes efforts to develop general techniques for the synthesis of transition metal clusters and the manipulation of their properties. In Chapter 2, it is demonstrated that a modified metal atom reactor allows for the vaporization, passivation, and isolation of metal-chalcogenide clusters from their parent binary solids. Among the clusters produced by this method were $\mathrm{Cr_6S_8(PEt_3)_6}$, $\mathrm{Fe_4S_4(PEt_3)_4}$, $\mathrm{Co_6S_8(PEt_3)_6}$, $\mathrm{Cu_6S_4(PEt_3)_6}$, $\mathrm{Cu_{12}S_6(PEt_3)_8}$, and $\mathrm{Cu_{26}Se_{13}(PEt_3)_{14}}$. To create single-molecule magnets with higher demagnetization barriers, we are developing metal-cyanide systems that exhibit highly adjustable magnetic behavior. Chapter 3 reports an attempt to introduce magnetic anisotropy into a $\mathrm{MnCr_6}$ cluster. Replacement of $\mathrm{Cr^{III}}$ with $\mathrm{Mo^{III}}$ resulted in the assembly of $\mathrm{K[(Me_3tacn)_6MnMo_6(CN)_{18}](ClO_4)_3}$ (Me$_3$tacn = N,N′,N″-trimethyl-1,4,7-triazacyclononane), the first well-documented example of a cyano-bridged single-molecule magnet. Recently, it was demonstrated that replacing Me$_3$tacn with the less sterically hindering tach (tach = cis,cis-1,3,5-triaminocyclohexane) in the face-centered cubic cluster $\mathrm{[(tach)_8Cr_8Ni_6(CN)_{24}]Br_{12}}$ provides greater access to the cluster cavity. Chapter 4 describes my efforts to probe the selectivity of this cluster toward inclusion of various guests.
Part II. Successful implementation of student-centered curriculum reforms requires the creation of a measurement instrument for monitoring whether the curricula are being used as intended. The creation and development of an observation instrument would greatly contribute to this effort. To develop a theoretically sound construct map, it is necessary to review the literature and conduct our own investigations of approaches to science teaching. Chapter 2 presents the findings of these investigations and their contributions to our understanding of the construct. Using these findings, the Science Teaching Observation Protocol (STOP) was created and designed to measure two subconstructs: intentions and strategies. Chapter 3 details the first pilot test of STOP and analysis of the collected data. In Chapter 4, the theoretical shortcomings of the instrument are analyzed and discussed. Modified versions of the intention and strategy subconstruct maps are presented.

  8. Supporting the education evidence portal via text mining

    PubMed Central

    Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

    2010-01-01

    The UK Education Evidence Portal (eep) provides a single searchable point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679

  9. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    PubMed

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was performed using SPSS version 20 and hierarchical cluster analysis with Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good performers through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include perceived unfairness, as it did not take district peculiarities into consideration, and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences.
© The Author 2015. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.
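Ward's method, used in the study above through SPSS, merges at each step the two clusters whose union least increases the within-cluster sum of squares; that increase is n_a·n_b/(n_a+n_b)·‖c_a − c_b‖², where c_a and c_b are the cluster centroids. The naive sketch below implements that rule directly. It is not the SPSS procedure, and the two-indicator district scores and the choice of three clusters (the study used seven) are hypothetical.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    ca = [sum(col) / na for col in zip(*a)]  # centroid of a
    cb = [sum(col) / nb for col in zip(*b)]  # centroid of b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair (i < j) whose merge is cheapest under Ward's criterion.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


# Hypothetical normalized scores on two performance indicators per district.
districts = [(0.9, 0.8), (0.85, 0.9), (0.5, 0.5), (0.55, 0.45), (0.1, 0.2), (0.15, 0.1)]
for group in ward_clusters(districts, k=3):
    print(group)
```

The cubic-time loop is only meant to show the merge criterion; library routines use the Lance-Williams update to avoid recomputing centroids at every step.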

  10. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda

    PubMed Central

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-01-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was performed using SPSS version 20 and hierarchical cluster analysis with Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good performers through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include perceived unfairness, as it did not take district peculiarities into consideration, and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. PMID:26024882

  11. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    PubMed

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data with latent cluster-level random effects, which are ignored in the conventional Cox model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid the serious variance underestimation of conventional Cox-based standard errors. However, the two-step bootstrap method overestimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of clustered event times.
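The two resampling schemes described above can be sketched generically. For brevity this example replaces the Cox regression coefficient with a simple statistic (the pooled mean), so it shows only the resampling logic, not the authors' survival-analysis implementation; the function name, the `n_boot` default, and the toy data are assumptions. Step (i) draws whole clusters with replacement; the optional step (ii) also redraws individuals within each drawn cluster.

```python
import math
import random
import statistics


def cluster_bootstrap_se(clusters, stat, n_boot=2000, two_step=False, seed=0):
    """Bootstrap SE of `stat` that respects within-cluster correlation.

    clusters: list of lists, one inner list of observations per cluster.
    two_step=False: resample whole clusters with replacement (cluster bootstrap).
    two_step=True: additionally resample individuals within each drawn cluster.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        drawn = [rng.choice(clusters) for _ in clusters]          # step (i)
        if two_step:
            drawn = [[rng.choice(c) for _ in c] for c in drawn]   # step (ii)
        pooled = [x for c in drawn for x in c]
        estimates.append(stat(pooled))
    return statistics.stdev(estimates)


# Hypothetical data: strong between-cluster differences, little within-cluster spread.
clusters = [[1.0, 1.1, 0.9], [5.0, 5.2, 4.8], [3.0, 2.9, 3.1], [7.0, 7.1, 6.9]]
pooled = [x for c in clusters for x in c]

naive_se = statistics.stdev(pooled) / math.sqrt(len(pooled))  # ignores clustering
boot_se = cluster_bootstrap_se(clusters, statistics.mean)
print(round(naive_se, 2), round(boot_se, 2))  # the cluster bootstrap SE is clearly larger
```

Treating the 12 observations as independent understates the uncertainty because there are effectively only four independent clusters, which mirrors the variance underestimation the abstract reports for conventional SEs.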

  12. Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

    PubMed

    Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

    2017-12-01

    To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs), measured with the Humphrey Field Analyzer (Carl Zeiss Meditec) and spanning 7.7 years on average, were obtained from 728 eyes of 475 primary open-angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs, from the 1st to 10th fields (VF1-10) through the 6th to 10th fields (VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and is useful for assessing both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing clusters, may help clinicians detect glaucomatous progression in a more timely manner than whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  13. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    PubMed

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments: a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single trials, the number of clusters is determined in a principled manner, and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
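The k-means step of the pipeline starts from centroids supplied by an earlier stage (a Genetic Algorithm in the study). The sketch below shows only standard Lloyd iterations from given initial centroids. The GA, EMD denoising, and Stability Index are not reproduced, and the 2-D "trial features" and initial centroids are hypothetical stand-ins for the GA output.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means starting from supplied initial centroids.

    Returns (labels, centroids). Stops when assignments no longer change.
    """
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical single-trial features (e.g. two summary scores per trial).
trials = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial = [(1.0, 1.0), (9.0, 9.0)]  # stand-in for GA-derived centroids
labels, centroids = kmeans(trials, initial)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Supplying good initial centroids matters because Lloyd's algorithm only converges to a local optimum; that is the role the GA plays in the described pipeline.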

  14. Clusters of Occupations Based on Systematically Derived Work Dimensions: An Exploratory Study.

    ERIC Educational Resources Information Center

    Cunningham, J. W.; And Others

    The study explored the feasibility of deriving an educationally relevant occupational cluster structure based on Occupational Analysis Inventory (OAI) work dimensions. A hierarchical cluster analysis was applied to the factor score profiles of 814 occupations on 22 higher-order OAI work dimensions. From that analysis, 73 occupational clusters were…

  15. Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.

    PubMed

    Meghani, Salimah H; Knafl, George J

    2017-02-10

    To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-mo prospective observational study (n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myeloma, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters: Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For a subset, the main underlying concern was the type of analgesic prescribed, i.e., opioid vs non-opioid (cluster 2, 11%), or the type of analgesic side effects (cluster 4, 21%). About one in four made trade-offs based on multiple concerns simultaneously, including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only the belief that pain medications can mask changes in health or keep you from knowing what is going on in your body was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)].
Most patients appear to be driven by a single salient concern in using analgesia for cancer pain. Addressing these concerns, perhaps through real time clinical assessments, may improve patients' analgesic adherence patterns and cancer pain outcomes.

  16. Adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization

    NASA Astrophysics Data System (ADS)

    Zhang, Tianzhen; Wang, Xiumei; Gao, Xinbo

    2018-04-01

    Nowadays, many datasets are represented by multiple views, which usually include shared and complementary information. Multi-view clustering methods integrate the information from multiple views to obtain better clustering results. Nonnegative matrix factorization has become an essential and popular tool in clustering methods because of its interpretability. However, existing nonnegative matrix factorization based multi-view clustering algorithms do not consider the disagreement between views and neglect the fact that different views contribute differently to the data distribution. In this paper, we propose a new multi-view clustering method, named adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization. The proposed algorithm obtains a parts-based representation of multi-view data by nonnegative matrix factorization. Then, pairwise co-regularization is used to measure the disagreement between views. Only one parameter is needed to automatically learn the weight of each view according to its contribution to the data distribution. Experimental results show that the proposed algorithm outperforms several state-of-the-art algorithms for multi-view clustering.
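The parts-based representation mentioned above comes from factorizing a nonnegative data matrix X ≈ W H with W, H ≥ 0. The sketch below shows only single-view NMF using the classic Lee-Seung multiplicative updates for the Frobenius loss; the paper's pairwise co-regularization across views and its adaptive view weights are not implemented here. The random data matrix, rank, iteration count, and argmax-based cluster assignment are assumptions for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=500, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates minimizing ||X - W H||_F^2, W, H >= 0.

    X has shape (features, samples); W is (features, rank); H is (rank, samples).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        # Multiplicative updates keep entries nonnegative; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# One hypothetical nonnegative view: 20 features x 12 samples.
X = np.random.default_rng(1).random((20, 12))
W, H = nmf(X, rank=3)
labels = H.argmax(axis=0)  # assign each sample to its dominant part
print(labels)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative reconstruction error
```

In a multi-view extension, each view would get its own factorization, and a regularizer would pull the per-view H matrices toward agreement; how the weights are learned is specific to the paper.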

  17. EVIDENCE-BASED PROTOCOLS

    PubMed Central

    Beissner, Katherine L.; Bach, Eileen; Murtaugh, Christopher M.; Trifilio, MaryGrace; Henderson, Charles R.; Barrón, Yolanda; Trachtenberg, Melissa A.; Reid, M. Carrington

    2017-01-01

    Activity-limiting pain is common among older home care patients and pain management is complicated by the high prevalence of physical frailty and multimorbidity in the home care population. A comparative effectiveness study was undertaken at a large urban home care agency to examine an evidence-based pain self-management program delivered by physical therapists (PTs). This article focuses on PT training, methods implemented to reinforce content after training and to encourage uptake of the program with appropriate patients, and therapists’ fidelity to the program. Seventeen physical therapy teams were included in the cluster randomized controlled trial, with 8 teams (155 PTs) assigned to a control and 9 teams (165 PTs) assigned to a treatment arm. Treatment therapists received interactive training over two sessions, with a follow-up session 6 months later. Additional support was provided via emails, e-learning materials including videos, and a therapist manual. Program fidelity was assessed by examining PT pain documentation in the agency’s electronic health record. PT feedback on the program was obtained via semistructured surveys. There were no between-group differences in the number of PTs documenting program elements with the exception of instruction in the use of imagery, which was documented by a higher percentage of intervention therapists (p = 0.002). PTs felt comfortable teaching the program elements, but cited time as the biggest barrier to implementing the protocol. Possible explanations for study results suggesting limited adherence to the program protocol by intervention-group PTs include the top-down implementation strategy, competing organizational priorities, program complexity, competing patient priorities, and inadequate patient buy-in. Implications for the implementation of complex new programs in the home healthcare setting are discussed. PMID:28157776

  18. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.

  19. SAR image segmentation using skeleton-based fuzzy clustering

    NASA Astrophysics Data System (ADS)

    Cao, Yun Yi; Chen, Yan Qiu

    2003-06-01

    SAR image segmentation can be converted to a clustering problem in which pixels or small patches are grouped together based on local feature information. In this paper, we present a novel framework for segmentation. The segmentation goal is achieved by unsupervised clustering of characteristic descriptors extracted from local patches. A mixture model of characteristic descriptors, which combines intensity and texture features, is investigated. The unsupervised algorithm is derived from the recently proposed Skeleton-Based Data Labeling method. Skeletons are constructed as prototypes of clusters to represent arbitrary latent structures in image data. Segmentation using Skeleton-Based Fuzzy Clustering is able to detect the types of surfaces that appear in SAR images automatically, without any user input.

  20. An Association between Bullying Behaviors and Alcohol Use among Middle School Students

    ERIC Educational Resources Information Center

    Peleg-Oren, Neta; Cardenas, Gabriel A.; Comerford, Mary; Galea, Sandro

    2012-01-01

    Although a high prevalence of bullying behaviors among adolescents has been documented, little is known about the association between bullying behaviors and alcohol use among perpetrators or victims. This study used data from a representative two-stage cluster random sample of 44,532 middle school adolescents in Florida. We found a high…

  1. Mississippi Curriculum Framework for Diesel Equipment Technology (CIP: 47.0605--Diesel Engine Mechanic & Repairer). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the diesel equipment technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies,…

  2. Mississippi Curriculum Framework for Collision Repair Technology (Program CIP: 47.0603--Auto/Autobody Repair). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the collision repair technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequences for 1- and 2-year certificates. Section…

  3. Mississippi Curriculum Framework for Forestry Technology (Program CIP: 03.0401--Forest Harvesting and Production Technology). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the forestry technology program cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies for the…

  4. Career Exploration, Level 1. Career-Centered Curriculum for Vocational Complexes in Mississippi.

    ERIC Educational Resources Information Center

    Mississippi State Dept. of Education, Jackson. Div. of Vocational and Technical Education.

    Spanning grades 7 and 8, the level 1 document focuses on the broad exploration of careers and introduces the student to the world of work through simulated laboratory and real life experiences. Career clusters are reviewed, encouraging exploration of self in relation to academic and vocational education. Students are rotated through six six-week…

  5. Mississippi Curriculum Framework for Welding and Cutting Programs (Program CIP: 48.0508--Welder/Welding Technologist). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the welding and cutting programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  6. Semantic Fluency in Aphasia: Clustering and Switching in the Course of 1 Minute

    ERIC Educational Resources Information Center

    Bose, Arpita; Wood, Rosalind; Kiran, Swathi

    2017-01-01

    Background: Verbal fluency tasks are included in a broad range of aphasia assessments. It is well documented that people with aphasia (PWA) produce fewer items in these tasks. Successful performance on verbal fluency relies on the integrity of both linguistic and executive control abilities. It remains unclear if limited output in aphasia is…

  7. Mississippi Curriculum Framework for Plumber and Pipefitter/Steamfitter (Program CIP: 46.0501--Plumber and Pipefitter). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the plumber and pipefitter/steamfitter cluster. Presented in the introductory section are program descriptions and suggested course sequences for the plumbing and pipefitting programs. Section…

  8. Task Lists for Business, Marketing and Management Occupations, 1988: Cluster Matrices for Business, Marketing and Management Occupations. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Fonseca, Linda Lafferty

    Developed in Illinois, this document contains three components. The first component consists of employability task lists for the business, marketing, and management occupations of first-line supervisors and manager/supervisors; file clerks; traffic, shipping, and receiving clerks; records management analysts; adjustment clerks; and customer…

  9. Management Information Systems for Vocational Education: A National Overview. Technical Report No. 1.

    ERIC Educational Resources Information Center

    Morgan, Robert L., Ed.; And Others

    The document contains 12 papers. Two of the papers present opening and closing remarks to the conference. The other 10 deal with their State's management information system (MIS) in vocational education. The 10 papers are clustered according to whether they are primarily descriptive of student accounting (four papers), manpower supply and demand…

  10. Seed and soil dynamics in shrubland ecosystems: proceedings; 2002 August 12-16; Laramie, WY

    Treesearch

    Ann L. Hild; Nancy L. Shaw; Susan E. Meyer; D. Terrance Booth; E. Durant McArthur

    2004-01-01

    The 38 papers in this proceedings are divided into six sections; the first includes an overview paper and documentation of the first Shrub Research Consortium Distinguished Service Award. The next four sections cluster papers on restoration and revegetation, soil and microsite requirements, germination and establishment of desired species, and community ecology of...

  11. Task Lists for Home Economics Occupations, 1988: Cluster Matrices for Home Economics Occupations. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Below, Virginia

    This document contains information for home economics occupations in Illinois in seven sections. The first part provides lists of employability skills for the following: food preparation and service worker, fashion designer, dietetic technician, and service coordinator/consumer assistant/concierge. The second section contains task analyses for the…

  12. Mississippi Curriculum Framework for Civil Technology (Program CIP: 15.0201--Civil Engineering/Civil Technology). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the civil technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and section…

  13. Converging on Choice: The Interstate Flow of Foundation Dollars to Charter School Organizations

    ERIC Educational Resources Information Center

    Ferrare, Joseph J.; Setari, R. Renee

    2018-01-01

    A growing body of research has been documenting the pivotal role that philanthropic funding plays in advancing state and local charter school reform. However, there is little understanding of the geographic flow of these funding patterns and the market, policy, and organizational conditions that have concentrated funding in some clusters of states…

  14. Perceptions of Co-Teaching: Closing the Achievement Gap between English Language Learners and Their English Monolingual Peers

    ERIC Educational Resources Information Center

    Ford-DeWaters, Carrie

    2017-01-01

    This qualitative exploratory single case research study used observations, semi-structured interviews, and document analysis to explore co-teachers' perceptions of the implementation of a co-teaching instructional model in elementary school general education classrooms with clusters of English learners (EL) in attendance. A total of four…

  15. MEPD: a Medaka gene expression pattern database

    PubMed Central

    Henrich, Thorsten; Ramialison, Mirana; Quiring, Rebecca; Wittbrodt, Beate; Furutani-Seiki, Makoto; Wittbrodt, Joachim; Kondoh, Hisato

    2003-01-01

    The Medaka Expression Pattern Database (MEPD) stores and integrates information on gene expression during embryonic development of the small freshwater fish Medaka (Oryzias latipes). Expression patterns of genes identified by ESTs are documented by images and by descriptions through parameters such as staining intensity, category and comments and through a comprehensive, hierarchically organized dictionary of anatomical terms. Sequences of the ESTs are available and searchable through BLAST. ESTs in the database are clustered upon entry and have been blasted against public databases. The BLAST results are updated regularly, stored within the database and searchable. The MEPD is a project within the Medaka Genome Initiative (MGI) and entries will be interconnected to integrated genomic map databases. MEPD is accessible through the WWW at http://medaka.dsp.jst.go.jp/MEPD. PMID:12519950

  16. Familial Paraphilia: A Pilot Study with the Construction of Genograms

    PubMed Central

    Labelle, Alain; Bourget, Dominique; Bradford, John M. W.; Alda, Martin; Tessier, Pierre

    2012-01-01

    Biological factors are likely predisposing and modulating elements in sexually deviant behavior. The observation that paraphilic behavior tends to cluster in some families is intriguing and potentially raises questions as to whether shared genetic factors may play a role in the transmission of paraphilia. This pilot study introduces five families in which we found presence of paraphilia over generations. We constructed genograms on the basis of a standardized family history. Results document the aggregation of sexual deviations within the sample of families and support a clinical/phenomenological heterogeneity of sexual deviation. The concept of paraphilia in relation to phenotypic expressions and the likelihood of a spectrum of related disorders must be clarified before conclusions can be reached as to family aggregation of paraphilia based on biological factors. PMID:23738209

  17. Fault-tolerant measurement-based quantum computing with continuous-variable cluster states.

    PubMed

    Menicucci, Nicolas C

    2014-03-28

    A long-standing open question about Gaussian continuous-variable cluster states is whether they enable fault-tolerant measurement-based quantum computation. The answer is yes. Initial squeezing in the cluster above a threshold value of 20.5 dB ensures that errors from finite squeezing acting on encoded qubits are below the fault-tolerance threshold of known qubit-based error-correcting codes. By concatenating with one of these codes and using ancilla-based error correction, fault-tolerant measurement-based quantum computation of theoretically indefinite length is possible with finitely squeezed cluster states.
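As a worked conversion, assume the common convention that a squeezing level in decibels is -10 log10 of the squeezed quadrature variance relative to the vacuum variance. Under that convention, the 20.5 dB threshold corresponds to reducing the variance by a factor of about 112:

$$\frac{\sigma^2}{\sigma^2_{\mathrm{vac}}} = 10^{-20.5/10} = 10^{-2.05} \approx 8.9\times 10^{-3}$$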

  18. Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

    PubMed

    Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

    2017-09-01

    Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, issues such as losing uncertain information, high time complexity and nonadaptive thresholds have not been addressed well by the previous density-based algorithm FDBSCAN and the hierarchical density-based algorithm FOPTICS. In this paper, we first propose a novel density-based algorithm, PDBSCAN, which improves on FDBSCAN in the following respects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability and direct reachability probability, thus reducing the complexity and solving the issue of the nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify PDBSCAN into an improved version (PDBSCANi) by using a better cluster assignment strategy that ensures every object is assigned to the most appropriate cluster, thus solving the issue of the nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulty clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm, POPTICS, by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS.
Experimental results demonstrate the superiority of our proposed algorithms over the existing algorithms in accuracy and efficiency. Copyright © 2017 Elsevier Ltd. All rights reserved.
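The core primitive above is the probability that the distance between two uncertain objects is at most a boundary value. The sketch below computes that probability exactly, as an alternative to sampling, under an assumed discrete model. In that model each uncertain object is a finite set of equally likely instances, and objects are independent. The paper's own computation and uncertainty model are not reproduced. The names `prob_within` and `expected_neighbors` are hypothetical.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, ...]
# Assumed model: an uncertain object is a list of equally likely instances.
UncertainObject = List[Point]

def prob_within(a: UncertainObject, b: UncertainObject, eps: float) -> float:
    """Exact P(dist(A, B) <= eps) for independent discrete uncertain objects:
    the fraction of instance pairs whose distance is within eps."""
    hits = sum(1 for p, q in product(a, b) if math.dist(p, q) <= eps)
    return hits / (len(a) * len(b))

def expected_neighbors(objs: List[UncertainObject], i: int, eps: float) -> float:
    """Hypothetical helper: expected number of eps-neighbours of object i."""
    return sum(prob_within(objs[i], o, eps) for j, o in enumerate(objs) if j != i)
```

For example, `prob_within([(0.0, 0.0), (1.0, 0.0)], [(0.5, 0.0), (3.0, 0.0)], 1.0)` is 0.5, because two of the four instance pairs lie within distance 1. Enumeration costs one distance per instance pair. With many instances per object, that cost is the reason sampling or smarter bounds become attractive.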

  19. Single cyanide-bridged Mo(W)/S/Cu cluster-based coordination polymers: Reactant- and stoichiometry-dependent syntheses, effective photocatalytic properties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jinfang, E-mail: zjf260@jiangnan.edu.cn; Wang, Chao; Wang, Yinlin

    2015-11-15

    A systematic study of the reaction variables affecting single cyanide-bridged Mo(W)/S/Cu cluster-based coordination polymers (CPs) is reported for the first time. Five anionic single cyanide-bridged Mo(W)/S/Cu cluster-based CPs, {[Pr₄N][WS₄Cu₃(CN)₂]}ₙ (1), {[Pr₄N][WS₄Cu₄(CN)₃]}ₙ (2), {[Pr₄N][WOS₃Cu₃(CN)₂]}ₙ (3), {[Bu₄N][WOS₃Cu₃(CN)₂]}ₙ (4) and {[Bu₄N][MoOS₃Cu₃(CN)₂]}ₙ (5), were prepared by varying the molar ratios of the starting materials, the specific cations, the cluster building blocks and the central metal atoms in the cluster building blocks. 1 possesses an anionic 3D diamondoid framework constructed from 4-connected T-shaped clusters [WS₄Cu₃]⁺ and single CN⁻ bridges. 2 is fabricated from 6-connected planar 'open' clusters [WS₄Cu₄]²⁺ and single CN⁻ bridges, forming an anionic 3D architecture with an "ACS" topology. 3 and 4 exhibit novel anionic 2D double-layer networks, both constructed from nest-shaped clusters [WOS₃Cu₃]⁺ linked by single CN⁻ bridges, but containing the different cations [Pr₄N]⁺ and [Bu₄N]⁺, respectively. 5 is constructed from nest-shaped clusters [MoOS₃Cu₃]⁺ and single CN⁻ bridges, with an anionic 3D diamondoid framework. The anionic frameworks of 1-5, all sustained by single CN⁻ bridges, are non-interpenetrating and exhibit huge potential void volumes. Employing different molar ratios of the reactants and varying the cluster building blocks resulted in different single cyanide-bridged Mo(W)/S/Cu cluster-based CPs, while replacing the cation ([Pr₄N]⁺ vs. [Bu₄N]⁺) was found to have negligible impact on the nature of the architecture. Unexpectedly, replacement of the central metal atom (W vs. Mo) in the cluster building blocks had a pronounced effect on the framework.
Furthermore, the photocatalytic activities of heterothiometallic cluster-based CPs were explored for the first time by monitoring the photodegradation of methylene blue (MB) under visible-light irradiation, which revealed that 2 exhibits effective photocatalytic properties. Highlights: • Reaction variables affecting Mo(W)/S/Cu cluster-based CPs are explored for the first time. • Replacing the central metal atom had a pronounced effect on W/S/Cu cluster-based CPs. • Photocatalytic activities of Mo(W)/S/Cu cluster-based CPs are investigated for the first time.

  20. Marine Planning and Service Platform: specific ontology based semantic search engine serving data management and sustainable development

    NASA Astrophysics Data System (ADS)

    Manzella, Giuseppe M. R.; Bartolini, Andrea; Bustaffa, Franco; D'Angelo, Paolo; De Mattei, Maurizio; Frontini, Francesca; Maltese, Maurizio; Medone, Daniele; Monachini, Monica; Novellino, Antonio; Spada, Andrea

    2016-04-01

    The MAPS (Marine Planning and Service Platform) project aims to build a computer platform supporting a Marine Information and Knowledge System. One of the project's main objectives is to develop a repository that gathers, classifies and structures marine scientific literature and data, guaranteeing their accessibility to researchers and institutions by means of standard protocols. In oceanography the cost of data collection is very high, and the new paradigm is based on the concept of collecting once and reusing many times (for re-analysis, marine environment assessment, studies on trends, etc.). This concept requires access to quality-controlled data and to information provided in reports (grey literature) and/or in relevant scientific literature. Hence, new technology is needed that integrates several disciplines, such as data management, information systems and knowledge management. In SeaDataNet (www.seadatanet.org), one of the most important EC projects on data management, an initial example of knowledge management is provided by the Common Data Index, which provides links to data and (eventually) to papers. There are efforts to develop search engines that find authors' contributions to scientific literature or publications; this implies the use of persistent identifiers (such as DOI), as is done in ORCID. However, very few efforts are dedicated to linking publications to the data cited or used in them, or to data that may be important for the published studies. This is the objective of MAPS. Full-text technologies are often unsuccessful because they assume the presence of specific keywords in the text; to address this problem, the MAPS project proposes using different semantic technologies for retrieving text and data, yielding more pertinent results.
The main parts of our design of the search engine are:
• Syntactic parser: this module extracts "rich words" from the text. The whole document is parsed to extract the words that are most meaningful for its main argument, and the extraction is performed in the form of N-grams (mono-grams, bi-grams, tri-grams).
• MAPS database: a simple database containing all the N-grams used by MAPS (physical parameters from SeaDataNet vocabularies) to define our marine "ontology".
• Relation identifier: this module performs the most important task, identifying relationships between the N-grams extracted from the text by the parser and the provided oceanographic terminology. It checks the N-grams supplied by the syntactic parser and matches them with the terms stored in the MAPS database. Matches found are returned to the parser together with the inflected form appearing in the source text.
• A "relaxed" extractor: this option can be activated when the search engine is launched. It was introduced to let the user create new N-grams by combining existing mono-grams and bi-grams in the database with rich words found in the source text.
The innovation of a semantic engine lies in the fact that the process is not just the retrieval of already known documents by means of a simple term query, but rather the retrieval of a population of documents whose existence was unknown. The system answers by displaying results ordered according to the following criteria:
• Relevance: of the document with respect to the searched concept
• Date: of publication of the paper
• Source: data provider as defined in the SeaDataNet Common Data Index
• Matrix: environmental matrices as defined in the oceanographic field
• Geographic area: area specified in the text
• Clustering: the process of organizing objects into groups whose members are similar
The clustering step returns the related documents as output.
For each document the MAPS visualization provides:
• Title, author, source/provider of data, web address
• Tagging of key terms or concepts
• Summary of the document
• Visualization of the whole document
The option of including the number of citations for each document among the advanced-search criteria is currently under development; in this case the engine should be able to connect to any of the existing bibliographic citation systems (such as Google Scholar, Scopus, etc.).
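The parser and relation-identifier steps can be sketched in a few lines. The sketch extracts mono-, bi- and tri-grams from a text and keeps those that match a controlled vocabulary. It is only a sketch under simplifying assumptions. Tokenization is a plain lowercase word regex. There is no "rich word" selection, stop-word handling or inflection matching. The vocabulary terms and the names `ngrams` and `match_terms` are hypothetical, not the MAPS implementation or the SeaDataNet vocabularies.

```python
import re
from typing import List, Set

def ngrams(text: str, max_n: int = 3) -> List[str]:
    """Extract all mono-, bi- and tri-grams from lower-cased word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def match_terms(text: str, vocabulary: Set[str]) -> List[str]:
    """Return vocabulary terms occurring as N-grams in the text,
    longest terms first, ties broken alphabetically."""
    found = set(ngrams(text)) & vocabulary
    return sorted(found, key=lambda t: (-len(t.split()), t))
```

For the text "Sea surface temperature and salinity were measured." and a vocabulary of {"sea surface temperature", "salinity", "temperature"}, `match_terms` returns all three terms, with the tri-gram first. Ranking longer matches first is one simple way to prefer specific concepts over their component words.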

  1. Pedoinformatics Approach to Soil Text Analytics

    NASA Astrophysics Data System (ADS)

    Furey, J.; Seiter, J.; Davis, A.

    2017-12-01

    The several extant schemes for the classification of soils rely on differing criteria, but the major soil science taxonomies, including those of the United States Department of Agriculture (USDA) and the internationally harmonized World Reference Base for Soil Resources, are based principally on inferred pedogenic properties. These taxonomies largely result from compiled individual observations of soil morphologies within soil profiles, and the vast majority of this pedologic information is contained in qualitative text descriptions. We present text mining analyses of hundreds of gigabytes of parsed text and other data in the digitally available USDA soil taxonomy documentation, the Soil Survey Geographic (SSURGO) database, and the National Cooperative Soil Survey (NCSS) soil characterization database. These analyses implemented iPython calls to Gensim modules for topic modelling, with latent semantic indexing completed down to the lowest taxon level (soil series) paragraphs. Via a custom extension of the Natural Language Toolkit (NLTK), approximately one percent of the USDA soil series descriptions were used to train a classifier for the remainder of the documents, essentially by treating soil science words as comprising a novel language. While location-specific descriptors at the soil series level are amenable to geomatics methods, unsupervised clustering of the occurrence of other soil science words did not closely follow the usual hierarchy of soil taxa. We present preliminary phrasal analyses that may account for some of these effects.
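The train-on-a-small-subset idea above can be illustrated with a sketch: fit a classifier on a few labelled descriptions, then label the rest. The abstract used a custom NLTK extension, which is not reproduced here. As a plainly swapped-in technique, this sketch uses a from-scratch multinomial naive Bayes with add-one smoothing. The class labels and training texts in the example are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in docs:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def predict(self, text: str) -> str:
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / self.n_docs)  # log prior of the class
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Treating soil-science words as "a novel language" fits this kind of model naturally, because it learns word-class associations only from the training descriptions.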

  2. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    PubMed Central

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  3. Serial Clustering of North Atlantic Cyclones and Wind Storms: A New Identification Base and Sensitivity to Intensity and Intra-Seasonal Variability

    NASA Astrophysics Data System (ADS)

    Leckebusch, G. C.; Kirchner-Bossi, N. O.; Befort, D. J.; Ulbrich, U.

    2015-12-01

    Time-clustered mid-latitude winter storms are responsible for a large portion of the overall windstorm-related damage in Europe. Their study is therefore of high meteorological interest, and the outcome can be of crucial use to the (re)insurance industry. In addition to existing cyclone-based studies, here we use an event identification approach based only on near-surface wind speeds to investigate windstorm clustering and compare it to cyclone clustering. Specifically, cyclone and windstorm tracks are identified for the winters (Oct-Mar) of 1979-2013 to perform two sensitivity analyses of event clustering in the North Atlantic using the ERA-Interim Reanalysis. First, the link between clustering and cyclone intensity is analysed and compared to windstorms. Second, the sensitivity of clustering on intra-seasonal time scales is investigated for both cyclones and windstorms. The wind-based approach reveals additional regions of clustering over Western Europe, which could be related to extreme damages, showing the added value of investigating wind-field-derived tracks in addition to cyclone tracks. Previous studies indicate a higher degree of clustering for stronger cyclones. However, our results show that this assumption does not always hold. Although a positive relationship is confirmed for the clustering centre located over Iceland, clustering off the coast of the Iberian Peninsula behaves in the opposite way: even though this region shows the highest clustering, most of its signal is due to cyclones with intensities below the 70th percentile of the Laplacian of MSLP. Results on the sensitivity of clustering to the time of the winter season (Oct-Mar) show a temporal evolution of the clustering patterns for both windstorms and cyclones. Clustering of windstorms and of the strongest cyclones culminates around February, whereas clustering of all cyclones peaks in December to January.
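The abstract does not say how the degree of serial clustering is measured. A widely used measure in the storm-clustering literature is the dispersion statistic psi = Var(N)/E(N) - 1 of per-season event counts N. Positive values mean more clustering than a Poisson process. Zero means Poisson (random) occurrence. Negative values mean more regular occurrence. The sketch below assumes that statistic purely for illustration, and the counts in the test are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence

def dispersion(counts: Sequence[int]) -> float:
    """Dispersion statistic psi = Var(N) / E(N) - 1 of per-season event counts.

    Uses the sample (n - 1) variance. psi > 0: serial clustering;
    psi = 0: Poisson occurrence; psi < 0: more regular than Poisson."""
    m = mean(counts)
    if m == 0:
        raise ValueError("no events: dispersion is undefined")
    return variance(counts) / m - 1
```

Sensitivity analyses like those described above would then compare psi across grid points, intensity thresholds or sub-seasons.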

  4. U-series vs 14C ages of deep-sea corals from the southern Labrador Sea: Sporadic development of corals and geochemical processes hampering estimation of ambient water ventilation ages

    NASA Astrophysics Data System (ADS)

    Hillaire-Marcel, Claude; Maccali, Jenny; Ménabréaz, Lucie; Ghaleb, Bassam; Blénet, Aurélien; Edinger, Evan

    2017-04-01

    Deep-sea scleractinian corals were collected with the remotely operated vehicle ROPOS off Newfoundland. Fossil specimens of Desmophyllum dianthus were raised from coral graveyards at Orphan Knoll (~1700 m depth) and Flemish Cap (~2200 m depth), while live specimens were collected directly on the overlying steep rock slopes. D. dianthus has an aragonitic skeleton and is thus particularly suited to U-Th dating. We obtained >70 U-series ages along with >20 14C measurements. Results display a discrete age distribution with two age clusters: a Bølling-Allerød and Holocene cluster with >20 samples, and a Marine Isotope Stage (MIS) 5c cluster with ~50 samples. Only two samples lie outside these clusters, at ~64 ka and at ~181 ka. In contrast to the New England seamounts, where coral presence seems to have been continuous through the last 70 ka, the Orphan Knoll and Flemish Cap graveyards are marked by the absence of preserved specimens from MIS 2 to MIS 4 and throughout MIS 6. For filter-feeding deep-sea corals, access to food-rich waters is essential. Hence the Holocene and MIS 5 clusters observed in the Labrador basin might represent intervals of high food availability, either through production in the overlying water column or, more effectively, through particulate and dissolved organic carbon transport via an active Western Boundary Undercurrent. Comparison of 230Th ages with 14C ages to document changes in the ventilation ages of the ambient water masses is equivocal due to the presence of some diagenetic and/or initial 230Th excess. In addition, discrete diagenetic U fluxes can be documented from 234U/238U vs 230Th/238U data. They point to a recent winnowing of the sediment overlying the fossil corals, which we link to the Holocene intensification of the Western Boundary Undercurrent and which resulted in driving Fe-Mn coatings.

  5. HIFLUGCS: X-ray luminosity-dynamical mass relation and its implications for mass calibrations with the SPIDERS and 4MOST surveys

    NASA Astrophysics Data System (ADS)

    Zhang, Yu-Ying; Reiprich, Thomas H.; Schneider, Peter; Clerc, Nicolas; Merloni, Andrea; Schwope, Axel; Borm, Katharina; Andernach, Heinz; Caretta, César A.; Wu, Xiang-Ping

    2017-03-01

    We present the relation of X-ray luminosity versus dynamical mass for 63 nearby clusters of galaxies in a flux-limited sample, the HIghest X-ray FLUx Galaxy Cluster Sample (HIFLUGCS, consisting of 64 clusters). The luminosity measurements are obtained based on 1.3 Ms of clean XMM-Newton data and ROSAT pointed observations. The masses are estimated using optical spectroscopic redshifts of 13647 cluster galaxies in total. We classify clusters into disturbed and undisturbed based on a combination of the X-ray luminosity concentration and the offset between the brightest cluster galaxy and X-ray flux-weighted center. Given sufficient numbers (i.e., ≥45) of member galaxies when the dynamical masses are computed, the luminosity versus mass relations agree between the disturbed and undisturbed clusters. The cool-core clusters still dominate the scatter in the luminosity versus mass relation even when a core-corrected X-ray luminosity is used, which indicates that the scatter of this scaling relation mainly reflects the structure formation history of the clusters. As shown by the clusters with only a few spectroscopically confirmed members, the dynamical masses can be underestimated and thus lead to a biased scaling relation. To investigate the potential of spectroscopic surveys to follow up high-redshift galaxy clusters or groups observed in X-ray surveys for the identifications and mass calibrations, we carried out Monte Carlo resampling of the cluster galaxy redshifts and calibrated the uncertainties of the redshift and dynamical mass estimates when only reduced numbers of galaxy redshifts per cluster are available. The resampling considers the SPIDERS and 4MOST configurations, designed for the follow-up of the eROSITA clusters, and was carried out for each cluster in the sample at the actual cluster redshift as well as at the assigned input cluster redshifts of 0.2, 0.4, 0.6, and 0.8.
To follow up very distant clusters or groups, we also carried out the mass calibration based on the resampling with only ten redshifts per cluster, and redshift calibration based on the resampling with only five and ten redshifts per cluster, respectively. Our results demonstrate the power of combining upcoming X-ray and optical spectroscopic surveys for mass calibration of clusters. The scatter in the dynamical mass estimates for the clusters with at least ten members is within 50%.
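Dynamical masses rest on the velocity dispersion of member galaxies. The sketch below shows the Monte Carlo resampling idea: repeatedly draw n members, recompute the dispersion, and inspect the spread. Rest-frame line-of-sight velocities use the standard relation v = c (z - z_cl) / (1 + z_cl), with z_cl taken as the mean member redshift. The plain sample standard deviation as the dispersion estimator is an assumption. The paper's estimators and its dispersion-to-mass conversion are not reproduced, and any redshifts fed in are hypothetical.

```python
import random
import statistics
from typing import List, Sequence

C_KM_S = 299_792.458  # speed of light in km/s

def velocity_dispersion(z: Sequence[float]) -> float:
    """Rest-frame line-of-sight velocity dispersion (km/s) of cluster members,
    with v_i = c (z_i - z_cl) / (1 + z_cl) and z_cl the mean redshift."""
    z_cl = statistics.fmean(z)
    v = [C_KM_S * (zi - z_cl) / (1 + z_cl) for zi in z]
    return statistics.stdev(v)

def resample_dispersions(z: Sequence[float], n: int, trials: int,
                         seed: int = 0) -> List[float]:
    """Monte Carlo: draw n members without replacement, `trials` times,
    and return the dispersion of each draw."""
    rng = random.Random(seed)
    return [velocity_dispersion(rng.sample(list(z), n)) for _ in range(trials)]
```

The scatter of the resampled dispersions at n = 10 versus the full membership then gives a rough sense of how much precision is lost with few redshifts per cluster.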

  6. Spatial clustering of pixels of a multispectral image

    DOEpatents

    Conger, James Lynn

    2014-08-19

    A method and system for clustering the pixels of a multispectral image is provided. A clustering system computes a maximum spectral similarity score for each pixel that indicates the similarity between that pixel and the most similar neighboring pixel. To determine the maximum similarity score for a pixel, the clustering system generates a similarity score between that pixel and each of its neighboring pixels and then selects the similarity score that represents the highest similarity as the maximum similarity score. The clustering system may apply a filtering criterion based on the maximum similarity score so that pixels with similarity scores below a minimum threshold are not clustered. The clustering system changes the current pixel values of the pixels in a cluster based on an averaging of the original pixel values of the pixels in the cluster.
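The steps described above can be sketched as follows. Each pixel finds its most similar neighbour, pixels whose best score is below a threshold stay unclustered, and cluster members are replaced by the average of their original values. Several details are assumptions, not the patented method:
- similarity is taken as negative Euclidean distance between spectra;
- neighbours are the 4-connected pixels;
- clusters are the connected components of the "most similar neighbour" links, found with union-find.

```python
import math
from typing import Dict, List, Optional, Tuple

Spectrum = Tuple[float, ...]
Image = List[List[Spectrum]]  # image[row][col] -> spectral vector

def similarity(a: Spectrum, b: Spectrum) -> float:
    # Assumed score: higher means more similar (negative Euclidean distance).
    return -math.dist(a, b)

def cluster_pixels(img: Image, min_score: float) -> Image:
    rows, cols = len(img), len(img[0])
    parent = list(range(rows * cols))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for r in range(rows):
        for c in range(cols):
            best: Optional[int] = None
            best_score = -math.inf
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    s = similarity(img[r][c], img[rr][cc])
                    if s > best_score:
                        best, best_score = rr * cols + cc, s
            # Pixels whose maximum similarity is below the threshold stay unclustered.
            if best is not None and best_score >= min_score:
                parent[find(r * cols + c)] = find(best)

    # Replace each pixel with the mean of the ORIGINAL values in its cluster.
    members: Dict[int, List[Spectrum]] = {}
    for i in range(rows * cols):
        members.setdefault(find(i), []).append(img[i // cols][i % cols])
    means = {root: tuple(sum(ch) / len(px) for ch in zip(*px))
             for root, px in members.items()}
    return [[means[find(r * cols + c)] for c in range(cols)] for r in range(rows)]
```

On a one-row image with values 1.0, 1.2 and 9.0 and a threshold of -0.5, the first two pixels merge and both become 1.1. The 9.0 pixel stays unchanged because its best score is too low.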

  7. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, and the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.

  8. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE PAGES

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    2018-03-19

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the necessary flexibility to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, and the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.
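To illustrate what an aggregation step over job-report documents does, here is a small sketch. It groups documents by a key and derives per-group performance metrics. The field names (`site`, `exit_code`, `cpu_time`) and the chosen metrics are hypothetical and do not reflect the real framework job report schema. The described system runs such pipelines on Hadoop/Spark, whereas this plain-Python version only shows the logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def aggregate(reports: Iterable[dict]) -> Dict[str, dict]:
    """Group job-report documents by site and summarise each group.

    Assumes hypothetical fields: 'site', 'exit_code' (0 = success), 'cpu_time'."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for doc in reports:
        groups[doc["site"]].append(doc)
    summary = {}
    for site, docs in groups.items():
        failures = sum(1 for d in docs if d["exit_code"] != 0)
        summary[site] = {
            "jobs": len(docs),
            "failure_rate": failures / len(docs),
            "mean_cpu_time": sum(d["cpu_time"] for d in docs) / len(docs),
        }
    return summary
```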

  9. Massively Scalable Near Duplicate Detection in Streams of Documents using MDSH

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bogen, Paul Logasa; Symons, Christopher T; McKenzie, Amber T

    2013-01-01

    In a world where large-scale text collections are not only becoming ubiquitous but also growing at increasing rates, near-duplicate documents are a growing concern with the potential to hinder many different information filtering tasks. While others have tried to address this problem, prior techniques have only been used on limited collection sizes and static cases. We briefly describe the problem in the context of Open Source Intelligence (OSINT), along with our additional performance constraints. In this work we propose two variations on Multi-dimensional Spectral Hash (MDSH) tailored to extremely large, growing sets of text documents. We analyze the memory and runtime characteristics of our techniques and provide an informal analysis of the quality of the near-duplicate clusters they produce.
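MDSH itself is not reproduced here. To illustrate the general idea behind hash-based near-duplicate detection, the sketch uses a different, well-known technique: SimHash. Documents map to compact binary fingerprints, and near-duplicates tend to differ in only a few bits. The tokenization, the term-frequency weights and the use of BLAKE2b as the token hash are assumptions made for this example.

```python
import hashlib
import re
from collections import Counter

def simhash(text: str, bits: int = 64) -> int:
    """SimHash fingerprint: each token votes +w or -w on every bit,
    where w is the token's frequency; positive totals become 1-bits."""
    weights = Counter(re.findall(r"\w+", text.lower()))
    votes = [0] * bits
    for token, w in weights.items():
        digest = hashlib.blake2b(token.encode(), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            votes[b] += w if (h >> b) & 1 else -w
    return sum(1 << b for b in range(bits) if votes[b] > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

In a streaming setting, each incoming document's fingerprint would be compared against stored fingerprints. Pairs within a small Hamming distance are treated as near-duplicate candidates, and the threshold is a tuning choice.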

  10. Use of Spatial Epidemiology and Hot Spot Analysis to Target Women Eligible for Prenatal Women, Infants, and Children Services

    PubMed Central

    Krawczyk, Christopher; Gradziel, Pat; Geraghty, Estella M.

    2014-01-01

    Objectives. We used a geographic information system and cluster analyses to determine locations in need of enhanced Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) Program services. Methods. We linked documented births in the 2010 California Birth Statistical Master File with the 2010 data from the WIC Integrated Statewide Information System. Analyses focused on the density of pregnant women who were eligible for but not receiving WIC services in California’s 7049 census tracts. We used incremental spatial autocorrelation and hot spot analyses to identify clusters of WIC-eligible nonparticipants. Results. We detected clusters of census tracts with higher-than-expected densities, compared with the state mean density of WIC-eligible nonparticipants, in 21 of 58 (36.2%) California counties (P < .05). In subsequent county-level analyses, we located neighborhood-level clusters of higher-than-expected densities of eligible nonparticipants in Sacramento, San Francisco, Fresno, and Los Angeles Counties (P < .05). Conclusions. Hot spot analyses provided a rigorous and objective approach to determine the locations of statistically significant clusters of WIC-eligible nonparticipants. Results helped inform WIC program and funding decisions, including the opening of new WIC centers, and offered a novel approach for targeting public health services. PMID:24354821
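The abstract does not name the hot spot statistic. A common choice for this kind of analysis is the Getis-Ord Gi* z-score, which compares the local sum of values around each area with what the global mean would predict. The sketch assumes Gi* with binary weights, where each area's neighbourhood includes the area itself. Its formula is

Gi* = (sum_j w_ij x_j - mean(x) sum_j w_ij) / (S sqrt((n sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))

where S is the population standard deviation of x. The adjacency structure and density values in the test are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def getis_ord_gi_star(x: Sequence[float], neighbors: Dict[int, List[int]]) -> List[float]:
    """Gi* z-scores with binary weights; each area's neighbourhood includes itself."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std. dev.
    if s == 0:
        raise ValueError("Gi* is undefined for constant data")
    z = []
    for i in range(n):
        nbrs = set(neighbors.get(i, [])) | {i}  # w_ij = 1 inside, 0 outside
        w_sum = len(nbrs)  # with binary weights, sum w_ij == sum w_ij^2
        num = sum(x[j] for j in nbrs) - mean * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z.append(num / den)
    return z
```

A z-score above 1.96 (or below -1.96) corresponds to p < .05 for a two-sided test. Such areas would be flagged as hot (or cold) spots, before any correction for multiple testing.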

  11. Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques

    ERIC Educational Resources Information Center

    Luan, Jing

    2004-01-01

    This exploratory data mining project used a distance-based clustering algorithm to study 3 indicators of student behavioral data, called OIndex. It stabilized at a 6-cluster scenario following an exhaustive exploratory study of the 4-, 5-, and 6-cluster scenarios produced by the K-Means and TwoStep algorithms. Using principles in data mining, the study…

  12. Soft learning vector quantization and clustering algorithms based on ordered weighted aggregation operators.

    PubMed

    Karayiannis, N B

    2000-01-01

    This paper presents the development and investigates the properties of ordered weighted learning vector quantization (LVQ) and clustering algorithms. These algorithms are developed by using gradient descent to minimize reformulation functions based on aggregation operators. An axiomatic approach provides conditions for selecting aggregation operators that lead to admissible reformulation functions. Minimization of admissible reformulation functions based on ordered weighted aggregation operators produces a family of soft LVQ and clustering algorithms, which includes fuzzy LVQ and clustering algorithms as special cases. The proposed LVQ and clustering algorithms are used to perform segmentation of magnetic resonance (MR) images of the brain. The diagnostic value of the segmented MR images provides the basis for evaluating a variety of ordered weighted LVQ and clustering algorithms.
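    For readers unfamiliar with ordered weighted aggregation, the sketch below shows Yager's ordered weighted averaging (OWA) operator, the standard operator of this family. The inputs are sorted in decreasing order and combined with a fixed weight vector, so the weights attach to ranks rather than to particular inputs. Different weight vectors recover the maximum, the minimum, and the arithmetic mean. How the paper builds its reformulation functions from such operators is not reproduced here, and the distance values are hypothetical.

```python
import numpy as np


def owa(values, weights):
    """Ordered weighted average: sum_j weights[j] * (j-th largest value)."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    w = np.asarray(weights, dtype=float)
    if ordered.shape != w.shape:
        raise ValueError("need exactly one weight per value")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return float(w @ ordered)


# Hypothetical squared distances from one sample to three prototypes.
distances = [3.0, 1.0, 2.0]
print(owa(distances, [1, 0, 0]))              # maximum -> 3.0
print(owa(distances, [0, 0, 1]))              # minimum -> 1.0
print(owa(distances, [1 / 3, 1 / 3, 1 / 3]))  # mean (approximately 2.0)
```

In this toy setting, putting all weight on the smallest distance gives the minimum-distance aggregate used in crisp (hard) assignment. Spreading the weight across ranks gives a softer aggregate.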

  13. A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters

    PubMed Central

    Wang, Zhihao; Yi, Jing

    2016-01-01

    To address the shortcoming that the fuzzy c-means (FCM) algorithm needs the number of clusters in advance, this paper proposes a new self-adaptive method for determining the optimal number of clusters. First, a density-based algorithm is put forward. Based on the characteristics of the dataset, it automatically determines the possible maximum number of clusters instead of using the empirical rule √n. It also obtains the optimal initial cluster centroids, which mitigates FCM's tendency to converge to a local minimum when the initial cluster centroids are selected randomly. Second, by introducing a penalty function, the paper proposes a new fuzzy clustering validity index based on fuzzy compactness and separation. The penalty ensures that, as the number of clusters approaches the number of objects in the dataset, the index does not decrease monotonically toward zero. Without it, the choice of the optimal number of clusters would lose robustness and decision power. Then, building on these results, a self-adaptive FCM algorithm is proposed that estimates the optimal number of clusters by an iterative trial-and-error process. Finally, experiments on the UCI, KDD Cup 1999, and synthetic datasets show that the method not only effectively determines the optimal number of clusters but also reduces the number of FCM iterations while giving stable clustering results. PMID:28042291
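    The self-adaptive method wraps the standard fuzzy c-means iteration. The minimal NumPy sketch below shows that inner iteration for a fixed number of clusters c. The fuzzifier m = 2, the random initial memberships, and the stopping tolerance are assumptions. The paper's density-based initialization and its validity index are not included.

```python
import numpy as np


def fuzzy_c_means(x, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard FCM: returns (centers, memberships) with memberships of shape (n_points, c)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)  # each point's memberships sum to 1
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero when a point sits on a center
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)  # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


# Hypothetical 2-D data with two obvious groups.
data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, memberships = fuzzy_c_means(data, c=2)
print(np.round(centers, 2))
print(memberships.argmax(axis=1))
```

A trial-and-error search over c, as the abstract describes, would call a routine like this for each candidate c and score the results with a validity index.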

  14. SLURM: Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M; Dunlap, C; Garlick, J

    2002-04-24

    Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, and scheduling modules. The design also includes a scalable, general-purpose communication infrastructure. Development will take place in four phases: Phase I results in a solid infrastructure; Phase II produces a functional but limited interactive job initiation capability without use of the interconnect/switch; Phase III provides switch support and documentation; Phase IV provides job status, fault-tolerance, and job queuing and control through Livermore's Distributed Production Control System (DPCS), a meta-batch and resource management system.

  15. Human wound photogrammetry with low-cost hardware based on automatic calibration of geometry and color

    NASA Astrophysics Data System (ADS)

    Jose, Abin; Haak, Daniel; Jonas, Stephan; Brandenburg, Vincent; Deserno, Thomas M.

    2015-03-01

    Photographic documentation and image-based wound assessment are frequently performed in medical diagnostics, patient care, and clinical research. To support quantitative assessment, photographic imaging relies on expensive, high-quality hardware and still needs appropriate registration and calibration. With inexpensive consumer hardware such as smartphone-integrated cameras, calibration of geometry, color, and contrast is challenging. Some methods perform color calibration using a reference pattern such as a standard color card, which is located manually in the photographs. In this paper, we adapt the lattice detection algorithm by Park et al. from real-world images to medicine. First, the algorithm extracts feature points and clusters them according to their local intensity patterns. Groups of similar points are fed into a selection process, which tests them for suitability as a lattice grid. The group with the highest probability of forming the meshes of a lattice is selected, and a template for an initial lattice cell is extracted from it. Then, a Markov random field is modeled. Using mean-shift belief propagation, the detection of the 2D lattice is solved iteratively as a spatial tracking problem. Least-squares geometric calibration of projective distortions is supported by the 35 corner points, and non-linear color calibration in RGB space by the 24 color patches. The method is tested on 37 photographs from the German Calciphylaxis registry, where non-standardized photographic documentation is collected nationwide from all contributing trial sites. In all images, the reference card location is correctly identified. At least 28 of the 35 lattice points were detected, outperforming the SIFT-based approach previously applied. Based on these coordinates, robust geometry and color registration is performed, making the photographs comparable for quantitative analysis.
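    The least-squares geometric calibration step can be illustrated with the direct linear transform (DLT). DLT is a standard way to estimate a projective transformation (homography) from point correspondences, such as detected lattice corners. The sketch assumes noise-free points and a made-up ground-truth matrix, and uses a 7 x 5 grid of 35 points to match the corner count in the abstract. It also skips the coordinate normalization usually recommended for real images and does not cover color calibration.

```python
import numpy as np


def estimate_homography(src, dst):
    """Least-squares DLT estimate of H (3 x 3, H[2, 2] = 1) with dst ~ H @ [src, 1]."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or len(src) < 4:
        raise ValueError("need at least four matching 2-D points")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The least-squares solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, points):
    """Map 2-D points through H using homogeneous coordinates."""
    points = np.asarray(points, dtype=float)
    mapped = np.hstack([points, np.ones((len(points), 1))]) @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical 7 x 5 grid of reference-card corners (35 points) seen under a made-up distortion.
card = np.array([(x, y) for y in range(5) for x in range(7)], dtype=float)
true_h = np.array([[1.2, 0.1, 5.0],
                   [0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
observed = apply_homography(true_h, card)
estimated = estimate_homography(card, observed)
print(np.round(estimated, 4))
```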

  16. The coupling of fluids, dynamics, and controls on advanced architecture computers

    NASA Technical Reports Server (NTRS)

    Atwood, Christopher

    1995-01-01

    This grant provided for the demonstration of coupled controls, body dynamics, and fluids computations in a workstation cluster environment, and an investigation of the impact of peer-to-peer communication on flow solver performance and robustness. The findings of these investigations were documented in the conference articles. The attached publication, 'Towards Distributed Fluids/Controls Simulations', documents the solution and scaling of the coupled Navier-Stokes, Euler rigid-body dynamics, and state feedback control equations for a two-dimensional canard-wing. The poor scaling shown was due to serialized grid connectivity computation and Ethernet bandwidth limits. The scaling of a peer-to-peer communication flow code on an IBM SP-2 was also shown. The scaling of the code on the switched fabric-linked nodes was good, with a 2.4 percent loss due to communication of intergrid boundary point information. The code performance on 30 worker nodes was 1.7 µs/point/iteration, or a factor of three over a Cray C-90 head. The attached paper, 'Nonlinear Fluid Computations in a Distributed Environment', documents the effect of several computational rate enhancing methods on convergence. For the cases shown, the highest throughput was achieved using boundary updates at each step, with the manager process performing communication tasks only. Constrained domain decomposition of the implicit fluid equations did not degrade the convergence rate or final solution. The scaling of a coupled body/fluid dynamics problem on an Ethernet-linked cluster was also shown.

  17. Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    PubMed Central

    Hallac, David; Vare, Sagar; Boyd, Stephen; Leskovec, Jure

    2018-01-01

    Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (e.g., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios. PMID:29770257

  18. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing, as well as multimedia databases, demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content with the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm, linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in speaker clustering performance compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
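    The abstract argues that what separates utterances in supervector space is mainly direction rather than length. The toy NumPy example below shows how the two metrics can disagree. It uses made-up three-dimensional vectors as stand-ins for GMM mean supervectors. Euclidean distance calls a scaled copy far away and a differently oriented vector close, while cosine distance does the opposite. It does not implement LSDA.

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cos(angle between a and b); ignores vector length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def euclidean_distance(a, b):
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


# Hypothetical stand-ins for supervectors.
u1 = np.array([1.0, 2.0, 3.0])
u2 = 4.0 * u1                    # same direction, four times the length
u3 = np.array([3.0, 2.0, -1.0])  # same length as u1, different direction

print(euclidean_distance(u1, u2), euclidean_distance(u1, u3))  # u3 looks closer
print(cosine_distance(u1, u2), cosine_distance(u1, u3))        # u2 looks closer
```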

  19. Mixed Pattern Matching-Based Traffic Abnormal Behavior Recognition

    PubMed Central

    Cui, Zhiming; Zhao, Pengpeng

    2014-01-01

    A motion trajectory is an intuitive space-time representation of the micromotion behavior of a moving target, and trajectory analysis is an important approach to recognizing abnormal behaviors of moving targets. To cope with the complexity of vehicle trajectories, this paper first proposes a trajectory pattern learning method based on dynamic time warping (DTW) and spectral clustering. It uses the DTW distance to measure the distances between vehicle trajectories and automatically determines the number of clusters with a spectral clustering algorithm based on the distance matrix. Sample data points are then grouped into the resulting clusters. After spatial patterns and direction patterns are learned from the clusters, a recognition method for detecting abnormal vehicle behaviors based on mixed pattern matching is proposed. The experimental results show that the proposed scheme can effectively recognize the main types of abnormal traffic behavior and is robust. A real-world application verified its feasibility and validity. PMID:24605045
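    The DTW distance between two vehicle trajectories can be computed with the classic dynamic-programming recurrence. The sketch below assumes Euclidean point-to-point costs and unconstrained warping, because the paper's exact settings are not given. The three toy trajectories are hypothetical. The resulting distance matrix is the kind of input the spectral clustering step would then take.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance between two trajectories (sequences of 2-D points)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])  # local point-to-point cost
            cost[i, j] = step + min(cost[i - 1, j],      # a advances, b waits
                                    cost[i, j - 1],      # b advances, a waits
                                    cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])


# Hypothetical trajectories: t2 follows t1's path at a different speed; t3 heads north instead.
t1 = [(0, 0), (1, 0), (2, 0), (3, 0)]
t2 = [(0, 0), (1, 0), (1, 0), (2, 0), (3, 0)]
t3 = [(0, 0), (0, 1), (0, 2), (0, 3)]
trajectories = [t1, t2, t3]
dist_matrix = np.array([[dtw_distance(p, q) for q in trajectories] for p in trajectories])
print(dist_matrix)
```

Because DTW absorbs differences in speed, t1 and t2 come out at distance zero even though they have different numbers of samples.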

  20. An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy

    PubMed Central

    Ramanujam, Nedunchelian; Kaliappan, Manivannan

    2016-01-01

    Automatic multidocument text summarization systems can now successfully retrieve summary sentences from input documents. However, they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among sentences, and redundancy. This paper introduces a timestamp approach combined with Naïve Bayesian classification for multidocument text summarization. The timestamp gives the summary an ordered structure, which yields a coherent summary, and the method extracts the more relevant information from the multiple documents. A scoring strategy is also used to calculate word scores from word frequency. Linguistic quality is assessed in terms of readability and comprehensibility. To show the efficiency of the proposed method, this paper compares it with the existing MEAD algorithm. The timestamp procedure is also applied to MEAD, and those results are compared with the proposed method. The results show that the proposed method takes less time than the existing MEAD algorithm to perform summarization. Moreover, the proposed method achieves better precision, recall, and F-score than the existing approach based on clustering with lexical chaining. PMID:27034971
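    The timestamp idea separates what gets selected from the order in which it is presented. The sketch below is a hypothetical stand-in and does not implement the paper's Naïve Bayes classifier. Each sentence is scored by the average corpus frequency of its words, the top k sentences are kept, and they are output in timestamp order. The function names, scoring rule, and sentences are all assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def timestamp_summary(sentences, k):
    """sentences: list of (timestamp, text). Return the k best-scoring texts in time order."""
    freq = Counter(word for _, text in sentences for word in tokenize(text))

    def score(text):
        words = tokenize(text)
        # Average corpus frequency of the sentence's words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(sentences, key=lambda s: score(s[1]), reverse=True)[:k]
    return [text for _, text in sorted(best, key=lambda s: s[0])]


# Hypothetical input: sentences pooled from several documents, each with a timestamp.
pooled = [
    (3, "The flood closed the main bridge."),
    (1, "Heavy rain caused a flood in the city."),
    (2, "A local bakery opened."),
    (4, "Officials expect the flood to recede."),
]
summary = timestamp_summary(pooled, k=2)
print(summary)  # ['The flood closed the main bridge.', 'Officials expect the flood to recede.']
```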

  1. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    NASA Astrophysics Data System (ADS)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4, 0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high-quality imaging database for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a β-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis, based on the Serna & Gerbal (SG) method, of the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.
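    For reference, the standard isothermal β-model describes the gas density and the resulting projected X-ray surface-brightness profile with a core radius r_c and a slope β, as shown below. Substructures are then sought in the residual image (observed minus fitted model). The abstract does not state the paper's exact fitting choices, such as centering or energy band.

```latex
n_e(r) = n_{e,0}\left[1 + \left(\frac{r}{r_c}\right)^{2}\right]^{-3\beta/2},
\qquad
S_X(R) = S_0\left[1 + \left(\frac{R}{r_c}\right)^{2}\right]^{-3\beta + 1/2}
```

Here r is the three-dimensional radius and R the projected radius on the sky.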

  2. Inherent Structure versus Geometric Metric for State Space Discretization

    PubMed Central

    Liu, Hanzhong; Li, Minghai; Fan, Jue; Huo, Shuanghong

    2016-01-01

    Inherent structure (IS) and geometry-based clustering methods are commonly used for analyzing molecular dynamics trajectories. ISs are obtained by minimizing the sampled conformations to local minima on the potential/effective energy surface, and conformations that minimize into the same energy basin belong to one cluster. We investigate how the choice between these two methods of trajectory decomposition influences our understanding of the thermodynamics and kinetics of alanine tetrapeptide. We find that at the micro-cluster level, the IS approach and the root-mean-square deviation (RMSD) based clustering method give totally different results. Depending on the local features of the energy landscape, conformations with close RMSDs can be minimized into different minima, while conformations with large RMSDs can be minimized into the same basin. However, the relaxation timescales calculated from the transition matrices built from the micro clusters are similar. The discrepancy at the micro-cluster level leads to different macro clusters. Although the dynamic models established through both clustering methods are validated as approximately Markovian, the IS approach seems to give a meaningful state space discretization at the macro-cluster level. PMID:26915811
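    RMSD-based clustering of trajectory frames normally compares conformations after optimal superposition. The sketch below uses the Kabsch algorithm for that superposition. This is an assumption, since the abstract does not state how structures were aligned. The ten-atom conformation and its rigidly moved copy are hypothetical: the raw RMSD is large, while the superposed RMSD is essentially zero.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two conformations (n_atoms x 3) after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_aligned = p @ rotation.T
    return float(np.sqrt(((p_aligned - q) ** 2).sum(axis=1).mean()))


# Hypothetical 10-atom conformation, and a rigidly rotated and translated copy of it.
rng = np.random.default_rng(0)
conf_a = rng.normal(size=(10, 3))
angle = np.deg2rad(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
conf_b = conf_a @ rot_z.T + np.array([1.0, 2.0, 3.0])

naive = float(np.sqrt(((conf_a - conf_b) ** 2).sum(axis=1).mean()))
aligned = kabsch_rmsd(conf_a, conf_b)
print(naive, aligned)
```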

  3. Integrated cluster- and case-based surveillance for detecting stage III zoonotic pathogens: an example of Nipah virus surveillance in Bangladesh.

    PubMed

    Naser, A M; Hossain, M J; Sazzad, H M S; Homaira, N; Gurley, E S; Podder, G; Afroj, S; Banu, S; Rollin, P E; Daszak, P; Ahmed, B-N; Rahman, M; Luby, S P

    2015-07-01

    This paper explores the utility of cluster- and case-based surveillance established in government hospitals in Bangladesh to detect Nipah virus, a stage III zoonotic pathogen. Physicians listed meningo-encephalitis cases in the 10 surveillance hospitals and identified a cluster when ⩾2 cases who lived within 30 min walking distance of one another developed symptoms within 3 weeks of each other. Physicians collected blood samples from the clustered cases. As part of case-based surveillance, blood was collected from all listed meningo-encephalitis cases in three hospitals during the Nipah season (January-March). An investigation team visited clustered cases' communities to collect epidemiological information and blood from the living cases. We tested serum using Nipah-specific IgM ELISA. Up to September 2011, in 5887 listed cases, we identified 62 clusters comprising 176 encephalitis cases. We collected blood from 127 of these cases. In 10 clusters, we identified a total of 62 Nipah cases: 18 laboratory-confirmed and 34 probable. We identified person-to-person transmission of Nipah virus in four clusters. From case-based surveillance, we identified 23 (4%) Nipah cases. Faced with thousands of encephalitis cases, integrated cluster surveillance allows targeted deployment of investigative resources to detect outbreaks by stage III zoonotic pathogens in resource-limited settings.
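    The cluster definition used by the surveillance physicians is a pairwise rule. The sketch below applies it to hypothetical onset days and walking times. It treats "within 30 min" and "within 3 weeks" as inclusive limits of 30 minutes and 21 days. It also assumes, which the abstract does not state, that clusters are formed by chaining linked pairs. The case data are invented for illustration.

```python
def find_case_clusters(onset_days, walk_minutes, max_walk=30, max_days=21):
    """Group cases linked by the pairwise rule; return clusters of >= 2 cases (index lists)."""
    n = len(onset_days)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            close_in_space = walk_minutes[i][j] <= max_walk
            close_in_time = abs(onset_days[i] - onset_days[j]) <= max_days
            if close_in_space and close_in_time:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= 2)


# Hypothetical encephalitis cases: symptom-onset day and pairwise walking time (minutes).
onset = [0, 5, 40, 3]
walk = [
    [0, 10, 15, 120],
    [10, 0, 20, 90],
    [15, 20, 0, 100],
    [120, 90, 100, 0],
]
print(find_case_clusters(onset, walk))  # [[0, 1]]
```

Case 2 lives close to cases 0 and 1 but fell ill more than three weeks later. Case 3 fell ill at the right time but lives too far away. Neither one joins the cluster.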

  4. User-Friendly Interface Developed for a Web-Based Service for SpaceCAL Emulations

    NASA Technical Reports Server (NTRS)

    Liszka, Kathy J.; Holtz, Allen P.

    2004-01-01

    A team at the NASA Glenn Research Center is developing a Space Communications Architecture Laboratory (SpaceCAL) for protocol development activities for coordinated satellite missions. SpaceCAL will provide a multiuser, distributed system to emulate space-based Internet architectures, backbone networks, formation clusters, and constellations. As part of a new effort in 2003, building blocks are being defined for an open distributed system to make the satellite emulation test bed accessible through an Internet connection. The first step in creating a Web-based service to control the emulation remotely is providing a user-friendly interface for encoding the data into a well-formed and complete Extensible Markup Language (XML) document. XML provides coding that allows data to be transferred between dissimilar systems. Scenario specifications include control parameters, network routes, interface bandwidths, delay, and bit error rate. Specifications for all satellites, instruments, and ground stations in a given scenario are also included in the XML document. For the SpaceCAL emulation, the XML document can be created using XForms, a Web-based forms language for data collection. Unlike older forms technology, the interactive user interface keeps the focus on the science rather than on the data representation. Required versus optional input fields, default values, automatic calculations, data validation, and reuse will help researchers quickly and accurately define missions. XForms can apply any XML schema defined for the test mission to validate data before forwarding it to the emulation facility. New instrument definitions, facilities, and mission types can be added to the existing schema. The first prototype user interface incorporates components for interactive input and form processing. 
Internet address, data rate, and the location of the facility are implemented with basic form controls with default values provided for convenience and efficiency using basic XForms operations. Because different emulation scenarios will vary widely in their component structure, more complex operations are used to add and delete facilities.

  5. Topology control algorithm for wireless sensor networks based on Link forwarding

    NASA Astrophysics Data System (ADS)

    Pucuo, Cairen; Qi, Ai-qin

    2018-03-01

    Research on topology control can effectively save energy and extend the service life of wireless sensor networks. In this paper, an algorithm called LTHC (link transmit hybrid clustering), based on link transmission, is proposed. It reduces energy expenditure by changing the way cluster nodes communicate. The idea is to establish a link between each cluster and the SINK node when the cluster is formed, and the link nodes must be non-cluster nodes. Through the link, the cluster sends information to the SINK node. This saves considerably more energy for clusters far from the SINK node, which distributes energy more uniformly across the network, prolongs network lifetime, and improves communication. Experiments show that LTHC is far superior to the traditional LEACH protocol in terms of both traffic and network lifetime.
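    The abstract benchmarks LTHC against LEACH. For context, the sketch below shows LEACH's standard randomized cluster-head election. A node that has not served as cluster head in the last 1/p rounds becomes one with probability T(n) = p / (1 - p·(r mod 1/p)). This is the baseline, not LTHC's link-based scheme. The node count, p, and the random seed are arbitrary.

```python
import random
from collections import Counter


def leach_threshold(p, round_index, eligible):
    """LEACH threshold T(n) for round r; zero for nodes that served recently."""
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (round_index % period))


def elect_cluster_heads(node_ids, last_head_round, p, round_index, rng):
    """Randomized cluster-head election for one round; updates last_head_round in place."""
    period = round(1 / p)
    heads = []
    for node in node_ids:
        last = last_head_round.get(node)
        eligible = last is None or round_index - last >= period
        if rng.random() < leach_threshold(p, round_index, eligible):
            heads.append(node)
            last_head_round[node] = round_index
    return heads


# 20 hypothetical nodes, desired cluster-head fraction p = 0.1 (one epoch = 10 rounds).
rng = random.Random(42)
last_head_round = {}
times_as_head = Counter()
for r in range(10):
    for node in elect_cluster_heads(range(20), last_head_round, 0.1, r, rng):
        times_as_head[node] += 1
print(sorted(times_as_head.items()))
```

Because T(n) reaches 1 in the last round of each epoch, every node here serves as cluster head exactly once per 10-round epoch. This is how LEACH rotates the energy-intensive cluster-head role.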

  6. Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

    PubMed

    Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

    2017-05-01

    The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to introduce the distance-based, partitioning-based, and model-based cluster analysis methods commonly used in the nursing literature, to give a brief historical overview of the use of cluster analysis in the nursing literature, and to provide suggestions for future research. An electronic search covered three bibliographic databases: PubMed, CINAHL, and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. This growth positions the method to yield insights that have the potential to change clinical practice.

  7. Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization.

    PubMed

    Sun, Yanfeng; Gao, Junbin; Hong, Xia; Mishra, Bamdev; Yin, Baocai

    2016-03-01

    Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or tensor datasets. When dealing with such datasets, standard practice is often to use subspace clustering based on vectorizing the multiarray data. However, vectorization of tensorial data does not exploit the complete structure information. In this paper, we propose a subspace clustering algorithm that does not use any vectorization. Our approach is based on a novel heterogeneous Tucker decomposition model that takes cluster membership information into account. We propose a new clustering algorithm that alternates between the different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold, for which we investigate second-order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms based on tensor factorization.

  8. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…

  9. Residual energy level based clustering routing protocol for wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Yuan, Xu; Zhong, Fangming; Chen, Zhikui; Yang, Deli

    2015-12-01

    Premature node death, unbalanced energy consumption, and short lifetimes in wireless sensor networks have hindered the promotion and application of this technology in the agricultural Internet of Things. This paper proposes a clustering routing protocol based on the residual energy level (RELCP). RELCP includes three stages: cluster head selection, cluster establishment, and data transmission. RELCP considers the residual energy level and the distance to the base station when electing cluster head nodes and transmitting data. Simulation results demonstrate that the protocol can efficiently balance the energy dissipation of all nodes and prolong the network lifetime.

  10. Multiconstrained gene clustering based on generalized projections

    PubMed Central

    2010-01-01

    Background Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints into an optimal clustering solution remains an unsolved problem. Results We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure, Gene Log Likelihood (GLL), which accounts for genes that have more than one function and hence belong to more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for soft clustering solutions. PMID:20356386
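    The POCS idea itself can be shown with two simple convex sets. The sketch below is a generic illustration, not the paper's MGC projector, and the membership vector is hypothetical. It alternately projects a soft-membership vector onto the hyperplane where memberships sum to 1 and onto the box [0, 1]^3. The iterates converge to a point that satisfies both constraints.

```python
import numpy as np


def project_onto_hyperplane(x, a, b):
    """Euclidean projection onto the convex set {x : a . x = b}."""
    return x - (a @ x - b) / (a @ a) * a


def project_onto_box(x, lo, hi):
    """Euclidean projection onto the convex set {x : lo <= x <= hi}."""
    return np.clip(x, lo, hi)


def pocs(x0, a, b, lo, hi, n_iter=200):
    """Alternate the two projections; converges to a point in the intersection if it is nonempty."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = project_onto_box(project_onto_hyperplane(x, a, b), lo, hi)
    return x


# Hypothetical soft membership of one gene in three clusters: it should lie in [0, 1]
# and sum to 1, but the initial estimate violates both constraints.
ones = np.ones(3)
membership = pocs([0.9, 0.6, -0.3], a=ones, b=1.0, lo=0.0, hi=1.0)
print(membership)
```

POCS only guarantees some point in the intersection, not necessarily the one closest to the starting estimate. Exact projection onto the intersection needs a variant such as Dykstra's algorithm.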

  11. Application of cluster technology in location-based service

    NASA Astrophysics Data System (ADS)

    Chen, Jing; Wang, Xiaoman; Gong, Jianya

    2005-10-01

    This paper introduces the principle, algorithms, and implementation of load balancing technology. It also designs a cluster-based method for Location-Based Services (LBS) and explains its functional characteristics and overall system structure. Experimental comparisons show that cluster technology can keep an LBS running continuously and provide fault tolerance and load sharing across the cluster.

  12. Robust MST-Based Clustering Algorithm.

    PubMed

    Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing

    2018-06-01

    Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. To solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase. This results in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on minimax similarity. Finally, all data points are assigned through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
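    The minimax (path-based) distance between two points is the smallest possible largest hop over all paths joining them. Equivalently, it is the longest edge on the path between them in a minimum spanning tree. The NumPy sketch below computes it with a Floyd-Warshall-style recurrence and groups points by a threshold, using hypothetical 1-D data. It reproduces the first weakness named in the abstract, where an isolated point becomes its own cluster. It is not the paper's robust algorithm.

```python
import numpy as np


def minimax_distances(points):
    """Minimax distance: over all paths between i and j, the smallest possible largest hop."""
    x = np.asarray(points, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    for k in range(len(x)):
        # Floyd-Warshall variant: a route through k costs the larger of its two legs.
        d = np.minimum(d, np.maximum(d[:, k:k + 1], d[k:k + 1, :]))
    return d


def threshold_clusters(minimax, t):
    """Group points whose minimax distance is at most t (an equivalence relation)."""
    labels = -np.ones(len(minimax), dtype=int)
    next_label = 0
    for i in range(len(minimax)):
        if labels[i] < 0:
            labels[minimax[i] <= t] = next_label
            next_label += 1
    return labels


# Hypothetical 1-D data: a chain of points plus one distant point.
pts = [[0.0], [1.0], [2.0], [3.0], [10.0]]
m = minimax_distances(pts)
labels = threshold_clusters(m, t=1.5)
print(m[0, 3], m[0, 4], labels)
```

Points 0 and 3 are 3 units apart but only 1.0 apart in minimax terms, because the chain connects them in unit hops. The distant point at 10 ends up as a singleton cluster.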

  13. Zone-Based Routing Protocol for Wireless Sensor Networks

    PubMed Central

    Venkateswarlu Kumaramangalam, Muni; Adiyapatham, Kandasamy; Kandasamy, Chandrasekaran

    2014-01-01

    Extensive research across the globe has demonstrated the importance of wireless sensor networks (WSNs) in present-day applications. In the recent past, various routing algorithms have been proposed to extend WSN lifetime. Clustering has been highly successful in conserving energy resources for network activities and has become a promising field of research. However, the problem of unbalanced energy consumption is still open, because cluster head activities are tightly coupled with the role and location of a particular node in the network. Several unequal clustering algorithms have been proposed to solve this multihop hot spot problem in wireless sensor networks. Current unequal clustering mechanisms consider only intra- and intercluster communication cost. Proper organization of a wireless sensor network into clusters enables efficient utilization of limited resources and extends the lifetime of deployed sensor nodes. This paper considers a novel network organization scheme, an energy-efficient edge-based network partitioning scheme, to organize sensor nodes into clusters of equal size. It also proposes a cluster-based routing algorithm, called the zone-based routing protocol (ZBRP), for extending sensor network lifetime. Experimental results show that ZBRP performs better in terms of network lifetime and energy conservation, with uniform energy consumption among the cluster heads. PMID:27437455

  15. Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster

    NASA Astrophysics Data System (ADS)

    Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.

    2018-04-01

    Clustering is a data mining technique used to analyse large and varied data. Clustering is the process of grouping data into clusters so that each cluster contains data that are as similar as possible to each other and as different as possible from the objects in other clusters. Indonesian SMEs have a variety of customers, but they have no mapping of these customers, so they do not know which customers are loyal and which are not. Customer mapping groups customer profiles to support the analysis and policies of SMEs in the production of goods, especially batik sales. The researchers combine the K-Means method with the elbow method to make K-Means more efficient and effective when processing large amounts of data. K-Means clustering is a local optimization method that is sensitive to the choice of the initial cluster centroids, so a poor choice of initial centroids results in high errors and poor clustering results. The K-Means algorithm also has difficulty determining the best number of clusters, so the elbow method is used to find the best number of clusters for K-Means. The results show that the elbow method can produce the same number of clusters K for different amounts of data. The best number of clusters determined with the elbow method then serves as the default for the characterization process in the case study. Evaluating K-Means by its SSE values on data from 500 batik visitors identified the best clusters. The SSE curve shows a sharp decrease at K = 3, so K = 3 is taken as the cut-off point and the best number of clusters.
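
    The elbow procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: K-Means is plain Lloyd iteration restarted from several random initial centroids (keeping the lowest SSE, since a single bad start can give a high error), and the "sharp decrease" is located with a hypothetical largest-second-difference rule on the SSE curve. The function names and the toy data are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(data: Sequence[Vector], k: int, iters: int = 100,
               restarts: int = 30, seed: int = 0) -> float:
    """Lloyd's K-Means from several random starts; returns the lowest SSE found."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(restarts):
        centroids = rng.sample(list(data), k)  # random initial centroids
        for _ in range(iters):
            # Assignment step: each point joins its nearest centroid.
            groups: List[List[Vector]] = [[] for _ in range(k)]
            for p in data:
                nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                groups[nearest].append(p)
            # Update step: move each centroid to its group mean (keep it if the group is empty).
            new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
                   for c, g in enumerate(groups)]
            if new == centroids:
                break
            centroids = new
        sse = sum(min(sq_dist(p, c) for c in centroids) for p in data)
        best = min(best, sse)
    return best


def elbow_k(sse: List[float]) -> int:
    """Return the K (sse[i] belongs to K = i + 1) where the SSE curve bends most sharply."""
    bends = [sse[i - 1] - 2 * sse[i] + sse[i + 1] for i in range(1, len(sse) - 1)]
    return bends.index(max(bends)) + 2


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
sse = [kmeans_sse(data, k) for k in range(1, 6)]
print(elbow_k(sse))
```

    For these three well-separated groups of three points, the optimal SSE values for K = 1 to 5 are 804, 304, 4, about 3.17 and about 2.33, so the curve bends most sharply at K = 3. Other elbow rules (for example, visual inspection or the distance to a chord) may pick a different K on less clear-cut data.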

  16. DCE: A Distributed Energy-Efficient Clustering Protocol for Wireless Sensor Network Based on Double-Phase Cluster-Head Election.

    PubMed

    Han, Ruisong; Yang, Wei; Wang, Yipeng; You, Kaiming

    2017-05-01

    Clustering is an effective technique used to reduce energy consumption and extend the lifetime of wireless sensor networks (WSNs). The energy heterogeneity of WSNs should be considered when designing clustering protocols. We propose and evaluate a novel distributed energy-efficient clustering protocol called DCE for heterogeneous wireless sensor networks, based on a Double-phase Cluster-head Election scheme. In DCE, the cluster head election procedure is divided into two phases. In the first phase, tentative cluster heads are elected with probabilities determined by the relative levels of initial and residual energy. Then, in the second phase, a tentative cluster head is replaced by a member of its cluster to form the final set of cluster heads if that member has more residual energy. Employing two phases for cluster-head election ensures that nodes with more energy have a higher chance of becoming cluster heads. Energy consumption is well distributed in the proposed protocol, and the simulation results show that DCE achieves longer stability periods than other typical clustering protocols in heterogeneous scenarios.
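
    The double-phase election can be sketched in Python. The abstract does not give DCE's exact probability formula or membership rule, so both are assumptions here: the phase-1 probability is a hypothetical `p_opt` scaled by relative initial energy and by the remaining fraction of energy, and nodes join the nearest tentative head. Only the two-phase structure (random tentative election, then replacement by a member with more residual energy) follows the description above; all names are illustrative.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    initial_energy: float
    residual_energy: float


def election_probability(node: Node, nodes: List[Node], p_opt: float) -> float:
    """Hypothetical phase-1 probability: p_opt scaled by the node's initial energy relative
    to the network average and by the fraction of its initial energy that remains."""
    avg_initial = sum(n.initial_energy for n in nodes) / len(nodes)
    p = p_opt * (node.initial_energy / avg_initial) * (node.residual_energy / node.initial_energy)
    return min(1.0, p)


def elect_cluster_heads(nodes: List[Node], p_opt: float,
                        rng: random.Random) -> Tuple[Dict[int, List[Node]], List[Node]]:
    # Phase 1: tentative heads are drawn at random with energy-dependent probabilities.
    tentative = [n for n in nodes if rng.random() < election_probability(n, nodes, p_opt)]
    if not tentative:  # assumed fallback so that at least one cluster forms
        tentative = [max(nodes, key=lambda n: n.residual_energy)]
    heads = {h.node_id: h for h in tentative}
    # Assumed membership rule: every node joins its nearest tentative head.
    clusters: Dict[int, List[Node]] = {h.node_id: [] for h in tentative}
    for n in nodes:
        head = min(tentative, key=lambda h: math.hypot(n.x - h.x, n.y - h.y))
        clusters[head.node_id].append(n)
    # Phase 2: replace a tentative head only by a member with strictly more residual energy.
    final_heads = []
    for head_id, members in clusters.items():
        if not members:
            continue
        best = heads[head_id]
        for m in members:
            if m.residual_energy > best.residual_energy:
                best = m
        final_heads.append(best)
    return clusters, final_heads
```

    Because phase 2 only swaps a head for a member with more residual energy, each final head is the most energetic node of its cluster, which is how the two phases give high-energy nodes a higher chance of becoming cluster heads.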

  17. DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.

    PubMed

    Sun, Zhe; Wang, Ting; Deng, Ke; Wang, Xiao-Feng; Lafyatis, Robert; Ding, Ying; Hu, Ming; Chen, Wei

    2018-01-01

    Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifiers (UMIs). Despite these technological advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. In particular, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods. DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/∼wec47/singlecell.html. Contact: wei.chen@chp.edu or hum@ccf.org. Supplementary data are available at Bioinformatics online. 
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
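
    The per-cell clustering uncertainty mentioned above comes from posterior cluster membership probabilities. A minimal Python sketch, assuming the Dirichlet parameters and mixture weights of each cluster are already known (DIMM-SC estimates them from the data, which is not shown here), scores a UMI count vector with the Dirichlet-multinomial probability. Function names and example values are hypothetical, and this is not the R package's code.

```python
import math
from typing import List, Sequence


def dm_log_likelihood(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """log P(counts | alpha) under a Dirichlet-multinomial, with N = sum(counts), A = sum(alpha):
    log N! - sum log x_g! + log Gamma(A) - log Gamma(N + A) + sum [log Gamma(x_g + a_g) - log Gamma(a_g)]."""
    n, a = sum(counts), sum(alpha)
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    ll += math.lgamma(a) - math.lgamma(n + a)
    ll += sum(math.lgamma(x + al) - math.lgamma(al) for x, al in zip(counts, alpha))
    return ll


def membership_probabilities(counts: Sequence[int], alphas: List[Sequence[float]],
                             weights: Sequence[float]) -> List[float]:
    """Posterior probability that one cell belongs to each cluster."""
    logs = [math.log(w) + dm_log_likelihood(counts, al) for w, al in zip(weights, alphas)]
    top = max(logs)  # subtract the maximum before exponentiating (log-sum-exp trick)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical two-gene example: one cluster favours gene 1, the other gene 2.
print(membership_probabilities([9, 1], [(10.0, 1.0), (1.0, 10.0)], [0.5, 0.5]))
```

    Here the cell is assigned to the first cluster with probability 4862/4863 (about 0.9998); a cell with counts [5, 5] would receive exactly 0.5 for each of these symmetric clusters, flagging an uncertain assignment.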

  18. Hedgehog bases for A_n cluster polylogarithms and an application to six-point amplitudes

    DOE PAGES

    Parker, Daniel E.; Scherlis, Adam; Spradlin, Marcus; ...

    2015-11-20

    Multi-loop scattering amplitudes in N=4 Yang-Mills theory possess cluster algebra structure. To develop a computational framework that exploits this connection, we show how to construct bases of Goncharov polylogarithm functions, at any weight, whose symbol alphabet consists of cluster coordinates on the A_n cluster algebra. Using such a basis, we present a new expression for the 2-loop 6-particle NMHV amplitude that makes some of its cluster structure manifest.

  19. Cluster-specific small airway modeling for imaging-based CFD analysis of pulmonary air flow and particle deposition in COPD smokers

    NASA Astrophysics Data System (ADS)

    Haghighi, Babak; Choi, Jiwoong; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

    2017-11-01

    Accurate modeling of small airway diameters in patients with chronic obstructive pulmonary disease (COPD) is a crucial step toward patient-specific CFD simulations of regional airflow and particle transport. We proposed to use computed tomography (CT) imaging-based cluster membership to identify the structural characteristics of airways in each cluster and to use them to develop cluster-specific airway diameter models. We analyzed 284 COPD smokers with airflow limitation and 69 healthy controls. We used multiscale imaging-based cluster analysis (MICA) to classify the smokers into 4 clusters. With representative cluster patients and healthy controls, we performed multiple regressions to quantify the variation of airway diameters by generation as well as by cluster. Clusters 2 and 4 showed a greater decrease in diameter with increasing generation than the other clusters. Cluster 4 had more rapid decreases in airway diameter in the upper lobes, and cluster 2 in the lower lobes. We then used these regression models to estimate airway diameters in CT-unresolved regions and to obtain pressure-volume hysteresis curves using a 1D resistance model. These 1D flow solutions can provide patient-specific boundary conditions for 3D CFD simulations in COPD patients. Support for this study was provided, in part, by NIH Grants U01-HL114494, R01-HL112986 and S10-RR022421.

  20. A cluster-randomized controlled knowledge translation feasibility study in Alberta community pharmacies using the PARiHS framework: study protocol.

    PubMed

    Rosenthal, Meagen M; Tsuyuki, Ross T; Houle, Sherilyn Kd

    2015-01-01

    Despite evidence of benefit for pharmacist involvement in chronic disease management, the provision of these services in community pharmacy has been suboptimal. The Promoting Action on Research Implementation in Health Services (PARiHS) framework suggests that for knowledge translation to be effective, there must be evidence of benefit, a context conducive to implementation, and facilitation to support uptake. We hypothesize that while the evidence and context components of this framework are satisfied, uptake into practice has been insufficient because of a lack of facilitation. This protocol describes the rationale and methods of a feasibility study to test a facilitated pharmacy practice intervention based on the PARiHS framework, to assist community pharmacists in increasing the number of formal and documented medication management services completed for patients with diabetes, dyslipidemia, and hypertension. A cluster-randomized before-after design will compare ten pharmacies from within a single organization, with the unit of randomization being the pharmacy. Pharmacies will be randomized to facilitated intervention based on the PARiHS framework or usual practice. The Alberta Context Tool will be used to establish the context of practice in each pharmacy. Pharmacies randomized to the intervention will receive task-focused facilitation from an external facilitator, with the goal of developing alternative team processes to allow greater provision of medication management services for patients with diabetes, hypertension, and dyslipidemia. The primary outcome will be a process evaluation of the needs of community pharmacies to provide more clinical services, the acceptability and uptake of modifications made, and the willingness of pharmacies to participate. 
Secondary outcomes will include the change in the number of formal and documented medication management services in the aforementioned chronic conditions provided 6 months before, versus after, the intervention between the two groups, and identification of feasible quantitative outcomes for evaluating the effect of the intervention on patient care outcomes. To date, the study has identified and enrolled the ten pharmacies required and initiated the intervention process. This study will be the first to examine the role of facilitation in pharmacy practice, with the goal of scalable and sustainable practice change. Clinicaltrials.gov identifier NCT02191111.

  1. A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition

    PubMed Central

    Austin, Elena; Coull, Brent A.; Zanobetti, Antonella; Koutrakis, Petros

    2013-01-01

    Background: Heterogeneity in the response to PM2.5 is hypothesized to be related to differences in particle composition across monitoring sites, which reflect differences in source types as well as climatic and topographic conditions impacting different geographic locations. Identifying spatial patterns in particle composition is a multivariate problem that requires novel methodologies. Objectives: Use cluster analysis methods to identify spatial patterns in PM2.5 composition. Verify that the resulting clusters are distinct and informative. Methods: 109 monitoring sites with 75% reported speciation data during the period 2003–2008 were selected. These sites were categorized based on their average PM2.5 composition over the study period using k-means cluster analysis. The obtained clusters were validated and characterized based on their physico-chemical characteristics, geographic locations, emissions profiles, population density and proximity to major emission sources. Results: Overall, 31 clusters were identified. These include 21 clusters with 2 or more sites, which were further grouped into 4 main types using hierarchical clustering. The resulting groupings are chemically meaningful and represent broad differences in emissions. The remaining clusters, encompassing single sites, were characterized based on their particle composition and geographic location. Conclusions: The framework presented here provides a novel tool that can be used to identify and further classify sites based on their PM2.5 composition. The solution presented is fairly robust and yielded groupings that were meaningful in the context of air-pollution research. PMID:23850585

  2. Bibliometric mapping and clustering analysis of Iranian papers on reproductive medicine in Scopus database (2010-2014).

    PubMed

    Bazm, Soheila; Kalantar, Seyyed Mehdi; Mirzaei, Masoud

    2016-06-01

    To meet the future challenges in the field of reproductive medicine in Iran, a better understanding of published studies is needed. Bibliometric methods and social network analysis have been used to measure the scope and illustrate the scientific output of researchers in this field. This study provides insight into the structure of the network of Iranian papers published in the field of reproductive medicine during 2010-2014. In this cross-sectional study, all relevant scientific publications were retrieved from the Scopus database and analyzed according to document type, journal of publication, hot topics, authors and institutions. The results were mapped and clustered with VOSviewer software. In total, 3141 papers from Iranian researchers were identified in the Scopus database between 2010 and 2014. The number of publications per year increased from 461 in 2010 to 749 in 2014. Tehran University of Medical Sciences and "Soleimani M" occupied the top positions based on the productivity indicator. Likewise, "Soleimani M" ranked first among authors according to degree centrality, betweenness centrality and collaboration criteria. In addition, among institutions, the Iranian Academic Center for Education, Culture and Research (ACECR) was the leader based on degree centrality, betweenness centrality and collaboration indicators. Publications of Iranian researchers in the field of reproductive medicine showed steady growth during 2010-2014. It seems that, in addition to quantity, Iranian authors need to improve the quality of their articles and their collaboration. This will help them advance their efforts.

  3. Bibliometric mapping and clustering analysis of Iranian papers on reproductive medicine in Scopus database (2010-2014)

    PubMed Central

    Bazm, Soheila; Kalantar, Seyyed Mehdi; Mirzaei, Masoud

    2016-01-01

    Background: To meet the future challenges in the field of reproductive medicine in Iran, a better understanding of published studies is needed. Bibliometric methods and social network analysis have been used to measure the scope and illustrate the scientific output of researchers in this field. Objective: This study provides insight into the structure of the network of Iranian papers published in the field of reproductive medicine during 2010-2014. Materials and Methods: In this cross-sectional study, all relevant scientific publications were retrieved from the Scopus database and analyzed according to document type, journal of publication, hot topics, authors and institutions. The results were mapped and clustered with VOSviewer software. Results: In total, 3141 papers from Iranian researchers were identified in the Scopus database between 2010 and 2014. The number of publications per year increased from 461 in 2010 to 749 in 2014. Tehran University of Medical Sciences and "Soleimani M" occupied the top positions based on the productivity indicator. Likewise, "Soleimani M" ranked first among authors according to degree centrality, betweenness centrality and collaboration criteria. In addition, among institutions, the Iranian Academic Center for Education, Culture and Research (ACECR) was the leader based on degree centrality, betweenness centrality and collaboration indicators. Conclusion: Publications of Iranian researchers in the field of reproductive medicine showed steady growth during 2010-2014. It seems that, in addition to quantity, Iranian authors need to improve the quality of their articles and their collaboration. This will help them advance their efforts. PMID:27525320
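
    Degree centrality, one of the network indicators used above, counts each author's distinct co-authors. A minimal Python sketch on a hypothetical list of papers (the author names are placeholders, and VOSviewer's own computations are not reproduced) builds an undirected co-authorship graph and normalizes degree by n - 1, where n is the number of authors. Betweenness centrality needs shortest-path counting and is omitted.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def coauthor_graph(papers: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Undirected co-authorship graph: an edge joins any two authors who share a paper."""
    graph: Dict[str, Set[str]] = {}
    for authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            graph.setdefault(a, set())  # single-author papers still add a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Normalized degree: number of distinct co-authors divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {a: 0.0 for a in graph}
    return {a: len(nbrs) / (n - 1) for a, nbrs in graph.items()}


papers = [["Author A", "Author B", "Author C"], ["Author A", "Author D"], ["Author E"]]
print(degree_centrality(coauthor_graph(papers)))
```

    In this toy network "Author A" has three of the four possible co-authors (centrality 0.75), while the single-author "Author E" scores 0.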

  4. Socioeconomic and disability consequences of injuries in the Sudan: a community-based survey in Khartoum State

    PubMed Central

    El Tayeb, Sally; Abdalla, Safa; Heuch, Ivar; Van den Bergh, Graziella

    2015-01-01

    Background: Fatal and non-fatal injuries are of increasing public health concern globally, particularly in low- and middle-income countries. Injuries sustained by individuals also impact society, creating a loss of productivity with serious economic consequences. In Sudan, there is no documentation of the burden of injuries on individuals and society. Methods: A community-based survey was performed in Khartoum State using a stratified two-stage cluster sampling technique. Households were selected in each cluster by systematic random sampling. Face-to-face interviews were conducted during October and November 2010. Fatal injuries occurring during the 5 years preceding the survey and non-fatal injuries occurring during the 12 months preceding the interviews were included. Results: The total number of individuals included was 5661, residing in 973 households. There were 28 deaths due to injuries out of a total of 129 reported deaths over 5 years. A total of 441 cases of non-fatal injuries occurred during the 12 months preceding the survey. The number of disability days differed significantly between mechanisms of injury. Road traffic crashes and falls caused the longest duration of disability. Men had a higher probability than women of losing a job due to an injury. Conclusions: This study demonstrates the importance of prioritising the prevention of road traffic crashes and falls. The loss of productivity in lower socioeconomic strata highlights the need for social security policies. Further research is needed to estimate the economic cost of injuries in Sudan. PMID:24225061
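
    Systematic random sampling, used above to choose households within each cluster, takes every k-th unit from a listed frame after a single random start. A minimal Python sketch, assuming a household listing is available for the cluster and allowing a fractional interval k = N / n; the survey's actual frame, sample sizes and first-stage cluster selection are not given in the abstract and are not modeled here. The household labels are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(frame: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Select n units from an ordered frame with a fixed interval and one random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    interval = len(frame) / n          # sampling interval k = N / n (may be fractional)
    start = rng.random() * interval    # random start in [0, k)
    # min() guards against floating-point rounding pushing the last index past the frame.
    return [frame[min(int(start + i * interval), len(frame) - 1)] for i in range(n)]


households = [f"household-{i:03d}" for i in range(120)]  # hypothetical cluster listing
print(systematic_sample(households, 12, random.Random(2010)))
```

    With 120 listed households and 12 to select, the interval is 10, so the sample is one household from the first ten followed by every tenth household after it.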

  5. A proximity-based graph clustering method for the identification and application of transcription factor clusters.

    PubMed

    Spadafore, Maxwell; Najarian, Kayvan; Boyle, Alan P

    2017-11-29

    Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to establish where a TF binds to DNA are well established, these methods provide no information describing how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions. Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF. We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions. The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.
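
    The Markov Clustering Algorithm alternates expansion (raising a column-stochastic matrix to a power, here squaring it) with inflation (raising each entry to a power r and renormalizing the columns) until the matrix stops changing. A minimal pure-Python sketch, assuming a symmetric non-negative adjacency matrix such as TF co-occurrence counts; the paper's filtering and normalization steps are not reproduced, and the parameter values are illustrative defaults.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (all-zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-6) -> List[Set[int]]:
    n = len(adjacency)
    # Add self-loops, then turn the graph into a column-stochastic transition matrix.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along longer paths
        # Inflation: strong flows are strengthened and weak ones damped.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; the columns they attract form a cluster.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


two_triangles = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
print(mcl(two_triangles))  # [{0, 1, 2}, {3, 4, 5}]
```

    Raising the inflation parameter generally yields more, smaller clusters, which is one way to obtain the small, manageable clusters the abstract describes.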

  6. Toward the 21st Century: Preparing Proactive Visionary Transformational Leaders for Building Learning Communities. Human Resource Development. South Florida Cluster.

    ERIC Educational Resources Information Center

    Groff, Warren H.

    The first part of this document describes Nova University's doctoral program in Vocational, Technical, and Occupational (VTO) Education, developed in response to the need to create high performance learners and leaders for building learning communities. It discusses how a curriculum change in 1990 resulted in the following: conversion of the…

  7. Joining the Tots: Visual Research Tools to Connect Families and Community in Early Childhood Education

    ERIC Educational Resources Information Center

    Duncan, Judith; One, Sarah Te

    2014-01-01

    Over a two-year teacher-researcher project in New Zealand we used a mosaic of research methods (Clark, 2010) to capture the perspectives of staff, parents and children. As a team of teachers and academic researchers, we recorded and documented reconceptualised pedagogical practices that included active adult participation in a cluster of early…

  8. Mississippi Curriculum Framework for Emergency Medical Technology--Basic (Program CIP: 51.0904). Emergency Medical Technology--Paramedic (Program CIP: 51.0904). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the emergency medical technology (EMT) programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline…

  9. Mississippi Curriculum Framework for Postsecondary Child Development Technology Programs (CIP: 20.0201--Child Care & Guidance Workers & Mgr). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the child development technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies,…

  10. Mississippi Curriculum Framework for Fashion Marketing Technology (Program CIP: 08.0101--Apparel and Accessories Mkt. Op., Gen.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the fashion marketing technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies,…

  11. Using hierarchical cluster models to systematically identify groups of jobs with similar occupational questionnaire response patterns to assist rule-based expert exposure assessment in population-based studies.

    PubMed

    Friesen, Melissa C; Shortreed, Susan M; Wheeler, David C; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S; Baris, Dalsu; Karagas, Margaret R; Schwenn, Molly; Johnson, Alison; Armenti, Karla R; Silverman, Debra T; Yu, Kai

    2015-05-01

    Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the number of clusters extracted from each model's cluster tree from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m⁻³ respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. 
First, the clusters' homogeneity (defined as >75% of jobs with the same estimate) was examined for a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and the continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job's estimate and the mean estimate for all jobs within the cluster. Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster level and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs. This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.
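
    The homogeneity criterion above (more than 75% of a cluster's jobs sharing the same dichotomized estimate) can be computed as in this minimal Python sketch. The cluster labels and probability estimates are hypothetical inputs; the default cut-point is the 50% split (one of the two used in the study), and the ICC calculations are not covered.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def homogeneous_fraction(labels: Sequence[Hashable], probabilities: Sequence[float],
                         cutoff: float = 50.0, share: float = 0.75) -> float:
    """Fraction of clusters in which more than `share` of jobs fall on the same side of `cutoff` (%)."""
    groups: Dict[Hashable, List[bool]] = defaultdict(list)
    for label, p in zip(labels, probabilities):
        groups[label].append(p >= cutoff)  # dichotomize: True means probability >= cutoff
    homogeneous = 0
    for flags in groups.values():
        largest = Counter(flags).most_common(1)[0][1]  # size of the majority side
        if largest / len(flags) > share:
            homogeneous += 1
    return homogeneous / len(groups)


labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
probabilities = [60, 70, 80, 90, 10, 60, 70, 20, 55, 65, 75, 5]
print(homogeneous_fraction(labels, probabilities))
```

    Cluster "c" has exactly 75% of its jobs above the cut-point, so it does not count as homogeneous under the strict "more than 75%" rule, and only cluster "a" qualifies (a fraction of one third).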

  12. Cluster analysis of dynamic contrast enhanced MRI reveals tumor subregions related to locoregional relapse for cervical cancer patients.

    PubMed

    Torheim, Turid; Groendahl, Aurora R; Andersen, Erlend K F; Lyng, Heidi; Malinen, Eirik; Kvaal, Knut; Futsaether, Cecilia M

    2016-11-01

    Solid tumors are known to be spatially heterogeneous. Detection of treatment-resistant tumor regions can improve clinical outcome by enabling the implementation of strategies targeting such regions. In this study, K-means clustering was used to group voxels in dynamic contrast enhanced magnetic resonance images (DCE-MRI) of cervical cancers. The aim was to identify clusters reflecting treatment resistance that could be used for targeted radiotherapy with a dose-painting approach. Eighty-one patients with locally advanced cervical cancer underwent DCE-MRI prior to chemoradiotherapy. The resulting image time series were fitted to two pharmacokinetic models, the Tofts model (yielding parameters K^trans and ν_e) and the Brix model (A_Brix, k_ep and k_el). K-means clustering was used to group similar voxels based on either the pharmacokinetic parameter maps or the relative signal increase (RSI) time series. The associations between voxel clusters and treatment outcome (measured as locoregional control) were evaluated using the volume fraction or the spatial distribution of each cluster. One voxel cluster based on the RSI time series was significantly related to locoregional control (adjusted p-value 0.048). This cluster consisted of low-enhancing voxels. We found that tumors with poor prognosis had this RSI-based cluster gathered into few patches, making this cluster a potential candidate for targeted radiotherapy. None of the voxel clusters based on Tofts or Brix parameter maps were significantly related to treatment outcome. We identified one group of tumor voxels significantly associated with locoregional relapse that could potentially be used for dose painting. This tumor voxel cluster was identified using the raw MRI time series rather than the pharmacokinetic maps.
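
    The relative signal increase (RSI) curves that yielded the prognostic cluster are commonly defined as RSI(t) = (S(t) - S0) / S0, where S0 is the pre-contrast baseline signal. A minimal Python sketch, assuming S0 is the mean of the first few pre-contrast frames (the abstract does not state the study's exact baseline definition); each voxel's RSI vector would then be the feature vector passed to K-means. The example series is hypothetical.

```python
from typing import List, Sequence


def relative_signal_increase(signal: Sequence[float], n_baseline: int = 3) -> List[float]:
    """RSI(t) = (S(t) - S0) / S0, with S0 the mean of the first n_baseline (pre-contrast) frames."""
    if not 0 < n_baseline <= len(signal):
        raise ValueError("n_baseline must be between 1 and the series length")
    s0 = sum(signal[:n_baseline]) / n_baseline
    if s0 == 0:
        raise ValueError("baseline signal is zero; RSI is undefined")
    return [(s - s0) / s0 for s in signal]


# Hypothetical voxel time series: three baseline frames, then contrast uptake.
print(relative_signal_increase([100.0, 100.0, 100.0, 150.0, 200.0]))  # [0.0, 0.0, 0.0, 0.5, 1.0]
```

    Working with RSI rather than raw intensities removes voxel-to-voxel differences in baseline signal, so a "low-enhancing" cluster reflects weak contrast uptake rather than a dim baseline.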

  13. Controlled and Uncontrolled Subject Descriptions in the CF Database: A Comparison of Optimal Cluster-Based Retrieval Results.

    ERIC Educational Resources Information Center

    Shaw, W. M., Jr.

    1993-01-01

    Describes a study conducted on the cystic fibrosis (CF) database, a subset of MEDLINE, that investigated clustering structure and the effectiveness of cluster-based retrieval as a function of the exhaustivity of the uncontrolled subject descriptions. Results are compared to calculations for controlled descriptions based on Medical Subject Headings…

  14. A Constraint-Based Approach to Acquisition of Word-Final Consonant Clusters in Turkish Children

    ERIC Educational Resources Information Center

    Gokgoz-Kurt, Burcu

    2017-01-01

    The current study provides a constraint-based analysis of L1 word-final consonant cluster acquisition in Turkish child language, based on the data originally presented by Topbas and Kopkalli-Yavuz (2008). The present analysis was done using [?]+obstruent consonant cluster acquisition. A comparison of Gradual Learning Algorithm (GLA) under…

  15. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by their neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore constitutes a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into the evolutionary acquisition of certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficients of neighboring genes and of randomly selected gene pairs, based on a statistical method that takes the false discovery rate (FDR) into consideration, for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined by BLAST with a cutoff of $E < 10^{-5}$) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
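    The abstract does not spell out its FDR procedure. As a hedged sketch of the general idea only (comparing correlations of chromosomally adjacent genes against a null distribution built from random gene pairs), the Python below computes Pearson r for adjacent genes and for random pairs, then estimates an empirical FDR for a chosen correlation threshold. The estimator (null pass rate times the number of neighbor pairs tested, divided by the number of neighbor pairs called) and the function names `pearson`, `neighbor_correlations`, `random_pair_correlations` and `empirical_fdr` are illustrative assumptions, not the method of Wada et al.

```python
import math
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return sxy / math.sqrt(sxx * syy)


def neighbor_correlations(profiles: List[Sequence[float]]) -> List[float]:
    """r for each adjacent gene pair; profiles must be sorted by chromosomal position."""
    return [pearson(profiles[i], profiles[i + 1]) for i in range(len(profiles) - 1)]


def random_pair_correlations(profiles: List[Sequence[float]], n_pairs: int,
                             seed: int = 0) -> List[float]:
    """Null distribution: r for randomly chosen pairs of two distinct genes."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(profiles)), 2)
        out.append(pearson(profiles[i], profiles[j]))
    return out


def empirical_fdr(neighbor_r: Sequence[float], random_r: Sequence[float],
                  threshold: float) -> float:
    """Estimated FDR when neighbor pairs with r >= threshold are called co-expressed:
    (expected number of null pairs passing) / (observed neighbor pairs passing)."""
    called = sum(1 for r in neighbor_r if r >= threshold)
    if called == 0:
        return 0.0  # nothing called, so no false discoveries
    null_rate = sum(1 for r in random_r if r >= threshold) / len(random_r)
    return min(1.0, null_rate * len(neighbor_r) / called)
```

    In practice one would scan candidate thresholds and keep the lowest threshold whose estimated FDR stays below the target (0.1 in the study); that search loop is omitted here.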

  16. SH2 domain containing leukocyte phosphoprotein of 76-kDa (SLP-76) feedback regulation of ZAP-70 microclustering.

    PubMed

    Liu, Hebin; Purbhoo, Marco A; Davis, Daniel M; Rudd, Christopher E

    2010-06-01

    T cell receptor (TCR) signaling involves CD4/CD8-p56lck recruitment of ZAP-70 to the TCR, and ZAP-70 phosphorylation of LAT, which is followed by LAT recruitment of the GADS-SLP-76 complex. Back-regulation of ZAP-70 by SLP-76 has not been documented. In this paper, we show that anti-CD3-induced ZAP-70 cluster formation is significantly reduced in the absence of SLP-76 (i.e., J14 cells) and in the presence of a mutant of SLP-76 (4KE) in Jurkat and primary T cells. Both the number of cells with clusters and the number of clusters per cell were reduced. This effect was not mediated by binding of the SLP-76 SH2 domain to ZAP-70: SLP-76 failed to precipitate ZAP-70, and an inactivating SH2 domain mutation (i.e., R448L) on SLP-76 4KE did not reverse the inhibition of ZAP-70 clustering. Mutation of R448 on WT SLP-76 still supported ZAP-70 clustering. Intriguingly, by contrast, LAT clustering occurred normally in the absence of SLP-76 or in the presence of 4KE SLP-76, indicating that this transmembrane adaptor can operate independently of ZAP-70-GADS-SLP-76. Our findings reconfigure the TCR signaling pathway by showing SLP-76 back-regulation of ZAP-70, an event that could ensure that signaling components are in balance for optimal T cell activation.

  17. Further Automate Planned Cluster Maintenance to Minimize System Downtime during Maintenance Windows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Springmeyer, R.

    This report documents the integration and testing of the automated update process for compute clusters in LC to minimize the impact on user productivity. Description: A set of scripts will be written and deployed to further standardize cluster maintenance activities and minimize downtime during planned maintenance windows. Completion Criteria: This milestone is complete when the scripts have been deployed and used during planned maintenance windows and a timing comparison between the existing process and the new, more automated process has been completed. This milestone was completed on Aug 23, 2016, on the new CTS1 cluster called Jade, when a request to upgrade the version of TOSS 3 was initiated while SWL jobs and normal user jobs were running. Jobs that were running when the update to the system began continued to run to completion. New jobs on the cluster started on the new release of TOSS 3. No system administrator action was required. Current update procedures in TOSS 2 begin by killing all users' jobs. Then all diskful nodes are updated, which can take a few hours. Only after the updates are applied are all nodes rebooted and finally put back into service. A system administrator is required for all steps. In terms of human time spent during a cluster OS update, the TOSS 3 automated procedure on Jade took 0 FTE hours. Doing the same update without the TOSS Update Tool would have required 4 FTE hours.

  18. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

    PubMed

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

  19. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic

    PubMed Central

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646

  20. Assessment and self-assessment of the pharmacists' competencies using the global competency framework (GbCF) in Serbia.

    PubMed

    Stojkov, Svetlana; Tadić, Ivana; Crnjanski, Tatjana; Krajnović, Dušanka

    2016-09-01

    Pharmacists' competence represents a dynamic framework of knowledge, skills and abilities to carry out tasks, and it has an impact on improving quality of life and patients' health. One of the documents for the evaluation and competency development of pharmacists is the Global Competency Framework (GbCF). The aim of this study was to implement the GbCF document in Serbian pharmacies and to perform assessment and self-assessment of the competencies. The assessment and self-assessment of pharmacists' competencies were performed during 2012–13 in eight community pharmacy chains in seven cities in Serbia. For the assessment and self-assessment of pharmacists' competencies, the GbCF model was applied, adjusted to pharmaceutical practice and legislation in Serbia. External assessment was conducted by teams of pharmacists using structured observation of the work of pharmacists during regular working hours. Evaluated pharmacists filled out a questionnaire on demographic indicators concerning themselves and the pharmacy where they work. A total of 123 pharmacists were evaluated. The Pharmacists' Professional Competency Cluster (KK1) had the lowest score (average value 2.98), while the Management and Organizational Competency cluster (KK2) had the highest score (average value 3.15). The competence Recognition of the Diagnosis and Patient Counseling (K8), which belonged to cluster KK1, had the lowest score among all evaluated competencies (average values for assessment and self-assessment were 2.09 and 2.34, respectively). The GbCF might be considered an instrument for the evaluation and self-evaluation of competencies and, accordingly, for their improvement.
