Sample records for clustering tests based

  1. The Wilcoxon signed rank test for paired comparisons of clustered data.

    PubMed

    Rosner, Bernard; Glynn, Robert J; Lee, Mei-Ling T

    2006-03-01

    The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.
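
    The cluster-level randomization idea in this abstract can be sketched for the balanced case. The sketch below is an illustration only, not the authors' implementation: it ranks all nonzero differences together (midranks for ties), sums signed ranks within each cluster, and standardizes by the variance implied by flipping the sign of whole clusters. The function names and the choice to drop zero differences are assumptions, and the adjusted variance for unbalanced data described in the abstract is not reproduced.

```python
import math

def signed_midranks(diffs):
    """Signed ranks of the differences, using midranks when |d| values tie."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            idx = order[k]
            ranks[idx] = math.copysign(mid, diffs[idx])
        i = j + 1
    return ranks

def clustered_signed_rank_z(clusters):
    """clusters: one list of paired differences per cluster (hypothetical layout).

    Assumption: zero differences are dropped before ranking.
    """
    flat = [(c, d) for c, ds in enumerate(clusters) for d in ds if d != 0]
    ranks = signed_midranks([d for _, d in flat])
    totals = [0.0] * len(clusters)
    for (c, _), r in zip(flat, ranks):
        totals[c] += r
    t = sum(totals)
    # If each cluster's sign is flipped independently with probability 1/2,
    # the statistic has mean 0 and variance equal to the sum of squared totals.
    var = sum(s * s for s in totals)
    return t / math.sqrt(var)

z = clustered_signed_rank_z([[1, 2], [-3, 4], [5, 6]])
print(round(z, 4))
```

    The returned value would be compared with a standard normal reference when the number of clusters is large; the function raises a division error if every cluster total is zero.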

  2. A General Class of Signed Rank Tests for Clustered Data when the Cluster Size is Potentially Informative

    PubMed Central

    Datta, Somnath; Nevalainen, Jaakko; Oja, Hannu

    2012-01-01

    SUMMARY Rank based tests are alternatives to likelihood based tests popularized by their relative robustness and underlying elegant mathematical theory. There has been a surge in research activities in this area in recent years since a number of researchers are working to develop and extend rank based procedures to clustered dependent data which include situations with known correlation structures (e.g., as in mixed effects models) as well as more general forms of dependence. The purpose of this paper is to test the symmetry of a marginal distribution under clustered data. However, unlike most other papers in the area, we consider the possibility that the cluster size is a random variable whose distribution is dependent on the distribution of the variable of interest within a cluster. This situation typically arises when the clusters are defined in a natural way (e.g., not controlled by the experimenter or statistician) and in which the size of the cluster may carry information about the distribution of data values within a cluster. Under the scenario of an informative cluster size, attempts to use some form of variance adjusted sign or signed rank tests would fail since they would not maintain the correct size under the distribution of marginal symmetry. To overcome this difficulty Datta and Satten (2008; Biometrics, 64, 501–507) proposed a Wilcoxon type signed rank test based on the principle of within cluster resampling. In this paper we study this problem in more generality by introducing a class of valid tests employing a general score function. Asymptotic null distribution of these tests is obtained. A simulation study shows that a more general choice of the score function can sometimes result in greater power than the Datta and Satten test; furthermore, this development offers the user a wider choice. We illustrate our tests using a real data example on spinal cord injury patients. PMID:23074359
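
    The principle of within cluster resampling mentioned above can be illustrated with a small Monte Carlo sketch: draw one observation from each cluster, compute an ordinary signed rank sum on the draws, and average over many resamples, so that every cluster contributes equally regardless of its size. This is an assumption-laden illustration, not the Datta and Satten test or the general score class of the paper: the function names are hypothetical, ties and zeros are not handled, and the null variance needed for an actual test is not reproduced.

```python
import random

def signed_rank_sum(xs):
    """Sum of signed ranks of |x|; assumes distinct, nonzero absolute values."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    total = 0
    for rank, idx in enumerate(order, start=1):
        total += rank if xs[idx] > 0 else -rank
    return total

def wcr_signed_rank(clusters, n_resamples=2000, seed=0):
    """Average signed rank sum over resamples of one observation per cluster."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_resamples):
        sample = [rng.choice(c) for c in clusters]  # one draw per cluster
        total += signed_rank_sum(sample)
    return total / n_resamples

# Hypothetical data: three clusters of different sizes.
print(wcr_signed_rank([[0.4, 0.9, 1.3], [-0.2], [0.7, -1.1]]))
```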

  3. A General Class of Signed Rank Tests for Clustered Data when the Cluster Size is Potentially Informative.

    PubMed

    Datta, Somnath; Nevalainen, Jaakko; Oja, Hannu

    2012-09-01

    Rank based tests are alternatives to likelihood based tests popularized by their relative robustness and underlying elegant mathematical theory. There has been a surge in research activities in this area in recent years since a number of researchers are working to develop and extend rank based procedures to clustered dependent data which include situations with known correlation structures (e.g., as in mixed effects models) as well as more general forms of dependence. The purpose of this paper is to test the symmetry of a marginal distribution under clustered data. However, unlike most other papers in the area, we consider the possibility that the cluster size is a random variable whose distribution is dependent on the distribution of the variable of interest within a cluster. This situation typically arises when the clusters are defined in a natural way (e.g., not controlled by the experimenter or statistician) and in which the size of the cluster may carry information about the distribution of data values within a cluster. Under the scenario of an informative cluster size, attempts to use some form of variance adjusted sign or signed rank tests would fail since they would not maintain the correct size under the distribution of marginal symmetry. To overcome this difficulty Datta and Satten (2008; Biometrics, 64, 501-507) proposed a Wilcoxon type signed rank test based on the principle of within cluster resampling. In this paper we study this problem in more generality by introducing a class of valid tests employing a general score function. Asymptotic null distribution of these tests is obtained. A simulation study shows that a more general choice of the score function can sometimes result in greater power than the Datta and Satten test; furthermore, this development offers the user a wider choice. We illustrate our tests using a real data example on spinal cord injury patients.

  4. Coordinate-Based Clustering Method for Indoor Fingerprinting Localization in Dense Cluttered Environments.

    PubMed

    Liu, Wen; Fu, Xiao; Deng, Zhongliang

    2016-12-02

    Indoor positioning technologies have boomed recently because of the growing commercial interest in indoor location-based service (ILBS). Due to the absence of satellite signal in Global Navigation Satellite System (GNSS), various technologies have been proposed for indoor applications. Among them, Wi-Fi fingerprinting has been attracting much interest from researchers because of its pervasive deployment, flexibility and robustness to dense cluttered indoor environments. One challenge, however, is the deployment of Access Points (AP), which can significantly influence the system positioning accuracy. This paper concentrates on WLAN based fingerprinting indoor location by analyzing the AP deployment influence, and studying the advantages of coordinate-based clustering compared to traditional RSS-based clustering. A coordinate-based clustering method for indoor fingerprinting location, named Smallest-Enclosing-Circle-based (SEC), is then proposed aiming at reducing the positioning error lying in the AP deployment and improving robustness to dense cluttered environments. All measurements are conducted in indoor public areas, such as the National Center For the Performing Arts (as Test-bed 1) and the XiDan Joy City (Floors 1 and 2, as Test-bed 2), and results show that the SEC clustering algorithm can improve system positioning accuracy by about 32.7% for Test-bed 1, 71.7% for Test-bed 2 Floor 1 and 73.7% for Test-bed 2 Floor 2 compared with traditional RSS-based clustering algorithms such as K-means.

  5. Coordinate-Based Clustering Method for Indoor Fingerprinting Localization in Dense Cluttered Environments

    PubMed Central

    Liu, Wen; Fu, Xiao; Deng, Zhongliang

    2016-01-01

    Indoor positioning technologies have boomed recently because of the growing commercial interest in indoor location-based service (ILBS). Due to the absence of satellite signal in Global Navigation Satellite System (GNSS), various technologies have been proposed for indoor applications. Among them, Wi-Fi fingerprinting has been attracting much interest from researchers because of its pervasive deployment, flexibility and robustness to dense cluttered indoor environments. One challenge, however, is the deployment of Access Points (AP), which can significantly influence the system positioning accuracy. This paper concentrates on WLAN based fingerprinting indoor location by analyzing the AP deployment influence, and studying the advantages of coordinate-based clustering compared to traditional RSS-based clustering. A coordinate-based clustering method for indoor fingerprinting location, named Smallest-Enclosing-Circle-based (SEC), is then proposed aiming at reducing the positioning error lying in the AP deployment and improving robustness to dense cluttered environments. All measurements are conducted in indoor public areas, such as the National Center For the Performing Arts (as Test-bed 1) and the XiDan Joy City (Floors 1 and 2, as Test-bed 2), and results show that the SEC clustering algorithm can improve system positioning accuracy by about 32.7% for Test-bed 1, 71.7% for Test-bed 2 Floor 1 and 73.7% for Test-bed 2 Floor 2 compared with traditional RSS-based clustering algorithms such as K-means. PMID:27918454

  6. BioCluster: tool for identification and clustering of Enterobacteriaceae based on biochemical data.

    PubMed

    Abdullah, Ahmed; Sabbir Alam, S M; Sultana, Munawar; Hossain, M Anwar

    2015-06-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of the sorting and similarity calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1-47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  7. Cluster Randomized Test-Negative Design (CR-TND) Trials: A Novel and Efficient Method to Assess the Efficacy of Community Level Dengue Interventions.

    PubMed

    Anders, Katherine L; Cutcher, Zoe; Kleinschmidt, Immo; Donnelly, Christl A; Ferguson, Neil M; Indriani, Citra; O'Neill, Scott L; Jewell, Nicholas P; Simmons, Cameron P

    2018-05-07

    Cluster randomized trials are the gold standard for assessing efficacy of community-level interventions, such as vector control strategies against dengue. We describe a novel cluster randomized trial methodology with a test-negative design, which offers advantages over traditional approaches. It utilizes outcome-based sampling of patients presenting with a syndrome consistent with the disease of interest, who are subsequently classified as test-positive cases or test-negative controls on the basis of diagnostic testing. We use simulations of a cluster trial to demonstrate validity of efficacy estimates under the test-negative approach. This demonstrates that, provided study arms are balanced for both test-negative and test-positive illness at baseline and that other test-negative design assumptions are met, the efficacy estimates closely match true efficacy. We also briefly discuss analytical considerations for an odds ratio-based effect estimate arising from clustered data, and outline potential approaches to analysis. We conclude that application of the test-negative design to certain cluster randomized trials could increase their efficiency and ease of implementation.

  8. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.

  9. Inference from clustering with application to gene-expression microarrays.

    PubMed

    Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M

    2002-01-01

    There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
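
    The evaluation loop described in this abstract (model each process as a mean plus independent noise, generate points, cluster them, count the points clustered incorrectly) can be sketched in a few lines. Everything below is a simplified assumption for illustration rather than the toolbox itself: the data are one-dimensional, the noise is Gaussian, the clustering is a bare k-means with a quantile-based start, and the function names are hypothetical. Because cluster labels are arbitrary, the error is taken as the minimum over all matchings of cluster labels to process labels.

```python
import random
from itertools import permutations

def simulate(means, sigma, n_per_process, seed=0):
    """Generate points as process mean plus independent Gaussian noise."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, mu in enumerate(means):
        for _ in range(n_per_process):
            points.append(mu + rng.gauss(0.0, sigma))
            labels.append(label)
    return points, labels

def kmeans_1d(points, k, iters=50):
    """Bare-bones 1-D k-means; initial centers are evenly spaced quantiles."""
    s = sorted(points)
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]

    def nearest(p):
        return min(range(k), key=lambda j: abs(p - centers[j]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return [nearest(p) for p in points]

def clustering_error(true_labels, cluster_labels, k):
    """Fewest misassigned points over all matchings of clusters to processes."""
    best = len(true_labels)
    for perm in permutations(range(k)):
        errors = sum(1 for t, c in zip(true_labels, cluster_labels) if perm[c] != t)
        best = min(best, errors)
    return best

points, labels = simulate(means=[0.0, 3.0], sigma=1.0, n_per_process=50)
print(clustering_error(labels, kmeans_1d(points, 2), 2))
```

    Repeating the run for larger noise levels or more replicates per process would give the kind of error-versus-variance comparison the abstract describes; the permutation search is only practical for small numbers of clusters.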

  10. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar to Kulldorff's methods, we adopt a Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation on independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.

  11. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    PubMed

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
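
    The two-way chi-square test of subject against cluster membership mentioned above can be illustrated with the standard Pearson statistic on a contingency table. This is a generic sketch, not the authors' code: the table layout (rows as subjects, columns as clusters) and the counts are hypothetical, every row and column total is assumed to be nonzero, and the p-value lookup against a chi-square distribution is left out.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    table: list of rows of counts; assumes no row or column sums to zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts of trials: rows are subjects, columns are clusters.
stat, dof = chi_square_independence([[12, 5, 3], [4, 11, 5], [6, 4, 10]])
print(stat, dof)
```

    A large statistic relative to the chi-square distribution with the returned degrees of freedom would suggest that cluster membership depends on subject.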

  12. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.

  13. Clustering by neurocognition for fine-mapping of the schizophrenia susceptibility loci on chromosome 6p

    PubMed Central

    Lin, Sheng-Hsiang; Liu, Chih-Min; Liu, Yu-Li; Fann, Cathy Shen-Jang; Hsiao, Po-Chang; Wu, Jer-Yuarn; Hung, Shuen-Iu; Chen, Chun-Houh; Wu, Han-Ming; Jou, Yuh-Shan; Liu, Shi K.; Hwang, Tzung J.; Hsieh, Ming H.; Chang, Chien-Ching; Yang, Wei-Chih; Lin, Jin-Jia; Chou, Frank Huang-Chih; Faraone, Stephen V.; Tsuang, Ming T.; Hwu, Hai-Gwo; Chen, Wei J.

    2009-01-01

    Chromosome 6p is one of the most commonly implicated regions in the genome-wide linkage scans of schizophrenia, whereas further association studies for markers in this region were inconsistent likely due to heterogeneity. This study aimed to identify more homogeneous subgroups of families for fine mapping on regions around markers D6S296 and D6S309 (both in 6p24.3) as well as D6S274 (in 6p22.3) by means of similarity in neurocognitive functioning. A total of 160 families of patients with schizophrenia comprising at least two affected siblings who had data for 8 neurocognitive test variables of the Continuous Performance Test (CPT) and the Wisconsin Card Sorting Test (WCST) were subjected to cluster analysis with data visualization using the test scores of both affected siblings. Family clusters derived were then used separately in family-based association tests for 64 single nucleotide polymorphisms covering the region of 6p24.3 and 6p22.3. Three clusters were derived from the family-based clustering, with deficit cluster 1 representing deficit on the CPT, deficit cluster 2 representing deficit on both the CPT and the WCST, and a third cluster of non-deficit. After adjustment using false discovery rate for multiple testing, SNP rs13873 and haplotype rs1225934-rs13873 on BMP6-TXNDC5 genes were significantly associated with schizophrenia for the deficit cluster 1 but not for the deficit cluster 2 or non-deficit cluster. Our results provide further evidence that the BMP6-TXNDC5 locus on 6p24.3 may play a role in the selective impairments on sustained attention of schizophrenia. PMID:19694819

  14. Testing prediction methods: Earthquake clustering versus the Poisson model

    USGS Publications Warehouse

    Michael, A.J.

    1997-01-01

    Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
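
    The simulation-based significance test described above can be sketched as follows: count how many events fall inside the prediction windows for the real catalog, then see how often simulated catalogs do at least as well. The sketch is an illustration under stated assumptions and not the study's procedure: the two catalog generators (uniform times for the Poisson-like case, and hypothetical mainshocks followed by exponentially delayed events for the clustered case), the fixed event count, and all names and parameters are invented for the example.

```python
import random

def count_hits(event_times, alarms):
    """Number of events that fall inside any alarm window [start, end)."""
    return sum(1 for t in event_times if any(a <= t < b for a, b in alarms))

def poisson_catalog(n_events, duration, rng):
    """Unclustered catalog: event times uniform over the study period."""
    return [rng.uniform(0.0, duration) for _ in range(n_events)]

def clustered_catalog(n_events, duration, rng, cluster_size=5, spread=1.0):
    """Clustered catalog: events bunch after randomly placed mainshock times.

    Events may fall past `duration`; they then simply lie outside the windows.
    """
    times = []
    while len(times) < n_events:
        main = rng.uniform(0.0, duration)
        for _ in range(cluster_size):
            if len(times) < n_events:
                times.append(main + rng.expovariate(1.0 / spread))
    return times

def significance(observed_hits, alarms, simulate, n_sims=1000, seed=0):
    """Fraction of simulated catalogs with at least as many hits as observed."""
    rng = random.Random(seed)
    as_good = sum(
        1 for _ in range(n_sims)
        if count_hits(simulate(rng), alarms) >= observed_hits
    )
    return as_good / n_sims

alarms = [(10.0, 15.0), (40.0, 45.0), (70.0, 75.0)]  # hypothetical alarm windows
p_poisson = significance(6, alarms, lambda rng: poisson_catalog(20, 100.0, rng))
p_cluster = significance(6, alarms, lambda rng: clustered_catalog(20, 100.0, rng))
print(p_poisson, p_cluster)
```

    Comparing the two fractions for the same alarms and observed hit count shows how the choice of seismicity model changes the apparent significance, which is the effect the abstract examines.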

  15. Testing and evaluation of sign support with cluster attachments.

    DOT National Transportation Integrated Search

    1990-04-01

    Two full-scale crash tests were conducted on the Louisiana two-post, inclined, slip-base sign assembly with cluster sign attachment. These two tests were performed and evaluated in accordance with guidelines under NCHRP Report 230 and standards estab...

  16. A rank-sum test for clustered data when the number of subjects in a group within a cluster is informative.

    PubMed

    Dutta, Sandipan; Datta, Somnath

    2016-06-01

    The Wilcoxon rank-sum test is a popular nonparametric test for comparing two independent populations (groups). In recent years, there have been renewed attempts in extending the Wilcoxon rank sum test for clustered data, one of which (Datta and Satten, 2005, Journal of the American Statistical Association 100, 908-915) addresses the issue of informative cluster size, i.e., when the outcomes and the cluster size are correlated. We are faced with a situation where the group specific marginal distribution in a cluster depends on the number of observations in that group (i.e., the intra-cluster group size). We develop a novel extension of the rank-sum test for handling this situation. We compare the performance of our test with the Datta-Satten test, as well as the naive Wilcoxon rank sum test. Using a naturally occurring simulation model of informative intra-cluster group size, we show that only our test maintains the correct size. We also compare our test with a classical signed rank test based on averages of the outcome values in each group paired by the cluster membership. While this test maintains the size, it has lower power than our test. Extensions to multiple group comparisons and the case of clusters not having samples from all groups are also discussed. We apply our test to determine whether there are differences in the attachment loss between the upper and lower teeth and between mesial and buccal sites of periodontal patients. © 2015, The International Biometric Society.

  17. Scalability of a Low-Cost Multi-Teraflop Linux Cluster for High-End Classical Atomistic and Quantum Mechanical Simulations

    NASA Technical Reports Server (NTRS)

    Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash

    2003-01-01

    Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: Classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on the density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and spacefilling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.

  18. Using experimental data to test an n -body dynamical model coupled with an energy-based clusterization algorithm at low incident energies

    NASA Astrophysics Data System (ADS)

    Kumar, Rohit; Puri, Rajeev K.

    2018-03-01

    Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulated annealing clusterization algorithm (SACA), to describe the experimental data of charge distribution and various event-by-event correlations among fragments. The calculations are constrained into the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study, which spans different system masses and system-mass asymmetries of colliding partners, shows the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.

  19. Structure-related clustering of gene expression fingerprints of thp-1 cells exposed to smaller polycyclic aromatic hydrocarbons.

    PubMed

    Wan, B; Yarbrough, J W; Schultz, T W

    2008-01-01

    This study was undertaken to test the hypothesis that structurally similar PAHs induce similar gene expression profiles. THP-1 cells were exposed to a series of 12 selected PAHs at 50 microM for 24 hours and gene expression profiles were analyzed using both unsupervised and supervised methods. Clustering analysis of gene expression profiles revealed that the 12 tested chemicals were grouped into five clusters. Within each cluster, the gene expression profiles are more similar to each other than to the ones outside the cluster. One-methylanthracene and 1-methylfluorene were found to have the most similar profiles; dibenzothiophene and dibenzofuran were found to share common profiles with fluorene. As expression pattern comparisons were expanded, similarity in genomic fingerprint dropped off dramatically. Prediction analysis of microarrays (PAM) based on the clustering pattern generated 49 predictor genes that can be used for sample discrimination. Moreover, a significance analysis of microarrays (SAM) identified 598 genes being modulated by tested chemicals with a variety of biological processes, such as cell cycle, metabolism, and protein binding and KEGG pathways being significantly (p < 0.05) affected. It is feasible to distinguish structurally different PAHs based on their genomic fingerprints, which are mechanism based.

  20. Automated modal parameter estimation using correlation analysis and bootstrap sampling

    NASA Astrophysics Data System (ADS)

    Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.

    2018-02-01

    The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. 
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
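    The fuzzy c-means step mentioned in the abstract above can be illustrated with a minimal sketch. The sample points, the initial centers and the fuzzifier value m = 2 are assumptions chosen for illustration; the abstract does not state the three features or the settings the authors used.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Membership of each point in each cluster for fixed centers (m > 1)."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: share full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        )
    return memberships

def update_centers(points, memberships, m=2.0):
    """Centers as membership-weighted means of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        memberships = fuzzy_memberships(points, centers, m)
        centers = update_centers(points, memberships, m)
    return centers, fuzzy_memberships(points, centers, m)

# Hypothetical three-dimensional feature vectors forming two groups.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (1.0, 1.0, 1.0), (1.1, 1.0, 1.0), (1.0, 0.9, 1.0)]
centers, memberships = fuzzy_c_means(points, [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
```

    Each row of `memberships` sums to one, so a row can be read as a graded assignment of that point to the clusters, which is the sense in which a "degree of physicalness" could be reported.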

  1. Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

    PubMed

    Lukashin, A V; Fuchs, R

    2001-05-01

    Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places more than 90% of genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
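    As a rough illustration of clustering by simulated annealing, the sketch below reassigns one profile at a time and accepts worse assignments with a probability that shrinks as the temperature falls. The cost function (within-cluster sum of squared distances), the geometric cooling schedule, the fixed number of clusters and the toy profiles are all assumptions; the abstract does not specify the authors' choices, and a finite run like this one carries no guarantee of reaching the global optimum.

```python
import math
import random

def cost(profiles, labels, k):
    """Within-cluster sum of squared distances to the cluster centroids."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(profiles, labels) if l == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

def anneal_clusters(profiles, k, steps=5000, t_start=1.0, cooling=0.999, seed=0):
    """Return the best labelling seen during a simulated annealing run."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    current = cost(profiles, labels, k)
    best_labels, best_cost = labels[:], current
    temperature = t_start
    for _ in range(steps):
        i = rng.randrange(len(profiles))
        old = labels[i]
        labels[i] = rng.randrange(k)          # propose moving one profile
        proposed = cost(profiles, labels, k)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed                # accept the move
            if current < best_cost:
                best_labels, best_cost = labels[:], current
        else:
            labels[i] = old                   # reject and restore
        temperature *= cooling
    return best_labels, best_cost

# Hypothetical temporal profiles (three time points each), two obvious groups.
profiles = [(0.0, 1.0, 2.0), (0.1, 1.1, 2.1), (0.0, 0.9, 2.0),
            (5.0, 4.0, 3.0), (5.1, 4.1, 3.0), (4.9, 4.0, 3.1)]
labels, final_cost = anneal_clusters(profiles, k=2)
```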

  2. Detecting cancer clusters in a regional population with local cluster tests and Bayesian smoothing methods: a simulation study

    PubMed Central

    2013-01-01

    Background There is a rising public and political demand for prospective cancer cluster monitoring. But there is little empirical evidence on the performance of established cluster detection tests under conditions of small and heterogeneous sample sizes and varying spatial scales, such as are the case for most existing population-based cancer registries. Therefore this simulation study aims to evaluate different cluster detection methods, implemented in the open source environment R, in their ability to identify clusters of lung cancer using real-life data from an epidemiological cancer registry in Germany. Methods Risk surfaces were constructed with two different spatial cluster types, representing a relative risk of RR = 2.0 or of RR = 4.0, in relation to the overall background incidence of lung cancer, separately for men and women. Lung cancer cases were sampled from this risk surface as geocodes using an inhomogeneous Poisson process. The realisations of the cancer cases were analysed within small spatial (census tracts, N = 1983) and within aggregated large spatial scales (communities, N = 78). Subsequently, they were submitted to the cluster detection methods. The test accuracy for cluster location was determined in terms of detection rates (DR), false-positive (FP) rates and positive predictive values. The Bayesian smoothing models were evaluated using ROC curves. Results With moderate risk increase (RR = 2.0), local cluster tests showed better DR (for both spatial aggregation scales > 0.90) and lower FP rates (both < 0.05) than the Bayesian smoothing methods. When the cluster RR was raised four-fold, the local cluster tests showed better DR with lower FPs only for the small spatial scale. At a large spatial scale, the Bayesian smoothing methods, especially those implementing a spatial neighbourhood, showed a substantially lower FP rate than the cluster tests. However, the risk increases at this scale were mostly diluted by data aggregation. 
Conclusion High resolution spatial scales seem more appropriate as data base for cancer cluster testing and monitoring than the commonly used aggregated scales. We suggest the development of a two-stage approach that combines methods with high detection rates as a first-line screening with methods of higher predictive ability at the second stage. PMID:24314148

  3. Novel layered clustering-based approach for generating ensemble of classifiers.

    PubMed

    Rahman, Ashfaqur; Verma, Brijesh

    2011-05-01

    This paper introduces a novel concept for creating an ensemble of classifiers. The concept is based on generating an ensemble of classifiers through clustering of data at multiple layers. The ensemble classifier model generates a set of alternative clustering of a dataset at different layers by randomly initializing the clustering parameters and trains a set of base classifiers on the patterns at different clusters in different layers. A test pattern is classified by first finding the appropriate cluster at each layer and then using the corresponding base classifier. The decisions obtained at different layers are fused into a final verdict using majority voting. As the base classifiers are trained on overlapping patterns at different layers, the proposed approach achieves diversity among the individual classifiers. Identification of difficult-to-classify patterns through clustering as well as achievement of diversity through layering leads to better classification results as evidenced from the experimental results.

  4. A Cluster Randomized Controlled Trial Testing the Effectiveness of Houvast: A Strengths-Based Intervention for Homeless Young Adults

    ERIC Educational Resources Information Center

    Krabbenborg, Manon A. M.; Boersma, Sandra N.; van der Veld, William M.; van Hulst, Bente; Vollebergh, Wilma A. M.; Wolf, Judith R. L. M.

    2017-01-01

    Objective: To test the effectiveness of Houvast: a strengths-based intervention for homeless young adults. Method: A cluster randomized controlled trial was conducted with 10 Dutch shelter facilities randomly allocated to an intervention and a control group. Homeless young adults were interviewed when entering the facility and when care ended.…

  5. Cluster analysis of novel isometric strength measures produces a valid and evidence-based classification structure for wheelchair track racing.

    PubMed

    Connick, Mark J; Beckman, Emma; Vanlandewijck, Yves; Malone, Laurie A; Blomqvist, Sven; Tweedy, Sean M

    2017-11-25

    The Para athletics wheelchair-racing classification system employs best practice to ensure that classes comprise athletes whose impairments cause a comparable degree of activity limitation. However, decision-making is largely subjective and scientific evidence which reduces this subjectivity is required. This study aimed to evaluate whether isometric strength tests were valid for the purposes of classifying wheelchair racers and whether cluster analysis of the strength measures produced a valid classification structure. Thirty-two international level, male wheelchair racers from classes T51-54 completed six isometric strength tests evaluating elbow extensors, shoulder flexors, trunk flexors and forearm pronators and two wheelchair performance tests: Top-Speed (0-15 m) and Top-Speed (absolute). Strength tests significantly correlated with wheelchair performance were included in a cluster analysis and the validity of the resulting clusters was assessed. All six strength tests correlated with performance (r=0.54-0.88). Cluster analysis yielded four clusters with reasonable overall structure (mean silhouette coefficient=0.58) and large intercluster strength differences. Six athletes (19%) were allocated to clusters that did not align with their current class. While the mean wheelchair racing performance of the resulting clusters was unequivocally hierarchical, the mean performance of current classes was not, with no difference between current classes T53 and T54. Cluster analysis of isometric strength tests produced classes comprising athletes who experienced a similar degree of activity limitation. The strength tests reported can provide the basis for a new, more transparent, less subjective wheelchair racing classification system, pending replication of these findings in a larger, representative sample. This paper also provides guidance for development of evidence-based systems in other Para sports.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
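    The mean silhouette coefficient quoted in the abstract above (0.58) measures how well each observation sits inside its own cluster compared with the nearest other cluster. The sketch below computes it for one-dimensional scores using the absolute difference as the distance; the scores and labels are invented for illustration and are not the study's strength data.

```python
def mean_silhouette(values, labels):
    """Mean silhouette coefficient for 1-D values; needs at least two clusters."""
    clusters = {}
    for v, l in zip(values, labels):
        clusters.setdefault(l, []).append(v)
    scores = []
    for v, l in zip(values, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(v - w) for w in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(v - w) for w in other) / len(other)
            for key, other in clusters.items() if key != l
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Hypothetical scores: two tight, well-separated groups.
values = [0.0, 1.0, 10.0, 11.0]
good = mean_silhouette(values, [0, 0, 1, 1])
bad = mean_silhouette(values, [0, 1, 0, 1])
```

    Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest observations assigned to the wrong cluster.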

  6. Membership determination of open clusters based on a spectral clustering method

    NASA Astrophysics Data System (ADS)

    Gao, Xin-Hua

    2018-06-01

    We present a spectral clustering (SC) method aimed at segregating reliable members of open clusters in multi-dimensional space. The SC method is a non-parametric clustering technique that performs cluster division using eigenvectors of the similarity matrix; no prior knowledge of the clusters is required. This method is more flexible in dealing with multi-dimensional data compared to other methods of membership determination. We use this method to segregate the cluster members of five open clusters (Hyades, Coma Ber, Pleiades, Praesepe, and NGC 188) in five-dimensional space; fairly clean cluster members are obtained. We find that the SC method can capture a small number of cluster members (weak signal) from a large number of field stars (heavy noise). Based on these cluster members, we compute the mean proper motions and distances for the Hyades, Coma Ber, Pleiades, and Praesepe clusters, and our results are in general quite consistent with the results derived by other authors. The test results indicate that the SC method is highly suitable for segregating cluster members of open clusters based on high-precision multi-dimensional astrometric data such as Gaia data.

  7. The Effectiveness of Educational Interventions to Enhance the Adoption of Fee-Based Arsenic Testing in Bangladesh: A Cluster Randomized Controlled Trial

    PubMed Central

    George, Christine Marie; Inauen, Jennifer; Rahman, Sheikh Masudur; Zheng, Yan

    2013-01-01

    Arsenic (As) testing could help 22 million people, using drinking water sources that exceed the Bangladesh As standard, to identify safe sources. A cluster randomized controlled trial was conducted to evaluate the effectiveness of household education and local media in increasing demand for fee-based As testing. Randomly selected households (N = 452) were divided into three interventions implemented by community workers: 1) fee-based As testing with household education (HE); 2) fee-based As testing with household education and a local media campaign (HELM); and 3) fee-based As testing alone (Control). The fee for the As test was US$ 0.28, higher than the cost of the test (US$ 0.16). Of households with untested wells, 93% in both intervention groups HE and HELM purchased an As test, compared with only 53% in the control group. In conclusion, fee-based As testing with household education is effective in increasing demand for As testing in rural Bangladesh. PMID:23716409

  8. The effectiveness of educational interventions to enhance the adoption of fee-based arsenic testing in Bangladesh: a cluster randomized controlled trial.

    PubMed

    George, Christine Marie; Inauen, Jennifer; Rahman, Sheikh Masudur; Zheng, Yan

    2013-07-01

    Arsenic (As) testing could help 22 million people, using drinking water sources that exceed the Bangladesh As standard, to identify safe sources. A cluster randomized controlled trial was conducted to evaluate the effectiveness of household education and local media in increasing demand for fee-based As testing. Randomly selected households (N = 452) were divided into three interventions implemented by community workers: 1) fee-based As testing with household education (HE); 2) fee-based As testing with household education and a local media campaign (HELM); and 3) fee-based As testing alone (Control). The fee for the As test was US$ 0.28, higher than the cost of the test (US$ 0.16). Of households with untested wells, 93% in both intervention groups HE and HELM purchased an As test, compared with only 53% in the control group. In conclusion, fee-based As testing with household education is effective in increasing demand for As testing in rural Bangladesh.

  9. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990

  10. The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB.

    PubMed

    Sander, Ulrich; Lubbe, Nils

    2018-04-01

    Intersection accidents are frequent and harmful. The accident types 'straight crossing path' (SCP), 'left turn across path - oncoming direction' (LTAP/OD), and 'left-turn across path - lateral direction' (LTAP/LD) represent around 95% of all intersection accidents and one-third of all police-reported car-to-car accidents in Germany. The European New Car Assessment Program (Euro NCAP) has announced that intersection scenarios will be included in its rating from 2020; however, how these scenarios are to be tested has not been defined. This study investigates whether clustering methods can be used to identify a small number of test scenarios sufficiently representative of the accident dataset to evaluate Intersection Automated Emergency Braking (AEB). Data from the German In-Depth Accident Study (GIDAS) and the GIDAS-based Pre-Crash Matrix (PCM) from 1999 to 2016, containing 784 SCP and 453 LTAP/OD accidents, were analyzed with principal component methods to identify variables that account for the relevant total variances of the sample. Three different methods for data clustering were applied to each of the accident types: two similarity-based approaches, namely Hierarchical Clustering (HC) and Partitioning Around Medoids (PAM), and the probability-based Latent Class Clustering (LCC). The optimum number of clusters was derived for HC and PAM with the silhouette method. The PAM algorithm was initiated both with random start medoid selection and with medoids from HC. For LCC, the Bayesian Information Criterion (BIC) was used to determine the optimal number of clusters. Test scenarios were defined from optimal cluster medoids weighted by their real-life representation in GIDAS. The set of variables for clustering was further varied to investigate the influence of variable type and character. We quantified how accurately each cluster variation represents real-life AEB performance using pre-crash simulations with PCM data and a generic algorithm for AEB intervention. 
The usage of different sets of clustering variables resulted in substantially different numbers of clusters. The stability of the resulting clusters increased with prioritization of categorical over continuous variables. For each different set of cluster variables, a strong in-cluster variance of avoided versus non-avoided accidents for the specified Intersection AEB was present. The medoids did not predict the most common Intersection AEB behavior in each cluster. Despite thorough analysis using various cluster methods and variable sets, it was impossible to reduce the diversity of intersection accidents into a set of test scenarios without compromising the ability to predict real-life performance of Intersection AEB. Although this does not imply that other methods cannot succeed, it was observed that small changes in the definition of a scenario resulted in a different avoidance outcome. Therefore, we suggest using limited physical testing to validate more extensive virtual simulations to evaluate vehicle safety. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Identification of chronic rhinosinusitis phenotypes using cluster analysis.

    PubMed

    Soler, Zachary M; Hyer, J Madison; Ramakrishnan, Viswanathan; Smith, Timothy L; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J

    2015-05-01

    Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the Sino-Nasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Medical Outcomes Study Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT, and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Clustering was largely determined by age, severity of patient reported outcome measures, depression, and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures, including polyp/atopic status, prior surgery, B-SIT and asthma, did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score, and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. © 2015 ARS-AAOA, LLC.

  12. Definition of run-off-road crash clusters-For safety benefit estimation and driver assistance development.

    PubMed

    Nilsson, Daniel; Lindman, Magdalena; Victor, Trent; Dozza, Marco

    2018-04-01

    Single-vehicle run-off-road crashes are a major traffic safety concern, as they are associated with a high proportion of fatal outcomes. In addressing run-off-road crashes, the development and evaluation of advanced driver assistance systems requires test scenarios that are representative of the variability found in real-world crashes. We apply hierarchical agglomerative cluster analysis to define similarities in a set of crash data variables; the resulting clusters can then be used as the basis for test scenario development. Out of 13 clusters, nine test scenarios are derived, corresponding to crashes characterised by: drivers drifting off the road in daytime and night-time, high speed departures, high-angle departures on narrow roads, highways, snowy roads, loss-of-control on wet roadways, sharp curves, and high speeds on roads with severe road surface conditions. In addition, each cluster was analysed with respect to crash variables related to the crash cause and reason for the unintended lane departure. The study shows that cluster analysis of representative data provides a statistically based method to identify relevant properties for run-off-road test scenarios. This was done to support development of vehicle-based run-off-road countermeasures and driver behaviour models used in virtual testing. Future studies should use driver behaviour from naturalistic driving data to further define how test-scenarios and behavioural causation mechanisms should be included. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Personality based clusters as predictors of aviator attitudes and performance

    NASA Technical Reports Server (NTRS)

    Gregorich, Steve; Helmreich, Robert L.; Wilhelm, John A.; Chidester, Thomas

    1989-01-01

    The feasibility of identification of personality-based population clusters was investigated along with the relationships of these subpopulations to relevant attitude and performance measures. The results of instrumental and expressive personality tests, using the Personal Characteristics Inventory (PCI) test battery and the Cockpit Management Attitudes Questionnaire, suggest that theoretically meaningful subpopulations exist among aviators, and that these groupings are useful in understanding of personality factors acting as moderator variables in the determination of aviator attitudes and performance. Out of the three clusters most easily described in terms of their relative elevations on the PCI subscales ('the right stuff', the 'wrong stuff', and the 'no stuff'), the members of the right stuff cluster tended to have more desirable patterns of responses along relevant attitudinal dimensions.

  14. Clinical evaluation of a novel population-based regression analysis for detecting glaucomatous visual field progression.

    PubMed

    Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C

    2011-04-01

    The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included into this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. 
Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.

  15. Density-based clustering analyses to identify heterogeneous cellular sub-populations

    NASA Astrophysics Data System (ADS)

    Heaster, Tiffany M.; Walsh, Alex J.; Landman, Bennett A.; Skala, Melissa C.

    2017-02-01

    Autofluorescence microscopy of NAD(P)H and FAD provides functional metabolic measurements at the single-cell level. Here, density-based clustering algorithms were applied to metabolic autofluorescence measurements to identify cell-level heterogeneity in tumor cell cultures. The performance of the density-based clustering algorithm, DENCLUE, was tested in samples with known heterogeneity (co-cultures of breast carcinoma lines). DENCLUE was found to better represent the distribution of cell clusters compared to Gaussian mixture modeling. Overall, DENCLUE is a promising approach to quantify cell-level heterogeneity, and could be used to understand single cell population dynamics in cancer progression and treatment.

  16. Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data

    PubMed Central

    Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.

    2003-01-01

    Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely-used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r2 test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292

  17. [Perception of odor quality by Free Image-Association Test].

    PubMed

    Ueno, Y

    1992-10-01

    A method was devised for evaluating odor quality. Subjects were requested to freely describe the images elicited by smelling odors. This test was named the "Free Image-Association Test (FIT)". The test was applied to 20 flavors of various foods, five odors from the standards of the T&T olfactometer (Japanese standard olfactory test), butter of yak milk, and incense from Lamaism temples. The words for expressing imagery were analyzed by multidimensional scaling and cluster analysis. Seven clusters of odors were obtained. The features of these clusters were quite similar to those of the primary odors suggested by previous studies. However, the clustering of odors cannot be explained on the basis of the primary-odor theory, but rather by the information processing theory originally proposed by Miller (1956). These results support the usefulness of the Free Image-Association Test for investigating odor perception based on the images associated with odors.

  18. Discrimination of multilocus sequence typing-based Campylobacter jejuni subgroups by MALDI-TOF mass spectrometry.

    PubMed

    Zautner, Andreas Erich; Masanta, Wycliffe Omurwa; Tareen, Abdul Malik; Weig, Michael; Lugert, Raimond; Groß, Uwe; Bader, Oliver

    2013-11-07

    Campylobacter jejuni, the most common bacterial pathogen causing gastroenteritis, shows a wide genetic diversity. Previously, we demonstrated by the combination of multi locus sequence typing (MLST)-based UPGMA-clustering and analysis of 16 genetic markers that twelve different C. jejuni subgroups can be distinguished. Among these are two prominent subgroups. The first subgroup contains the majority of hyperinvasive strains and is characterized by a dimeric form of the chemotaxis-receptor Tlp7(m+c). The second has an extended amino acid metabolism and is characterized by the presence of a periplasmic asparaginase (ansB) and gamma-glutamyl-transpeptidase (ggt). Phyloproteomic principal component analysis (PCA) hierarchical clustering of MALDI-TOF based intact cell mass spectrometry (ICMS) spectra was able to group particular C. jejuni subgroups of phylogenetic related isolates in distinct clusters. Especially the aforementioned Tlp7(m+c)(+) and ansB+/ ggt+ subgroups could be discriminated by PCA. Overlay of ICMS spectra of all isolates led to the identification of characteristic biomarker ions for these specific C. jejuni subgroups. Thus, mass peak shifts can be used to identify the C. jejuni subgroup with an extended amino acid metabolism. Although the PCA hierarchical clustering of ICMS-spectra groups the tested isolates into a different order as compared to MLST-based UPGMA-clustering, the isolates of the indicator-groups form predominantly coherent clusters. These clusters reflect phenotypic aspects better than phylogenetic clustering, indicating that the genes corresponding to the biomarker ions are phylogenetically coupled to the tested marker genes. Thus, PCA clustering could be an additional tool for analyzing the relatedness of bacterial isolates.

  19. Physics of Galaxy Clusters and How it Affects Cosmological Tests

    NASA Technical Reports Server (NTRS)

    Vikhlinin, Alexey; Oliversen, Ronald J. (Technical Monitor)

    2002-01-01

    We have worked on the analysis of the Chandra observations of the nearby and distant clusters of galaxies, and on the expansion of the sample of distant X-ray clusters based on the archival ROSAT PSPC data. Some of the scientific results are discussed.

  20. A Variable-Selection Heuristic for K-Means Clustering.

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Cradit, J. Dennis

    2001-01-01

    Presents a variable selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. Subjected the heuristic to Monte Carlo testing across more than 2,200 datasets. Results indicate that the heuristic is extremely effective at eliminating masking variables. (SLD)
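    The adjusted Rand index named in the abstract above compares two partitions of the same objects while correcting for chance agreement. A minimal sketch of the standard pair-counting formula follows; the example labelings are invented for illustration.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))   # contingency table entries
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

# Same partition under different label names, and a maximally crossed one.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

    A value of 1 means the two partitions agree exactly (label names do not matter), values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.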

  1. Finding SDSS Galaxy Clusters in 4-dimensional Color Space Using the False Discovery Rate

    NASA Astrophysics Data System (ADS)

    Nichol, R. C.; Miller, C. J.; Reichart, D.; Wasserman, L.; Genovese, C.; SDSS Collaboration

    2000-12-01

    We describe a recently developed statistical technique that provides a meaningful cut-off in probability-based decision making. We are concerned with multiple testing, where each test produces a well-defined probability (or p-value). By well-defined, we mean that the null hypothesis used to determine the p-value is fully understood and appropriate. The method is entitled False Discovery Rate (FDR) and its largest advantage over other measures is that it allows one to specify a maximal amount of acceptable error. As an example of this tool, we apply FDR to a four-dimensional clustering algorithm using SDSS data. For each galaxy (or test galaxy), we count the number of neighbors that fit within one standard deviation of a four dimensional Gaussian centered on that test galaxy. The mean and standard deviation of that Gaussian are determined from the colors and errors of the test galaxy. We then take that same Gaussian and place it on a random selection of n galaxies and make a similar count. In the limit of large n, we expect the median count around these random galaxies to represent a typical field galaxy. For every test galaxy we determine the probability (or p-value) that it is a field galaxy based on these counts. A low p-value implies that the test galaxy is in a cluster environment. Once we have a p-value for every galaxy, we use FDR to determine at what level we should make our probability cut-off. Once this cut-off is made, we have a final sample of galaxies that are cluster-like galaxies. Using FDR, we also know the maximum amount of field contamination in our cluster galaxy sample. We present our preliminary galaxy clustering results using these methods.
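    The cut-off step described above can be sketched with the Benjamini-Hochberg step-up procedure, which is the usual way of controlling the false discovery rate. That the authors used exactly this variant is an assumption, since the abstract does not name it, and the p-values below are invented for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (cut-off p-value or None, per-test rejection flags)."""
    m = len(p_values)
    threshold = None
    # Find the largest ranked p-value p_(k) with p_(k) <= alpha * k / m.
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * k / m:
            threshold = p
    rejected = [threshold is not None and p <= threshold for p in p_values]
    return threshold, rejected

# Hypothetical p-values, one per test galaxy.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
threshold, rejected = benjamini_hochberg(p_values, alpha=0.05)
```

    In the setting of the abstract, the galaxies flagged in `rejected` would be the ones kept as cluster-like, and `alpha` would bound the expected fraction of field galaxies among them.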

  2. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are reviewed using several medical examples. We present two clustering examples for ordinal variables, a more challenging variable type in analysis, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is achieved. Moreover, descriptive and inferential statistics, in addition to the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  3. A similarity based agglomerative clustering algorithm in networks

    NASA Astrophysics Data System (ADS)

    Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong

    2018-04-01

    Detecting clusters is beneficial for understanding the organization and functions of networks. Clusters, or communities, are usually groups of nodes that are densely interconnected but sparsely linked with other clusters. To identify communities, an efficient and effective agglomerative community detection algorithm based on node similarity is proposed. The proposed method initially calculates similarities between each pair of nodes and forms pre-partitions according to the principle that each node is in the same community as its most similar neighbor. It then checks whether each partition satisfies a community criterion. Pre-partitions that do not satisfy the criterion are merged with the partitions that have the largest attraction to them, until there are no further changes. To measure the attraction ability of a partition, we propose an attraction index based on the importance of the linked nodes in the network. Therefore, our proposed method can better exploit the nodes' properties and the network's structure. To test the performance of our algorithm, both synthetic and empirical networks of different scales are tested. Simulation results show that the proposed algorithm can obtain superior clustering results compared with six other widely used community detection algorithms.
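
    The pre-partition step (each node joins the community of its most similar neighbor) can be sketched as follows. The abstract does not state which similarity measure is used, so this example assumes Jaccard similarity of closed neighborhoods; the later merging step based on the attraction index is not shown, and all names are illustrative.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of the closed neighborhoods of u and v (assumed measure)."""
    a = adj[u] | {u}
    b = adj[v] | {v}
    return len(a & b) / len(a | b)


def pre_partitions(adj):
    """Group nodes so that each node shares a group with its most similar neighbor.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    parent = {u: u for u in adj}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u in adj:
        if not adj[u]:
            continue  # isolated node stays in its own group
        # sorted() makes tie-breaking deterministic (smallest neighbor wins).
        best = max(sorted(adj[u]), key=lambda v: jaccard(adj, u, v))
        parent[find(u)] = find(best)

    groups = {}
    for u in adj:
        groups.setdefault(find(u), set()).add(u)
    return sorted(sorted(g) for g in groups.values())
```

    For two triangles joined by a single bridge edge, this sketch returns the two triangles as separate pre-partitions.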

  4. Pearson's chi-square test and rank correlation inferences for clustered data.

    PubMed

    Shih, Joanna H; Fay, Michael P

    2017-09-01

    Pearson's chi-square test has been widely used in testing for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are only valid when each pair of responses are independent, where each sampling unit has only one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large sample properties of the new proposed tests and estimators and evaluate their performance by simulations. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
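
    The within-cluster resampling idea can be sketched as follows: draw one pair of responses from each cluster so that the sampled pairs are independent, compute the statistic, and average over many resamples. This sketch assumes Kendall's tau-a as the statistic and returns only a Monte Carlo point estimate; the U-statistic formulation, tie handling, and variance estimation of the paper are not reproduced, and the function names are illustrative.

```python
import random
from itertools import combinations


def kendall_tau_a(pairs):
    """Kendall's tau-a for a list of (x, y) pairs; tied pairs contribute zero."""
    n = len(pairs)
    score = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        prod = (x1 - x2) * (y1 - y2)
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


def wcr_kendall_tau(clusters, n_resamples=1000, seed=0):
    """Average tau-a over resamples that take one (x, y) pair per cluster.

    clusters is a list (at least two clusters) of non-empty lists of (x, y) pairs.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_resamples):
        sample = [rng.choice(members) for members in clusters]
        total += kendall_tau_a(sample)
    return total / n_resamples
```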

  5. Local bladder cancer clusters in southeastern Michigan accounting for risk factors, covariates and residential mobility.

    PubMed

    Jacquez, Geoffrey M; Shi, Chen; Meliker, Jaymie R

    2015-01-01

    In case control studies disease risk not explained by the significant risk factors is the unexplained risk. Considering unexplained risk for specific populations, places and times can reveal the signature of unidentified risk factors and risk factors not fully accounted for in the case-control study. This potentially can lead to new hypotheses regarding disease causation. Global, local and focused Q-statistics are applied to data from a population-based case-control study of 11 southeast Michigan counties. Analyses were conducted using both year- and age-based measures of time. The analyses were adjusted for arsenic exposure, education, smoking, family history of bladder cancer, occupational exposure to bladder cancer carcinogens, age, gender, and race. Significant global clustering of cases was not found. Such a finding would indicate large-scale clustering of cases relative to controls through time. However, highly significant local clusters were found in Ingham County near Lansing, in Oakland County, and in the City of Jackson, Michigan. The Jackson City cluster was observed in working-ages and is thus consistent with occupational causes. The Ingham County cluster persists over time, suggesting a broad-based geographically defined exposure. Focused clusters were found for 20 industrial sites engaged in manufacturing activities associated with known or suspected bladder cancer carcinogens. Set-based tests that adjusted for multiple testing were not significant, although local clusters persisted through time and temporal trends in probability of local tests were observed. Q analyses provide a powerful tool for unpacking unexplained disease risk from case-control studies. This is particularly useful when the effect of risk factors varies spatially, through time, or through both space and time. 
For bladder cancer in Michigan, the next step is to investigate causal hypotheses that may explain the excess bladder cancer risk localized to areas of Oakland and Ingham counties, and to the City of Jackson.

  6. Homogeneity tests of clustered diagnostic markers with applications to the BioCycle Study

    PubMed Central

    Tang, Liansheng Larry; Liu, Aiyi; Schisterman, Enrique F.; Zhou, Xiao-Hua; Liu, Catherine Chun-ling

    2014-01-01

    Diagnostic trials often require the use of a homogeneity test among several markers. Such a test may be necessary to determine the power both during the design phase and in the initial analysis stage. However, no formal method is available for the power and sample size calculation when the number of markers is greater than two and marker measurements are clustered in subjects. This article presents two procedures for testing the accuracy among clustered diagnostic markers. The first procedure is a test of homogeneity among continuous markers based on a global null hypothesis of the same accuracy. The result under the alternative provides the explicit distribution for the power and sample size calculation. The second procedure is a simultaneous pairwise comparison test based on weighted areas under the receiver operating characteristic curves. This test is particularly useful if a global difference among markers is found by the homogeneity test. We apply our procedures to the BioCycle Study designed to assess and compare the accuracy of hormone and oxidative stress markers in distinguishing women with ovulatory menstrual cycles from those without. PMID:22733707

  7. A Direct Comparison of Two Densely Sampled HIV Epidemics: The UK and Switzerland

    NASA Astrophysics Data System (ADS)

    Ragonnet-Cronin, Manon L.; Shilaih, Mohaned; Günthard, Huldrych F.; Hodcroft, Emma B.; Böni, Jürg; Fearnhill, Esther; Dunn, David; Yerly, Sabine; Klimkait, Thomas; Aubert, Vincent; Yang, Wan-Lin; Brown, Alison E.; Lycett, Samantha J.; Kouyos, Roger; Brown, Andrew J. Leigh

    2016-09-01

    Phylogenetic clustering approaches can elucidate HIV transmission dynamics. Comparisons across countries are essential for evaluating public health policies. Here, we used a standardised approach to compare the UK HIV Drug Resistance Database and the Swiss HIV Cohort Study while maintaining data-protection requirements. Clusters were identified in subtype A1, B and C pol phylogenies. We generated degree distributions for each risk group and compared distributions between countries using Kolmogorov-Smirnov (KS) tests, Degree Distribution Quantification and Comparison (DDQC) and bootstrapping. We used logistic regression to predict cluster membership based on country, sampling date, risk group, ethnicity and sex. We analysed >8,000 Swiss and >30,000 UK subtype B sequences. At 4.5% genetic distance, the UK was more clustered and MSM and heterosexual degree distributions differed significantly by the KS test. The KS test is sensitive to variation in network scale, and jackknifing the UK MSM dataset to the size of the Swiss dataset removed the difference. Only heterosexuals varied based on the DDQC, due to UK male heterosexuals who clustered exclusively with MSM. Their removal eliminated this difference. In conclusion, the UK and Swiss HIV epidemics have similar underlying dynamics and observed differences in clustering are mainly due to different population sizes.
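
    The two-sample Kolmogorov-Smirnov comparison of degree distributions can be sketched as below. This is a minimal illustration that computes only the KS statistic (the largest gap between the two empirical distribution functions), not a p-value; the function name and the example inputs are assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        # Empirical CDFs evaluated at x (fraction of values <= x).
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d
```

    Because the significance of a given statistic depends on the two sample sizes, a fixed gap between distributions becomes significant more readily in a larger dataset, which is consistent with the abstract's remark that subsampling the UK data to the Swiss size removed the difference.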

  8. DENBRAN: A basic program for a significance test for multivariate normality of clusters from branching patterns in dendrograms

    NASA Astrophysics Data System (ADS)

    Sneath, P. H. A.

    A BASIC program is presented for significance tests to determine whether a dendrogram is derived from clustering of points that belong to a single multivariate normal distribution. The significance tests are based on statistics of the Kolmogorov-Smirnov type, obtained by comparing the observed cumulative graph of branch levels with a graph for the hypothesis of multivariate normality. The program also permits testing whether the dendrogram could be from a cluster of lower dimensionality due to character correlations. The program makes provision for three similarity coefficients, (1) Euclidean distances, (2) squared Euclidean distances, and (3) Simple Matching Coefficients, and for five cluster methods, (1) WPGMA, (2) UPGMA, (3) Single Linkage (or Minimum Spanning Trees), (4) Complete Linkage, and (5) Ward's Increase in Sums of Squares. The program is entitled DENBRAN.

  9. Cluster and propensity based approximation of a network

    PubMed Central

    2013-01-01

    Background The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). 
Conclusions The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424

  10. Identification of cognitive profiles among women considering BRCA1/2 testing through the utilisation of cluster analytic techniques.

    PubMed

    Roussi, Pagona; Sherman, Kerry A; Miller, Suzanne M; Hurley, Karen; Daly, Mary B; Godwin, Andrew; Buzaglo, Joanne S; Wen, Kuang-Yi

    2011-10-01

    Based on the cognitive-social health information processing model, we identified cognitive profiles of women at risk for breast and ovarian cancer. Prior to genetic counselling, participants (N = 171) completed a study questionnaire concerning their cognitive and affective responses to being at genetic risk. Using cluster analysis, four cognitive profiles were generated: (a) high perceived risk/low coping; (b) low value of screening/high expectancy of cancer; (c) moderate perceived risk/moderate efficacy of prevention/low informativeness of test result; and (d) high efficacy of prevention/high coping. The majority of women in Clusters One, Two and Three had no personal history of cancer, whereas Cluster Four consisted almost entirely of women affected with cancer. Women in Cluster One had the highest number of affected relatives and experienced higher levels of distress than women in the other three clusters. These results highlight the need to consider the psychological profile of women undergoing genetic testing when designing counselling interventions and messages.

  11. A hybrid algorithm for clustering of time series data based on affinity search technique.

    PubMed

    Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A; Shaygan, Mohammad Amin; Jalali, Alireza

    2014-01-01

    Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets.

  12. A Hybrid Algorithm for Clustering of Time Series Data Based on Affinity Search Technique

    PubMed Central

    Aghabozorgi, Saeed; Ying Wah, Teh; Herawan, Tutut; Jalab, Hamid A.; Shaygan, Mohammad Amin; Jalali, Alireza

    2014-01-01

    Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets. PMID:24982966

  13. Development of a 12-Thrust Chamber Kerosene/Oxygen Primary Rocket Sub-System for an Early (1964) Air-Augmented Rocket Ground-Test System

    NASA Technical Reports Server (NTRS)

    Pryor, D.; Hyde, E. H.; Escher, W. J. D.

    1999-01-01

    Airbreathing/Rocket combined-cycle, and specifically rocket-based combined-cycle (RBCC), propulsion systems typically employ an internal engine flow-path installed primary rocket subsystem. To achieve acceptably short mixing lengths in effecting the "air augmentation" process, a large rocket-exhaust/air interfacial mixing surface is needed. This leads, in some engine design concepts, to a "cluster" of small rocket units, suitably arrayed in the flowpath. To support an early (1964) subscale ground-test of a specific RBCC concept, such a 12-rocket cluster was developed by NASA's Marshall Space Flight Center (MSFC). The small primary rockets used in the cluster assembly were modified versions of an existing small kerosene/oxygen water-cooled rocket engine unit routinely tested at MSFC. Following individual thrust-chamber tests and overall subsystem qualification testing, the cluster assembly was installed at the U.S. Air Force's Arnold Engineering Development Center (AEDC) for RBCC systems testing. (The results of the special air-augmented rocket testing are not covered here.) While this project was eventually successfully completed, a number of hardware integration problems were met, leading to catastrophic thrust chamber failures. The principal "lessons learned" in conducting this early primary rocket subsystem experimental effort are documented here as a basic knowledge-base contribution for the benefit of today's RBCC research and development community.

  14. Clustering and variable selection in the presence of mixed variable types and missing data.

    PubMed

    Storlie, C B; Myers, S M; Katusic, S K; Weaver, A L; Voigt, R G; Croarkin, P E; Stoeckel, R E; Port, J D

    2018-05-17

    We consider the problem of model-based clustering in the presence of many correlated, mixed continuous, and discrete variables, some of which may have missing values. Discrete variables are treated with a latent continuous variable approach, and the Dirichlet process is used to construct a mixture model with an unknown number of components. Variable selection is also performed to identify the variables that are most influential for determining cluster membership. The work is motivated by the need to cluster patients thought to potentially have autism spectrum disorder on the basis of many cognitive and/or behavioral test scores. There are a modest number of patients (486) in the data set along with many (55) test score variables (many of which are discrete valued and/or missing). The goal of the work is to (1) cluster these patients into similar groups to help identify those with similar clinical presentation and (2) identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing. The proposed approach compares very favorably with other methods via simulation of problems of this type. The results of the autism spectrum disorder analysis suggested 3 clusters to be most likely, while only 4 test scores had high (>0.5) posterior probability of being informative. This will result in much more efficient and informative testing. The need to cluster observations on the basis of many correlated, continuous/discrete variables with missing values is a common problem in the health sciences as well as in many other disciplines. Copyright © 2018 John Wiley & Sons, Ltd.

  15. Clustering-based classification of road traffic accidents using hierarchical clustering and artificial neural networks.

    PubMed

    Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf

    2017-09-01

    Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to the data set to organize it into six different forms, each with a different number of clusters (i.e., from 1 to 6 clusters). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of the data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% more accuracy than without clustering, while the percentage split achieves 2%-5% more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.

  16. Community-based intermittent mass testing and treatment for malaria in an area of high transmission intensity, western Kenya: study design and methodology for a cluster randomized controlled trial.

    PubMed

    Samuels, Aaron M; Awino, Nobert; Odongo, Wycliffe; Abong'o, Benard; Gimnig, John; Otieno, Kephas; Shi, Ya Ping; Were, Vincent; Allen, Denise Roth; Were, Florence; Sang, Tony; Obor, David; Williamson, John; Hamel, Mary J; Patrick Kachur, S; Slutsker, Laurence; Lindblade, Kim A; Kariuki, Simon; Desai, Meghna

    2017-06-07

    Most human Plasmodium infections in western Kenya are asymptomatic and are believed to contribute importantly to malaria transmission. Elimination of asymptomatic infections requires active treatment approaches, such as mass testing and treatment (MTaT) or mass drug administration (MDA), as infected persons do not seek care for their infection. Evaluations of community-based approaches that are designed to reduce malaria transmission require careful attention to study design to ensure that important effects can be measured accurately. This manuscript describes the study design and methodology of a cluster-randomized controlled trial to evaluate a MTaT approach for malaria transmission reduction in an area of high malaria transmission. Ten health facilities in western Kenya were purposively selected for inclusion. The communities within 3 km of each health facility were divided into three clusters of approximately equal population size. Two clusters around each health facility were randomly assigned to the control arm, and one to the intervention arm. Three times per year for 2 years, after the long and short rains, and again before the long rains, teams of community health volunteers visited every household within the intervention arm, tested all consenting individuals with malaria rapid diagnostic tests, and treated all positive individuals with an effective anti-malarial. The effect of mass testing and treatment on malaria transmission was measured through population-based longitudinal cohorts, outpatient visits for clinical malaria, periodic population-based cross-sectional surveys, and entomological indices.
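
    The allocation described above (three clusters around each health facility, one randomly assigned to the intervention arm and two to control) can be sketched as follows. The facility identifiers, the seed, and the function name are hypothetical, and the trial's actual randomization procedure is not described in the abstract.

```python
import random


def assign_arms(facilities, clusters_per_facility=3, seed=2017):
    """Randomly pick one intervention cluster per facility; the rest are control.

    Returns a dict mapping (facility, cluster_index) to an arm name.
    """
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    allocation = {}
    for facility in facilities:
        intervention = rng.randrange(clusters_per_facility)
        for c in range(clusters_per_facility):
            arm = "intervention" if c == intervention else "control"
            allocation[(facility, c)] = arm
    return allocation
```

    With ten facilities this yields 10 intervention clusters and 20 control clusters, stratified by facility.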

  17. Countries population determination to test rice crisis indicator at national level using k-means cluster analysis

    NASA Astrophysics Data System (ADS)

    Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.

    2017-01-01

    This study aimed to identify the population of countries that are similar to Indonesia based on three characteristics: democratic atmosphere, rice consumption, and purchasing power for rice. This is useful as reference material for research testing the strength and predictability of the rice crisis indicator Unprecedented Restlessness (UR). Countries similar to Indonesia were identified using multivariate analysis, namely non-hierarchical k-means cluster analysis, with 38 countries as the data population. The analysis was repeated until it yielded a number of clusters that shows the differentiating power of the three characteristics and high similarity within clusters. The results show that 6 clusters can describe the differentiating power of the characteristics of the formed clusters. However, to answer the purpose of the study, only one cluster is taken, according to the following criteria for a population of countries similar to Indonesia: the cluster contains Indonesia, it includes countries that did and did not sustain a rice crisis in 2008, and it has the largest membership among the clusters. These criteria are met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, Korea South, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.
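
    The non-hierarchical k-means step can be sketched as below. This is a plain Lloyd's-algorithm illustration, not the software or settings used in the study; each point would be one country's vector of the three characteristics, and the function name, seed, and initialization by random sampling are assumptions.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (labels, centers). Initial centers are k points drawn at random.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers
```

    Repeating the call for several values of `k` and inspecting within-cluster similarity mirrors the repeated analysis described in the abstract; in practice the three characteristics would usually be standardized first, which is not shown here.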

  18. Testing the Archivas Cluster (Arc) for Ozone Monitoring Instrument (OMI) Scientific Data Storage

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2005-01-01

    The Ozone Monitoring Instrument (OMI) launched on NASA's Aura Spacecraft, the third of the major platforms of the EOS program, on July 15, 2004. In addition to the long term archive and distribution of the data from OMI through the Goddard Earth Science Distributed Active Archive Center (GESDAAC), we are evaluating other archive mechanisms that can archive the data in a more immediately available method where it can be used for further data production and analysis. In 2004, Archivas, Inc. was selected by NASA's Small Business Innovative Research (SBIR) program for the development of their Archivas Cluster (ArC) product. ArC is an online disk based system utilizing self-management and automation on a Linux cluster. Its goal is to produce a low cost solution coupled with the ease of management. The OMI project is an application partner of the SBIR program, and has deployed a small cluster (5TB) based on the beta Archivas software. We performed extensive testing of the unit using production OMI data since launch. In 2005, Archivas, Inc. was funded in SBIR Phase II for further development, which will include testing scalability with the deployment of a larger (35TB) cluster at Goddard. We plan to include ArC in the OMI Team Leader Computing Facility (TLCF) hosting OMI data for direct access and analysis by the OMI Science Team. This presentation will include a brief technical description of the Archivas Cluster, a summary of the SBIR Phase I beta testing results, and an overview of the OMI ground data processing architecture including its interaction with the Phase II Archivas Cluster and hosting of OMI data for the scientists.

  19. Cosmological constraints from strong gravitational lensing in clusters of galaxies.

    PubMed

    Jullo, Eric; Natarajan, Priyamvada; Kneib, Jean-Paul; D'Aloisio, Anson; Limousin, Marceau; Richard, Johan; Schimd, Carlo

    2010-08-20

    Current efforts in observational cosmology are focused on characterizing the mass-energy content of the universe. We present results from a geometric test based on strong lensing in galaxy clusters. Based on Hubble Space Telescope images and extensive ground-based spectroscopic follow-up of the massive galaxy cluster Abell 1689, we used a parametric model to simultaneously constrain the cluster mass distribution and dark energy equation of state. Combining our cosmological constraints with those from x-ray clusters and the Wilkinson Microwave Anisotropy Probe 5-year data gives Omega(m) = 0.25 +/- 0.05 and w(x) = -0.97 +/- 0.07, which are consistent with results from other methods. Inclusion of our method with all other available techniques brings down the current 2sigma contours on the dark energy equation-of-state parameter w(x) by approximately 30%.

  20. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. 
Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.

  1. Detailed analysis of the supermarket task included on the Japanese version of the Rapid Dementia Screening Test.

    PubMed

    Moriyama, Yasushi; Yoshino, Aihide; Muramatsu, Taro; Mimura, Masaru

    2017-05-01

    The supermarket task, which is included in the Japanese version of the Rapid Dementia Screening Test, requires the quick (1 min) generation of words for things that can be bought in a supermarket. Cluster size and switches are investigated during this task. We investigated how the severity of dementia related to cluster size and switches on the supermarket task in patients with Alzheimer's disease. We administered the Japanese version of the Rapid Dementia Screening Test to 250 patients with very mild to severe Alzheimer's disease and to 49 healthy volunteers. Patients had Mini-Mental State Examination scores from 12 to 26 and Clinical Dementia Rating scale scores from 0.5 to 3. Patients were divided into four groups based on their Clinical Dementia Rating score (0.5, 1, 2, 3). We performed statistical analyses between the four groups and control subjects based on cluster size and switch scores on the supermarket task. The score for cluster size and switches deteriorated according to the severity of dementia. Moreover, for subjects with a Clinical Dementia Rating score of 0.5, cluster size was impaired, but switches were intact. Our findings indicate that the scores for cluster size and switches on the supermarket task may be useful for detecting the severity of symptoms of dementia in patients with Alzheimer's disease. © 2016 The Authors. Psychogeriatrics © 2016 Japanese Psychogeriatric Society.

  2. A hierarchical clustering methodology for the estimation of toxicity.

    PubMed

    Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M

    2008-01-01

    A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster from each step in the hierarchical clustering assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.

  3. Description of alternating-parity bands within the dinuclear-system model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shneidman, T. M.; Adamian, G. G.; Antonenko, N. V.

    2016-11-15

    A cluster approach is used to describe ground-state-based alternating-parity bands in even–even nuclei and to study the band-termination mechanism. A method is proposed for testing the cluster nature of alternating-parity bands.

  4. Integrated cluster- and case-based surveillance for detecting stage III zoonotic pathogens: an example of Nipah virus surveillance in Bangladesh.

    PubMed

    Naser, A M; Hossain, M J; Sazzad, H M S; Homaira, N; Gurley, E S; Podder, G; Afroj, S; Banu, S; Rollin, P E; Daszak, P; Ahmed, B-N; Rahman, M; Luby, S P

    2015-07-01

    This paper explores the utility of cluster- and case-based surveillance established in government hospitals in Bangladesh to detect Nipah virus, a stage III zoonotic pathogen. Physicians listed meningo-encephalitis cases in the 10 surveillance hospitals and identified a cluster when ⩾2 cases who lived within 30 min walking distance of one another developed symptoms within 3 weeks of each other. Physicians collected blood samples from the clustered cases. As part of case-based surveillance, blood was collected from all listed meningo-encephalitis cases in three hospitals during the Nipah season (January-March). An investigation team visited clustered cases' communities to collect epidemiological information and blood from the living cases. We tested serum using Nipah-specific IgM ELISA. Up to September 2011, in 5887 listed cases, we identified 62 clusters comprising 176 encephalitis cases. We collected blood from 127 of these cases. In 10 clusters, we identified a total of 62 Nipah cases: 18 laboratory-confirmed and 34 probable. We identified person-to-person transmission of Nipah virus in four clusters. From case-based surveillance, we identified 23 (4%) Nipah cases. Faced with thousands of encephalitis cases, integrated cluster surveillance allows targeted deployment of investigative resources to detect outbreaks by stage III zoonotic pathogens in resource-limited settings.

  5. [Predicting Incidence of Hepatitis E in China using Fuzzy Time Series Based on Fuzzy C-Means Clustering Analysis].

    PubMed

    Luo, Yi; Zhang, Tao; Li, Xiao-song

    2016-05-01

    To explore the application of a fuzzy time series model based on fuzzy c-means clustering in forecasting the monthly incidence of Hepatitis E in mainland China. A predictive model (fuzzy time series method based on fuzzy c-means clustering) was developed using Hepatitis E incidence data in mainland China between January 2004 and July 2014. The incidence data from August 2014 to November 2014 were used to test the fitness of the predictive model. The forecasting results were compared with those from traditional fuzzy time series models. The fuzzy time series model based on fuzzy c-means clustering had a mean squared error (MSE) of 0.0011 for fitting and 6.9775 × 10⁻⁴ for forecasting, compared with 0.0017 and 0.0014 from the traditional forecasting model. The results indicate that the fuzzy time series model based on fuzzy c-means clustering performs better in forecasting the incidence of Hepatitis E.
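
    As a rough illustration of the clustering step named in this abstract, the sketch below evaluates the standard fuzzy c-means membership update for a single scalar observation. The abstract does not describe the authors' full fuzzy time series procedure, so this covers only the generic membership formula; the fuzzifier m = 2, the function name, and the example centers are assumptions.

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means memberships of scalar x in each cluster center.

    Uses the standard update u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from x to center i and m > 1 is the fuzzifier.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [abs(x - c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical incidence value and cluster centers, for illustration only.
print(fcm_memberships(1.0, [0.0, 3.0]))
```

    The memberships always sum to 1, which is what lets a fuzzy time series assign each observation partially to several intervals rather than to a single one.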

  6. Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.

    PubMed

    Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J

    2016-04-01

    Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy failed who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and for 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes when comparing surgery with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. 

  7. Restricted random search method based on taboo search in the multiple minima problem

    NASA Astrophysics Data System (ADS)

    Hong, Seung Do; Jhon, Mu Shik

    1997-03-01

    The restricted random search method is proposed as a simple Monte Carlo sampling method for quickly locating minima in the multiple-minima problem. The method is based on taboo search, which was recently applied to continuous test functions. It uses the concept of a taboo region instead of a taboo list, so sampling of a region near an old configuration is restricted. The method is applied to 2-dimensional test functions and to argon clusters, and is found to be a practical and efficient way to find near-global configurations in both cases.
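
    The taboo-region idea can be sketched as follows. This is one minimal reading of the abstract, not the authors' algorithm: candidates are drawn uniformly from a box, and any candidate closer than a fixed radius to an already sampled configuration is rejected. The function name, the radius, the sample count, and the test function are all assumptions.

```python
import math
import random


def restricted_random_search(f, bounds, n_samples=500, taboo_radius=0.05, seed=0):
    """Random search that refuses to resample near earlier configurations.

    f       -- objective taking a tuple of coordinates
    bounds  -- list of (low, high) pairs, one per dimension
    Returns (best_point, best_value, accepted_points).
    """
    rng = random.Random(seed)
    visited = []
    best_x, best_val = None, math.inf
    attempts = 0
    # Cap the attempts so the loop ends even if the box becomes saturated.
    while len(visited) < n_samples and attempts < 50 * n_samples:
        attempts += 1
        x = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        # Taboo region: skip candidates too close to an old configuration.
        if any(math.dist(x, v) < taboo_radius for v in visited):
            continue
        visited.append(x)
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val, visited


# Hypothetical 2-dimensional test function with its minimum at the origin.
point, value, accepted = restricted_random_search(
    lambda p: p[0] ** 2 + p[1] ** 2, [(-2.0, 2.0), (-2.0, 2.0)]
)
print(point, value, len(accepted))
```

    Checking each candidate against every stored configuration costs time proportional to the number of accepted samples, which is acceptable for a sketch but would need a spatial index for large runs.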

  8. Speckle reduction of OCT images using an adaptive cluster-based filtering

    NASA Astrophysics Data System (ADS)

    Adabi, Saba; Rashedi, Elaheh; Conforto, Silvia; Mehregan, Darius; Xu, Qiuyun; Nasiriavanaki, Mohammadreza

    2017-02-01

    Optical coherence tomography (OCT) has become a favorable device in the dermatology discipline due to its moderate resolution and penetration depth. OCT images, however, contain a grainy pattern, called speckle, due to the broadband source used in the configuration of OCT. So far, a variety of filtering techniques have been introduced to reduce speckle in OCT images. Most of these methods are generic and can be applied to OCT images of different tissues. In this paper, we present a method for speckle reduction of OCT skin images. Considering the architectural structure of skin layers, it seems that a skin image can benefit from being segmented into differentiable clusters by a clustering method and then being filtered separately in each cluster with filtering methods such as Wiener. The proposed algorithm was tested on an optical solid phantom with predetermined optical properties. The algorithm was also tested on healthy skin images. The results show that the cluster-based filtering method can reduce the speckle and increase the signal-to-noise ratio and contrast while preserving the edges in the image.

  9. Analyses of Crime Patterns in NIBRS Data Based on a Novel Graph Theory Clustering Method: Virginia as a Case Study

    PubMed Central

    Nolan, Jim

    2014-01-01

    This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area. PMID:24778585

  10. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
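
    The Gumbel approximation mentioned in this abstract can be illustrated in its generic form: fit a Gumbel distribution by the method of moments to Monte Carlo replicates of the maximum scan statistic under the null hypothesis, then read an upper-tail p-value for the observed maximum. This sketch does not reproduce the paper's local multiplicity adjustment; the function names and the replicate values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(samples):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Uses mean = mu + gamma * beta and variance = (pi * beta) ** 2 / 6.
    Returns (mu, beta).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    beta = math.sqrt(6.0 * variance) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(observed, mu, beta):
    """Upper-tail probability P(X >= observed) under Gumbel(mu, beta)."""
    return 1.0 - math.exp(-math.exp(-(observed - mu) / beta))


# Hypothetical null replicates of the maximum log-likelihood ratio.
null_maxima = [3.1, 4.0, 2.7, 5.2, 3.8, 4.4, 3.3, 2.9, 4.9, 3.6]
mu, beta = fit_gumbel_moments(null_maxima)
print(gumbel_p_value(7.5, mu, beta))
```

    A fitted tail lets small p-values be estimated from far fewer replicates than the empirical rank-based p-value would need, at the cost of assuming the Gumbel form is adequate.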

  11. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118

  12. Zonation in the deep benthic megafauna: Application of a general test.

    PubMed

    Gardiner, Frederick P; Haedrich, Richard L

    1978-01-01

    A test based on Maxwell-Boltzmann statistics, instead of the formerly suggested but inappropriate Bose-Einstein statistics (Pielou and Routledge, 1976), examines the distribution of the boundaries of species' ranges distributed along a gradient, and indicates whether they are random or clustered (zoned). The test is most useful as a preliminary to the application of more instructive but less statistically rigorous methods such as cluster analysis. The test indicates zonation is marked in the deep benthic megafauna living between 200 and 3000 m, but below 3000 m little zonation may be found.

  13. Hybrid clustering based fuzzy structure for vibration control - Part 1: A novel algorithm for building neuro-fuzzy system

    NASA Astrophysics Data System (ADS)

    Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok

    2015-01-01

    This paper presents a new algorithm, called B-ANFIS, for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set. To increase the accuracy of the model, the following steps are carried out. First, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. The main purpose of this step is to overcome problems related to initialization and contradictory fuzzy rules, which commonly arise when building ANFIS. The clustering process in the input data space is based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data set, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated using both theoretical and testing approaches.

  14. A spatial scan statistic for compound Poisson data.

    PubMed

    Rosychuk, Rhonda J; Chang, Hsing-Ming

    2013-12-20

    The topic of spatial cluster detection gained attention in statistics during the late 1980s and early 1990s. Effort has been devoted to the development of methods for detecting spatial clustering of cases and events in the biological sciences, astronomy and epidemiology. More recently, research has examined detecting clusters of correlated count data associated with health conditions of individuals. Such a method allows researchers to examine spatial relationships of disease-related events rather than just incident or prevalent cases. We introduce a spatial scan test that identifies clusters of events in a study region. Because an individual case may have multiple (repeated) events, we base the test on a compound Poisson model. We illustrate our method for cluster detection on emergency department visits, where individuals may make multiple disease-related visits. Copyright © 2013 John Wiley & Sons, Ltd.

  15. Comparison of two schemes for automatic keyword extraction from MEDLINE for functional gene clustering.

    PubMed

    Liu, Ying; Ciliax, Brian J; Borges, Karin; Dasigi, Venu; Ram, Ashwin; Navathe, Shamkant B; Dingledine, Ray

    2004-01-01

    One of the key challenges of microarray studies is to derive biological insights from the unprecedented quantities of data on gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the nature of the functional links among genes within the derived clusters. However, the quality of the keyword lists extracted from biomedical literature for each gene significantly affects the clustering results. We extracted keywords from MEDLINE that describe the most prominent functions of the genes, and used the resulting weights of the keywords as feature vectors for gene clustering. By analyzing the resulting cluster quality, we compared two keyword weighting schemes: normalized z-score and term frequency-inverse document frequency (TFIDF). The best combination of background comparison set, stop list and stemming algorithm was selected based on precision and recall metrics. In a test set of four known gene groups, a hierarchical algorithm correctly assigned 25 of 26 genes to the appropriate clusters based on keywords extracted by the TFIDF weighting scheme, but only 23 of 26 with the z-score method. To evaluate the effectiveness of the weighting schemes for keyword extraction for gene clusters from microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle were used as a second test set. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords had higher purity, lower entropy, and higher mutual information than those produced from normalized z-score weighted keywords. The optimized algorithms should be useful for sorting genes from microarray lists into functionally discrete clusters.
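
    The TFIDF weighting compared in this abstract can be sketched as below. The abstract does not give the exact variant used, so this assumes a common one (relative term frequency multiplied by the natural log of the number of genes divided by the number of genes whose keyword list contains the term). The gene names and keywords are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Compute TF-IDF keyword weights per gene.

    docs -- dict mapping a gene name to the list of keyword tokens
            extracted from the abstracts associated with that gene.
    Returns a dict mapping each gene to {keyword: weight}.
    """
    n_docs = len(docs)
    # Document frequency: in how many genes' keyword lists a term appears.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    weights = {}
    for gene, tokens in docs.items():
        term_counts = Counter(tokens)
        total = len(tokens)
        weights[gene] = {
            kw: (count / total) * math.log(n_docs / doc_freq[kw])
            for kw, count in term_counts.items()
        }
    return weights


# Hypothetical keyword lists for two genes.
example = {
    "geneA": ["kinase", "kinase", "receptor"],
    "geneB": ["receptor", "channel"],
}
print(tfidf_weights(example))
```

    A keyword shared by every gene receives weight 0 under this variant, which is why terms common to the whole corpus contribute nothing to the feature vectors used for clustering.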

  16. Role of binding entropy in the refinement of protein-ligand docking predictions: analysis based on the use of 11 scoring functions.

    PubMed

    Ruvinsky, Anatoly M

    2007-06-01

    We present results of testing the ability of eleven popular scoring functions to predict native docked positions using a recently developed method (Ruvinsky and Kozintsev, J Comput Chem 2005, 26, 1089) for estimating the entropy contributions of relative motions to protein-ligand binding affinity. The method is based on the integration of the configurational integral over clusters obtained from multiple docked positions. We use a test set of 100 PDB protein-ligand complexes and ensembles of 101 docked positions generated by Wang et al. (J Med Chem 2003, 46, 2287) for each ligand in the test set. To test the suggested method we compared the averaged root-mean square deviations (RMSD) of the top-scored ligand docked positions, accounting and not accounting for entropy contributions, relative to the experimentally determined positions. We demonstrate that the method increases docking accuracy by 10-21% when used in conjunction with the AutoDock scoring function, by 2-25% with G-Score, by 7-41% with D-Score, by 0-8% with LigScore, by 1-6% with PLP, by 0-12% with LUDI, by 2-8% with F-Score, by 7-29% with ChemScore, by 0-9% with X-Score, by 2-19% with PMF, and by 1-7% with DrugScore. We also compared the performance of the suggested method with the method based on ranking by cluster occupancy only. We analyze how the choice of a clustering-RMSD and a lower bound of dense clusters affects the docking accuracy of the scoring methods. We derive optimal intervals of the clustering-RMSD for 11 scoring functions.

  17. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

    PubMed Central

    Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

    2009-01-01

    Background: Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results: We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion: The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124

  18. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
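
    The weighted Z-method used to combine the PC-level p-values can be sketched generically as follows. The weights are taken as given; how SGSE derives them (PC variance scaled by Tracy-Widom p-values) is not reproduced here, and the function name and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with the weighted Z-method.

    Each p-value is mapped to z_i = Phi^{-1}(1 - p_i); the combined score is
    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)), and the returned p-value is
    1 - Phi(Z). All p-values must lie strictly between 0 and 1.
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("p_values and weights must be non-empty and equal length")
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(numerator / denominator)


# Hypothetical PC-level p-values for one gene set, with hypothetical weights.
print(weighted_z_combine([0.01, 0.20, 0.60], [0.5, 0.3, 0.2]))
```

    Down-weighting a component shrinks its influence on the combined score, which matches the abstract's point about ignoring components most likely to represent noise.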

  19. Finding clusters of similar events within clinical incident reports: a novel methodology combining case based reasoning and information retrieval

    PubMed Central

    Tsatsoulis, C; Amthauer, H

    2003-01-01

    A novel methodological approach for identifying clusters of similar medical incidents by analyzing large databases of incident reports is described. The discovery of similar events allows the identification of patterns and trends, and makes possible the prediction of future events and the establishment of barriers and best practices. Two techniques from the fields of information science and artificial intelligence have been integrated—namely, case based reasoning and information retrieval—and very good clustering accuracies have been achieved on a test data set of incident reports from transfusion medicine. This work suggests that clustering should integrate the features of an incident captured in traditional form based records together with the detailed information found in the narrative included in event reports. PMID:14645892

  20. Key-Node-Separated Graph Clustering and Layouts for Human Relationship Graph Visualization.

    PubMed

    Itoh, Takayuki; Klein, Karsten

    2015-01-01

    Many graph-drawing methods apply node-clustering techniques based on the density of edges to find tightly connected subgraphs and then hierarchically visualize the clustered graphs. However, users may want to focus on important nodes and their connections to groups of other nodes for some applications. For this purpose, it is effective to separately visualize the key nodes detected based on adjacency and attributes of the nodes. This article presents a graph visualization technique for attribute-embedded graphs that applies a graph-clustering algorithm that accounts for the combination of connections and attributes. The graph clustering step divides the nodes according to the commonality of connected nodes and similarity of feature value vectors. It then calculates the distances between arbitrary pairs of clusters according to the number of connecting edges and the similarity of feature value vectors and finally places the clusters based on the distances. Consequently, the technique separates important nodes that have connections to multiple large clusters and improves the visibility of such nodes' connections. To test this technique, this article presents examples with human relationship graph datasets, including a coauthorship and Twitter communication network dataset.

  1. Exploiting Defect Clustering to Screen Bare Die for Infant Mortality Failure: An Experimental Study

    NASA Technical Reports Server (NTRS)

    Lakin, David R., II; Singh, Adit D.

    1999-01-01

    We present the first experimental results to establish that a binning strategy based on defect clustering can be used to screen bare die for early life failures. The data for this study comes from the SEMATECH test methods experiment.

  2. A ground truth based comparative study on clustering of gene expression data.

    PubMed

    Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

    2008-05-01

    Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.

  3. Exploratory Item Classification Via Spectral Graph Clustering

    PubMed Central

    Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2017-01-01

    Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476

  4. Targeting regional pediatric congenital hearing loss using a spatial scan statistic.

    PubMed

    Bush, Matthew L; Christian, Warren Jay; Bianchi, Kristin; Lester, Cathy; Schoenberg, Nancy

    2015-01-01

    Congenital hearing loss is a common problem, and timely identification and intervention are paramount for language development. Patients from rural regions may have many barriers to timely diagnosis and intervention. The purpose of this study was to examine the spatial and hospital-based distribution of failed infant hearing screening testing and pediatric congenital hearing loss throughout Kentucky. Data on live births and audiological reporting of infant hearing loss results in Kentucky from 2009 to 2011 were analyzed. The authors used spatial scan statistics to identify high-rate clusters of failed newborn screening tests and permanent congenital hearing loss (PCHL), based on the total number of live births per county. The authors conducted further analyses on PCHL and failed newborn hearing screening tests, based on birth hospital data and method of screening. The authors observed four statistically significant (p < 0.05) high-rate clusters with failed newborn hearing screenings in Kentucky, including two in the Appalachian region. Hospitals using two-stage otoacoustic emission testing demonstrated higher rates of failed screening (p = 0.009) than those using two-stage automated auditory brainstem response testing. A significant cluster of high rate of PCHL was observed in Western Kentucky. Five of the 54 birthing hospitals were found to have higher relative risk of PCHL, and two of those hospitals are located in a very rural region of Western Kentucky within the cluster. This spatial analysis in children in Kentucky has identified specific regions throughout the state with high rates of congenital hearing loss and failed newborn hearing screening tests. Further investigation regarding causative factors is warranted. This method of analysis can be useful in the setting of hearing health disparities to focus efforts on regions facing high incidence of congenital hearing loss.

  5. Semantic Clustering of Search Engine Results

    PubMed Central

    Soliman, Sara Saad; El-Sayed, Maged F.; Hassan, Yasser F.

    2015-01-01

    This paper presents a novel approach for search engine results clustering that relies on the semantics of the retrieved documents rather than the terms in those documents. The proposed approach takes into consideration both lexical and semantic similarities among documents and applies an activation spreading technique in order to generate semantically meaningful clusters. This approach allows documents that are semantically similar to be clustered together rather than clustering documents based on similar terms. A prototype is implemented and several experiments are conducted to test the proposed solution. The results of the experiments confirmed that the proposed solution achieves remarkable results in terms of precision. PMID:26933673

  6. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    PubMed

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.

  7. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    PubMed Central

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136
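
    The permutation logic behind a cluster size statistic can be illustrated with a small sketch. This is not the authors' implementation: it assumes a paired design (one difference value per subject and element), treats elements as neighbours along a one-dimensional array rather than across a connectome, and uses sign-flipping as the permutation scheme. The function names and the threshold are hypothetical.

```python
import random
from statistics import mean, stdev

def t_values(diffs):
    """One-sample t statistic per element; diffs[s][i] is subject s, element i."""
    n = len(diffs)
    out = []
    for i in range(len(diffs[0])):
        col = [row[i] for row in diffs]
        sd = stdev(col)
        out.append(mean(col) / (sd / n ** 0.5) if sd > 0 else 0.0)
    return out

def cluster_sizes(tvals, threshold):
    """Sizes of runs of neighbouring elements whose |t| exceeds the threshold.

    Assumption: neighbours are adjacent array positions (1-D adjacency).
    """
    sizes, run = [], 0
    for t in tvals:
        if abs(t) > threshold:
            run += 1
        else:
            if run:
                sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes

def max_cluster_size_pvalue(diffs, threshold, n_perm=500, seed=0):
    """Compare the largest observed cluster with the null distribution of the
    largest cluster obtained by randomly flipping the sign of each subject."""
    rng = random.Random(seed)
    observed = max(cluster_sizes(t_values(diffs), threshold), default=0)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * v for v in row])
        null_max = max(cluster_sizes(t_values(flipped), threshold), default=0)
        if null_max >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

    Because only the maximum cluster size of each permutation enters the null distribution, the comparison controls the family wise error rate over all clusters at once, which is the property the abstract attributes to CSS and CMS.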

  8. The effect of clustering on perceived quantity in humans (Homo sapiens) and in chicks (Gallus gallus).

    PubMed

    Bertamini, Marco; Guest, Martin; Vallortigara, Giorgio; Rugani, Rosa; Regolin, Lucia

    2018-04-30

    Animals can perceive the numerosity of sets of visual elements. Qualitative and quantitative similarities in different species suggest the existence of a shared system (approximate number system). Biases associated with sensory properties are informative about the underlying mechanisms. In humans, regular spacing increases perceived numerosity (regular-random numerosity illusion). This has led to a model that predicts numerosity based on occupancy (a measure that decreases when elements are close together). We used a procedure in which observers selected one of two stimuli and were given feedback with respect to whether the choice was correct. One configuration had 20 elements and the other 40, randomly placed inside a circular region. Participants had to discover the rule based on feedback. Because density and clustering covaried with numerosity, different dimensions could be used. After reaching a criterion, test trials presented two types of configurations with 30 elements. One type had a larger interelement distance than the other (high or low clustering). If observers had adopted a numerosity strategy, they would choose low clustering (if reinforced with 40) and high clustering (if reinforced with 20). A clustering or density strategy predicts the opposite. Human adults used a numerosity strategy. Chicks were tested using a similar procedure. There were two behavioral measures: first approach response and final circumnavigation (walking behind the screen). The prediction based on numerosity was confirmed by the first approach data. For chicks, one clear pattern from both responses was a preference for the configurations with higher clustering. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  9. Assessment of cluster yield components by image analysis.

    PubMed

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis-based model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.

  10. BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

    PubMed Central

    Heidelberg, John F.; Tully, Benjamin J.

    2017-01-01

    Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes. PMID:28289564

  11. 3D Geomarketing Segmentation: a Higher Spatial Dimension Planning Perspective

    NASA Astrophysics Data System (ADS)

    Suhaibah, A.; Uznir, U.; Rahman, A. A.; Anton, F.; Mioc, D.

    2016-09-01

    Geomarketing is a discipline that uses geographic information in the planning and implementation of marketing activities. It can be applied to any aspect of marketing, such as price, promotion, or geo-targeting. Geomarketing analysis draws on a huge data pool, including the locations of residential areas and topography, as well as demographic information such as age, gender, annual income, and lifestyle. This information can help users develop successful promotional campaigns in order to achieve marketing goals. One of the common activities in geomarketing is market segmentation, which clusters the data into several groups based on geographic criteria. To refine the search operation during analysis, we propose an approach that clusters the data using a clustering algorithm. However, with such a huge data pool, overlap among clusters may occur and lead to inefficient analysis. Moreover, geomarketing is usually active in urban areas and requires clusters to be organized in a three-dimensional (3D) way (e.g., multi-level shop lots, residential apartments). This is a constraint of the current Geographic Information System (GIS) framework. To avoid this issue, we propose a combination of market segmentation based on geographic criteria and a clustering algorithm for 3D geomarketing data management. The proposed approach is capable of minimizing the overlap region during market segmentation. In this paper, geomarketing in an urban area is used as a case study, and several 3D locations of customers and stores are used in the tests. The experiments demonstrate that the proposed approach is capable of minimizing overlapping segmentation and reducing repetitive data entries. The structure is also tested for retrieving spatial records from the database. For marketing purposes, a certain radius around a point is used to analyze marketing targets. Based on the tests presented in this paper, we strongly believe that the structure is capable of handling and managing a huge pool of geomarketing data. As a future outlook, this paper also discusses the possibilities of expanding the structure.

  12. Home-based versus mobile clinic HIV testing and counseling in rural Lesotho: a cluster-randomized trial.

    PubMed

    Labhardt, Niklaus Daniel; Motlomelo, Masetsibi; Cerutti, Bernard; Pfeiffer, Karolin; Kamele, Mashaete; Hobbins, Michael A; Ehmer, Jochen

    2014-12-01

    The success of HIV programs relies on widely accessible HIV testing and counseling (HTC) services at health facilities as well as in the community. Home-based HTC (HB-HTC) is a popular community-based approach to reach persons who do not test at health facilities. Data comparing HB-HTC to other community-based HTC approaches are very limited. This trial compares HB-HTC to mobile clinic HTC (MC-HTC). The trial was powered to test the hypothesis of higher HTC uptake in HB-HTC campaigns than in MC-HTC campaigns. Twelve clusters were randomly allocated to HB-HTC or MC-HTC. The six clusters in the HB-HTC group received 30 1-day multi-disease campaigns (five villages per cluster) that delivered services by going door-to-door, whereas the six clusters in the MC-HTC group received campaigns involving community gatherings in the 30 villages with subsequent service provision in mobile clinics. Time allocation and human resources were standardized and equal in both groups. All individuals accessing the campaigns with unknown HIV status or whose last HIV test was >12 wk ago and was negative were eligible. All outcomes were assessed at the individual level. Statistical analysis used multivariable logistic regression. Odds ratios and p-values were adjusted for gender, age, and cluster effect. Out of 3,197 participants from the 12 clusters, 2,563 (80.2%) were eligible (HB-HTC: 1,171; MC-HTC: 1,392). The results for the primary outcomes were as follows. Overall HTC uptake was higher in the HB-HTC group than in the MC-HTC group (92.5% versus 86.7%; adjusted odds ratio [aOR]: 2.06; 95% CI: 1.18-3.60; p = 0.011). Among adolescents and adults ≥ 12 y, HTC uptake did not differ significantly between the two groups; however, in children <12 y, HTC uptake was higher in the HB-HTC arm (87.5% versus 58.7%; aOR: 4.91; 95% CI: 2.41-10.0; p<0.001). Out of those who took up HTC, 114 (4.9%) tested HIV-positive, 39 (3.6%) in the HB-HTC arm and 75 (6.2%) in the MC-HTC arm (aOR: 0.64; 95% CI: 0.48-0.86; p = 0.002). Ten (25.6%) and 19 (25.3%) individuals in the HB-HTC and in the MC-HTC arms, respectively, linked to HIV care within 1 mo after testing positive. Findings for secondary outcomes were as follows: HB-HTC reached more first-time testers, particularly among adolescents and young adults, and had a higher proportion of men among participants. However, after adjusting for clustering, the difference in male participation was no longer significant. Age distribution among participants and immunological and clinical stages among persons newly diagnosed HIV-positive did not differ significantly between the two groups. Major study limitations included the campaigns' restriction to weekdays and a relatively low HIV prevalence among participants, the latter indicating that both arms may have reached an underexposed population. This study demonstrates that both HB-HTC and MC-HTC can achieve high uptake of HTC. The choice between these two community-based strategies will depend on the objective of the activity: HB-HTC was better in reaching children, individuals who had never tested before, and men, while MC-HTC detected more new HIV infections. The low rate of linkage to care after a positive HIV test warrants future consideration of combining community-based HTC approaches with strategies to improve linkage to care for persons who test HIV-positive. ClinicalTrials.gov NCT01459120.

  13. Quantification and statistical significance analysis of group separation in NMR-based metabonomics studies

    PubMed Central

    Goodpaster, Aaron M.; Kennedy, Michael A.

    2015-01-01

    Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine if cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T² test is computed for the data, related to an F-statistic, and then an F-test is applied to determine if the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data. PMID:26246647
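
    The chain from Mahalanobis distance to Hotelling's statistic to an F-statistic can be sketched for the common case of a two-component scores plot. The sketch below is an illustration under stated assumptions, not the authors' code: it handles exactly two score dimensions (p = 2), uses the pooled covariance of the two groups, and the function names are hypothetical. It relies on the textbook relations T² = n1·n2/(n1+n2)·D² and F = (n1+n2−p−1)/(p(n1+n2−2))·T², with (p, n1+n2−p−1) degrees of freedom.

```python
def mean2(points):
    """Centroid of a list of (x, y) score pairs."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def scatter2(points, centre):
    """Sums of squares and cross-products about the centroid: (sxx, sxy, syy)."""
    sxx = sum((x - centre[0]) ** 2 for x, _ in points)
    sxy = sum((x - centre[0]) * (y - centre[1]) for x, y in points)
    syy = sum((y - centre[1]) ** 2 for _, y in points)
    return sxx, sxy, syy

def hotelling_two_sample(group_a, group_b):
    """Return (Mahalanobis distance, T^2, F statistic, (df1, df2)) for 2-D scores."""
    n1, n2, p = len(group_a), len(group_b), 2
    m1, m2 = mean2(group_a), mean2(group_b)
    a, b = scatter2(group_a, m1), scatter2(group_b, m2)
    dof = n1 + n2 - 2
    # Pooled covariance matrix entries.
    sxx, sxy, syy = ((a[i] + b[i]) / dof for i in range(3))
    det = sxx * syy - sxy ** 2
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = n1 * n2 / (n1 + n2) * d2
    f_stat = (n1 + n2 - p - 1) / (p * dof) * t2
    return d2 ** 0.5, t2, f_stat, (p, n1 + n2 - p - 1)
```

    The p-value would then be the upper tail of an F distribution with the returned degrees of freedom, which requires a statistics library and is left out of this stdlib-only sketch.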

  14. Significance tests for functional data with complex dependence structure.

    PubMed

    Staicu, Ana-Maria; Lahiri, Soumen N; Carroll, Raymond J

    2015-01-01

    We propose an L²-norm based global testing procedure for the null hypothesis that multiple group mean functions are equal, for functional data with complex dependence structure. Specifically, we consider the setting of functional data with a multilevel structure of the form groups-clusters or subjects-units, where the unit-level profiles are spatially correlated within the cluster, and the cluster-level data are independent. Orthogonal series expansions are used to approximate the group mean functions and the test statistic is estimated using the basis coefficients. The asymptotic null distribution of the test statistic is developed, under mild regularity conditions. To our knowledge this is the first work that studies hypothesis testing, when data have such complex multilevel functional and spatial structure. Two small-sample alternatives, including a novel block bootstrap for functional data, are proposed, and their performance is examined in simulation studies. The paper concludes with an illustration of a motivating experiment.

  15. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango’s statistic – to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950

  16. Determining open cluster membership. A Bayesian framework for quantitative member classification

    NASA Astrophysics Data System (ADS)

    Stott, Jonathan J.

    2018-01-01

    Aims: My goal is to develop a quantitative algorithm for assessing open cluster membership probabilities. The algorithm is designed to work with single-epoch observations. In its simplest form, only one set of program images and one set of reference images are required. Methods: The algorithm is based on a two-stage joint astrometric and photometric assessment of cluster membership probabilities. The probabilities were computed within a Bayesian framework using any available prior information. Where possible, the algorithm emphasizes simplicity over mathematical sophistication. Results: The algorithm was implemented and tested against three observational fields using published survey data. M 67 and NGC 654 were selected as cluster examples while a third, cluster-free, field was used for the final test data set. The algorithm shows good quantitative agreement with the existing surveys and has a false-positive rate significantly lower than the astrometric or photometric methods used individually.
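
    A two-stage Bayesian membership assessment of this kind can be sketched as two successive applications of Bayes' rule. The abstract does not give the actual likelihood models, so the sketch below assumes a two-class problem (cluster member versus field star) in which the astrometric posterior serves as the prior for the photometric stage. The function name and all numeric values are hypothetical.

```python
def update_membership(prior, like_member, like_field):
    """Posterior membership probability from Bayes' rule for two classes.

    prior        -- probability of membership before seeing this evidence
    like_member  -- likelihood of the evidence if the star is a member
    like_field   -- likelihood of the evidence if it is a field star
    """
    num = prior * like_member
    return num / (num + (1.0 - prior) * like_field)

# Stage 1: astrometric evidence (hypothetical likelihood values).
p_astro = update_membership(prior=0.2, like_member=0.8, like_field=0.1)
# Stage 2: photometric evidence, with the stage-1 posterior as the new prior.
p_final = update_membership(prior=p_astro, like_member=0.6, like_field=0.3)
```

    Chaining the updates this way treats the astrometric and photometric evidence as conditionally independent given membership, which is an assumption of the sketch rather than a statement about the published algorithm.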

  17. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
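
    The difference between the usual inference and the size-conditioned inference can be sketched as two Monte Carlo p-values computed from the same null replicates. This is an illustration only: it assumes each null simulation has already been reduced to the statistic and size of its most likely cluster, and it uses the common (r + 1)/(n + 1) Monte Carlo convention, which may differ from the authors' procedure. The function name is hypothetical.

```python
def scan_p_values(observed_stat, observed_size, null_replicates):
    """Return (usual p-value, size-conditioned p-value).

    null_replicates -- list of (statistic, size) pairs, one per map simulated
                       under the null hypothesis, describing the most likely
                       cluster found in that map.
    The conditioned value is None when no null replicate has the observed size.
    """
    all_stats = [s for s, _ in null_replicates]
    same_size = [s for s, k in null_replicates if k == observed_size]
    usual = (1 + sum(s >= observed_stat for s in all_stats)) / (1 + len(all_stats))
    if not same_size:
        return usual, None
    conditional = (1 + sum(s >= observed_stat for s in same_size)) / (1 + len(same_size))
    return usual, conditional
```

    In practice many more null replicates are needed for the conditioned value than for the usual one, because only the replicates whose most likely cluster has size k contribute to it.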

  18. A new physical performance classification system for elite handball players: cluster analysis

    PubMed Central

    Chirosa, Ignacio J.; Robinson, Joseph E.; van der Tillaar, Roland; Chirosa, Luis J.; Martín, Isidoro Martínez

    2016-01-01

    The aim of the present study was to identify different cluster groups of handball players according to their physical performance level assessed in a series of physical assessments, which could then be used to design a training program based on individual strengths and weaknesses, and to determine which of these variables best identified elite performance in a group of under-19 [U19] national level handball players. Players of the U19 National Handball team (n=16) performed a set of tests to determine: 10 m (ST10) and 20 m (ST20) sprint time, ball release velocity (BRv), countermovement jump (CMJ) height and squat jump (SJ) height. All players also performed an incremental-load bench press test to determine the 1 repetition maximum (1RMest), the load corresponding to maximum mean power (LoadMP), the mean propulsive phase power at LoadMP (PMPPMP) and the peak power at LoadMP (PPEAKMP). Cluster analyses of the test results generated four groupings of players. The variables best able to discriminate physical performance were BRv, ST20, 1RMest, PPEAKMP and PMPPMP. These variables could help coaches identify talent or monitor the physical performance of athletes in their team. Each cluster of players has a particular weakness related to physical performance and therefore, the cluster results can be applied to a specific training program based on individual needs. PMID:28149376

  19. Density-based cluster algorithms for the identification of core sets

    NASA Astrophysics Data System (ADS)

    Lemke, Oliver; Keller, Bettina G.

    2016-10-01

    The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
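
    Of the three density-based algorithms compared, DBSCAN is the most widely documented, and its core idea fits in a short sketch. The version below is a plain textbook DBSCAN on two-dimensional points with Euclidean distance, not the authors' code; how its clusters are turned into core sets for a Markov state model is not shown. Parameter names follow the usual convention (eps for the neighbourhood radius, min_pts for the density threshold).

```python
def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:     # j is itself a core point: keep expanding
                queue.extend(nbrs)
    return labels
```

    Points labelled -1 lie in low-density regions between clusters, which loosely corresponds to the transition regions left outside the core sets in the approach described above.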

  20. A comparison of IQ and memory cluster solutions in moderate and severe pediatric traumatic brain injury.

    PubMed

    Thaler, Nicholas S; Terranova, Jennifer; Turner, Alisa; Mayfield, Joan; Allen, Daniel N

    2015-01-01

    Recent studies have examined heterogeneous neuropsychological outcomes in childhood traumatic brain injury (TBI) using cluster analysis. These studies have identified homogeneous subgroups based on tests of IQ, memory, and other cognitive abilities that show some degree of association with specific cognitive, emotional, and behavioral outcomes, and have demonstrated that the clusters derived for children with TBI are different from those observed in normal populations. However, the extent to which these subgroups are stable across abilities has not been examined, and this has significant implications for the generalizability and clinical utility of TBI clusters. The current study addressed this by comparing IQ and memory profiles of 137 children who sustained moderate-to-severe TBI. Cluster analysis of IQ and memory scores indicated that a four-cluster solution was optimal for the IQ scores and a five-cluster solution was optimal for the memory scores. Three clusters on each battery differed primarily by level of performance, while the others had pattern variations. Cross-plotting the clusters across respective IQ and memory test scores indicated that clusters defined by level were generally stable, while clusters defined by pattern differed. Notably, children with slower processing speed exhibited low-average to below-average performance on memory indexes. These results provide some support for the stability of previously identified memory and IQ clusters and provide information about the relationship between IQ and memory in children with TBI.

  1. Strategic Development for Middle School Students Struggling With Fractions: Assessment and Intervention.

    PubMed

    Zhang, Dake; Stecker, Pamela; Huckabee, Sloan; Miller, Rhonda

    2016-09-01

    Research has suggested that different strategies used when solving fraction problems are highly correlated with students' problem-solving accuracy. This study (a) utilized latent profile modeling to classify students into three different strategic developmental levels in solving fraction comparison problems and (b) accordingly provided differentiated strategic training for students starting from two different strategic developmental levels. In Study 1 we assessed 49 middle school students' performance on fraction comparison problems and categorized students into three strategic developmental clusters: a cross-multiplication cluster with the highest accuracy, a representation strategy cluster with medium accuracy, and a whole-number strategy cluster with the lowest accuracy. Based on the strategic developmental levels identified in Study 1, in Study 2 we selected three students from the whole-number strategy cluster and another three students from the representation strategy cluster and implemented a differentiated strategic training intervention within a multiple-baseline design. Results showed that both groups of students transitioned from less advanced to more advanced strategies and improved their problem-solving accuracy during the posttest, the maintenance test, and the generalization test. © Hammill Institute on Disabilities 2014.

  2. LoCuSS: Testing hydrostatic equilibrium in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Smith, G. P.; Mazzotta, P.; Okabe, N.; Ziparo, F.; Mulroy, S. L.; Babul, A.; Finoguenov, A.; McCarthy, I. G.; Lieu, M.; Bahé, Y. M.; Bourdin, H.; Evrard, A. E.; Futamase, T.; Haines, C. P.; Jauzac, M.; Marrone, D. P.; Martino, R.; May, P. E.; Taylor, J. E.; Umetsu, K.

    2016-02-01

    We test the assumption of hydrostatic equilibrium in an X-ray luminosity selected sample of 50 galaxy clusters at 0.15 < z < 0.3 from the Local Cluster Substructure Survey (LoCuSS). Our weak-lensing measurements of M500 control systematic biases to sub-4 per cent, and our hydrostatic measurements of the same achieve excellent agreement between XMM-Newton and Chandra. The mean ratio of X-ray to lensing mass for these 50 clusters is β_X = 0.95 ± 0.05, and for the 44 clusters also detected by Planck, the mean ratio of Planck mass estimate to LoCuSS lensing mass is β_P = 0.95 ± 0.04. Based on a careful like-for-like analysis, we find that LoCuSS, the Canadian Cluster Comparison Project, and Weighing the Giants agree on β_P ≃ 0.9-0.95 at 0.15 < z < 0.3. This small level of hydrostatic bias disagrees at ∼5σ with the level required to reconcile Planck cosmology results from the cosmic microwave background and galaxy cluster counts.

  3. Fitness as a determinant of arterial stiffness in healthy adult men: a cross-sectional study.

    PubMed

    Chung, Jinwook; Kim, Milyang; Jin, Youngsoo; Kim, Yonghwan; Hong, Jeeyoung

    2018-01-01

    Fitness is known to influence arterial stiffness. This study aimed to assess differences in cardiorespiratory endurance, muscular strength, and flexibility according to arterial stiffness, based on sex and age. We enrolled 1590 healthy adults (men: 1242, women: 348) who were free of metabolic syndrome. We measured cardiorespiratory endurance in an exercise stress test on a treadmill, muscular strength by a grip test, and flexibility by upper body forward-bends from a standing position. The brachial-ankle pulse wave velocity test was performed to measure arterial stiffness before the fitness test. Cluster analysis was performed to divide the patients into groups with low (Cluster 1) and high (Cluster 2) arterial stiffness. According to the k-cluster analysis results, Cluster 1 included 624 men and 180 women, and Cluster 2 included 618 men and 168 women. Men in the middle-aged group with low arterial stiffness demonstrated higher cardiorespiratory endurance, muscular strength, and flexibility than those with high arterial stiffness. Similarly, among men in the old-aged group, the cardiorespiratory endurance and muscular strength, but not flexibility, differed significantly according to arterial stiffness. Women in both clusters showed similar cardiorespiratory endurance, muscular strength, and flexibility regardless of their arterial stiffness. Among healthy adults, arterial stiffness was inversely associated with fitness in men but not in women. Therefore, fitness seems to be a determinant for arterial stiffness in men. Additionally, regular exercise should be recommended for middle-aged men to prevent arterial stiffness.

  4. Using background knowledge for picture organization and retrieval

    NASA Astrophysics Data System (ADS)

    Quintana, Yuri

    1997-01-01

    A picture knowledge base management system is described that is used to represent, organize and retrieve pictures from a frame knowledge base. Experiments with human test subjects were conducted to obtain further descriptions of pictures from news magazines. These descriptions were used to represent the semantic content of pictures in frame representations. A conceptual clustering algorithm is described which organizes pictures not only on the observable features, but also on implicit properties derived from the frame representations. The algorithm uses inheritance reasoning to take into account background knowledge in the clustering. The algorithm creates clusters of pictures using a group similarity function that is based on the gestalt theory of picture perception. For each cluster created, a frame is generated which describes the semantic content of pictures in the cluster. Clustering and retrieval experiments were conducted with and without background knowledge. The paper shows how the use of background knowledge and semantic similarity heuristics improves the speed, precision, and recall of queries processed. The paper concludes with a discussion of how natural language processing can be used to assist in the development of knowledge bases and the processing of user queries.

  5. Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric; Castano, Rebecca; Gray, Alexander

    1999-01-01

    We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.

  6. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

    PubMed Central

    Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

    2015-01-01

    Objective: We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design: A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results: 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion: We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700

  7. Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods

    NASA Technical Reports Server (NTRS)

    Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan

    2016-01-01

    The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.

  8. Bearing performance degradation assessment based on a combination of empirical mode decomposition and k-medoids clustering

    NASA Astrophysics Data System (ADS)

    Rai, Akhand; Upadhyay, S. H.

    2017-09-01

    The bearing is the most critical component in rotating machinery since it is more susceptible to failure. Monitoring of degradation in bearings is therefore of great concern for averting sudden machinery breakdown. In this study, a novel method for bearing performance degradation assessment (PDA) based on a combination of empirical mode decomposition (EMD) and k-medoids clustering is proposed. The fault features are extracted from the bearing signals using the EMD process. The extracted features are then subjected to k-medoids based clustering to obtain the normal state and failure state cluster centres. A confidence value (CV) curve based on the dissimilarity of the test data object to the normal state is obtained and employed as the degradation indicator for assessing the health of bearings. The proposed method is applied to the vibration signals collected in run-to-failure tests of bearings to assess its effectiveness in bearing PDA. To validate the superiority of the suggested approach, it is compared with the commonly used time-domain features RMS and kurtosis, the well-known fault diagnosis method envelope analysis (EA), and existing PDA classifiers, i.e. self-organizing maps (SOM) and Fuzzy c-means (FCM). The results demonstrate that the proposed method outperforms the time-domain features and the SOM and FCM based PDA in detecting early stage degradation more precisely. Moreover, EA can be used as an accompanying method to confirm the early stage defect detected by the proposed bearing PDA approach. The study shows the potential application of k-medoids clustering as an effective tool for PDA of bearings.
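
    The clustering step described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one-dimensional feature values, absolute difference as the dissimilarity, and a simple alternating (assign, then update) k-medoids loop. The names `k_medoids`, `features`, and `dissimilarity_to_normal`, and all sample values, are hypothetical. The abstract does not give the formula for the confidence value, so the sketch stops at the raw dissimilarity to the normal-state medoid.

```python
def k_medoids(points, initial_medoids, max_iter=100):
    """Alternating k-medoids on 1-D data; returns (medoid indices, labels).

    Assumes the initial medoids have distinct values, so no cluster is empty.
    """
    medoids = list(initial_medoids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [
            min(range(len(medoids)), key=lambda k: abs(p - points[medoids[k]]))
            for p in points
        ]
        # Update step: the new medoid is the member with the smallest
        # total dissimilarity to the other members of its cluster.
        new_medoids = []
        for k in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            new_medoids.append(
                min(members, key=lambda i: sum(abs(points[i] - points[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


# Hypothetical feature values: three "normal" and three "failure" samples.
features = [10, 11, 12, 50, 52, 54]
medoids, labels = k_medoids(features, initial_medoids=[0, 3])
normal_medoid = features[medoids[0]]  # cluster 0 is treated as the normal state


def dissimilarity_to_normal(value):
    """Distance of a new observation from the normal-state medoid."""
    return abs(value - normal_medoid)
```

    In this sketch a new observation with a growing `dissimilarity_to_normal` value would indicate drift away from the normal state; how that distance is mapped to a confidence value is left open here because the abstract does not specify it.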

  9. Recognizing patterns of visual field loss using unsupervised machine learning

    NASA Astrophysics Data System (ADS)

    Yousefi, Siamak; Goldbaum, Michael H.; Zangwill, Linda M.; Medeiros, Felipe A.; Bowd, Christopher

    2014-03-01

    Glaucoma is a potentially blinding optic neuropathy that results in a decrease in visual sensitivity. Visual field abnormalities (decreased visual sensitivity on psychophysical tests) are the primary means of glaucoma diagnosis. One form of visual field testing is Frequency Doubling Technology (FDT) that tests sensitivity at 52 points within the visual field. Like other psychophysical tests used in clinical practice, FDT results yield specific patterns of defect indicative of the disease. We used Gaussian Mixture Model with Expectation Maximization (GEM), (EM is used to estimate the model parameters) to automatically separate FDT data into clusters of normal and abnormal eyes. Principal component analysis (PCA) was used to decompose each cluster into different axes (patterns). FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal (i.e., glaucomatous) FDT results, recruited from a university-based, longitudinal, multi-center, clinical study on glaucoma. The GEM input was the 52-point FDT threshold sensitivities for all eyes. The optimal GEM model separated the FDT fields into 3 clusters. Cluster 1 contained 94% normal fields (94% specificity) and clusters 2 and 3 combined, contained 77% abnormal fields (77% sensitivity). For clusters 1, 2 and 3 the optimal number of PCA-identified axes were 2, 2 and 5, respectively. GEM with PCA successfully separated FDT fields from healthy and glaucoma eyes and identified familiar glaucomatous patterns of loss.

  10. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    PubMed

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
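
    The cluster-transition percentages reported above can be checked directly from the counts in the abstract. The short sketch below only reproduces that arithmetic; the variable names are illustrative and not taken from the study.

```python
# Counts as stated in the abstract (6-month follow-up, N=379).
follow_up_n = 379
transitions = {
    "same cluster": 251,
    "unhealthier cluster": 45,
    "healthier cluster": 83,
}

# The three groups should account for every follow-up participant.
total = sum(transitions.values())

# Share of follow-up participants in each group, in per cent.
shares = {name: round(100 * count / follow_up_n, 1) for name, count in transitions.items()}
```

    The three counts sum to the follow-up sample size, and the rounded shares match the 66.2%, 11.9%, and 21.9% given in the abstract.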

  11. The ToxBank Data Warehouse: Supporting the Replacement of In Vivo Repeated Dose Systemic Toxicity Testing.

    PubMed

    Kohonen, Pekka; Benfenati, Emilio; Bower, David; Ceder, Rebecca; Crump, Michael; Cross, Kevin; Grafström, Roland C; Healy, Lyn; Helma, Christoph; Jeliazkova, Nina; Jeliazkov, Vedrin; Maggioni, Silvia; Miller, Scott; Myatt, Glenn; Rautenberg, Michael; Stacey, Glyn; Willighagen, Egon; Wiseman, Jeff; Hardy, Barry

    2013-01-01

    The aim of the SEURAT-1 (Safety Evaluation Ultimately Replacing Animal Testing-1) research cluster, comprised of seven EU FP7 Health projects co-financed by Cosmetics Europe, is to generate a proof-of-concept to show how the latest technologies, systems toxicology and toxicogenomics can be combined to deliver a test replacement for repeated dose systemic toxicity testing on animals. The SEURAT-1 strategy is to adopt a mode-of-action framework to describe repeated dose toxicity, combining in vitro and in silico methods to derive predictions of in vivo toxicity responses. ToxBank is the cross-cluster infrastructure project whose activities include the development of a data warehouse to provide a web-accessible shared repository of research data and protocols, a physical compounds repository, reference or "gold compounds" for use across the cluster (available via wiki.toxbank.net), and a reference resource for biomaterials. Core technologies used in the data warehouse include the ISA-Tab universal data exchange format, REpresentational State Transfer (REST) web services, the W3C Resource Description Framework (RDF) and the OpenTox standards. We describe the design of the data warehouse based on cluster requirements, the implementation based on open standards, and finally the underlying concepts and initial results of a data analysis utilizing public data related to the gold compounds. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Numerical Analysis of Base Flowfield for a Four-Engine Clustered Nozzle Configuration

    NASA Technical Reports Server (NTRS)

    Wang, Ten-See

    1995-01-01

    Excessive base heating has been a problem for many launch vehicles. For certain designs such as the direct dump of turbine exhaust inside and at the lip of the nozzle, the potential burning of the turbine exhaust in the base region can be of great concern. Accurate prediction of the base environment at altitudes is therefore very important during the vehicle design phase. Otherwise, undesirable consequences may occur. In this study, the turbulent base flowfield of a cold flow experimental investigation for a four-engine clustered nozzle was numerically benchmarked using a pressure-based computational fluid dynamics (CFD) method. This is a necessary step before the benchmarking of hot flow and combustion flow tests can be considered. Since the medium was unheated air, reasonable prediction of the base pressure distribution at high altitude was the main goal. Several physical phenomena pertaining to the multiengine clustered nozzle base flow physics were deduced from the analysis.

  13. [Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

    PubMed

    Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

    2015-12-01

    We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests with the cut-off distance yielding a similar classification capability showed favorable results. When the MIC method was used, in which minimum inhibitory concentration (MIC) values were used directly, the level of agreement with PFGE was 74.2% when 15 drugs were tested; the Unweighted Pair Group Method with Arithmetic mean (UPGMA) was effective when the cut-off distance was 16. Using the SIR method in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was the UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis for drug susceptibility tests is useful for the epidemiological analysis of MRSA.
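
    The SIR coding and UPGMA clustering with a cut-off distance can be illustrated with a small sketch. Only the S/I/R codes (0, 2, 3) and the cut-off value 3.6 come from the abstract; the Euclidean distance, the four-drug profiles, and the function names `encode` and `upgma_clusters` are assumptions made for illustration. The abstract applied that cut-off to 17 drugs, so the result here says nothing about the study's data.

```python
from itertools import combinations
from math import dist

SIR_CODE = {"S": 0, "I": 2, "R": 3}  # coding stated in the abstract


def encode(profile):
    """Turn a string such as 'SSIR' into a numeric vector."""
    return [SIR_CODE[c] for c in profile]


def upgma_clusters(vectors, cutoff):
    """Average-linkage (UPGMA) agglomeration, stopped at a cut-off distance.

    Euclidean distance is an assumption; the abstract does not name the metric.
    """
    clusters = [[i] for i in range(len(vectors))]

    def average_distance(a, b):
        # UPGMA: mean of all pairwise distances between the two clusters.
        return sum(dist(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (x, y), d = min(
            (
                ((p, q), average_distance(clusters[p], clusters[q]))
                for p, q in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # remaining clusters are further apart than the cut-off
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical susceptibility profiles for four isolates and four drugs.
profiles = ["SSSR", "SSSR", "RRRS", "RRIS"]
groups = upgma_clusters([encode(p) for p in profiles], cutoff=3.6)
```

    With these made-up profiles the first two isolates merge at distance 0 and the last two at distance 1, while the two resulting groups stay apart because their average distance exceeds the cut-off.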

  14. Wear Scar Similarities between Retrieved and Simulator-Tested Polyethylene TKR Components: An Artificial Neural Network Approach

    PubMed Central

    2016-01-01

    The aim of this study was to determine how representative wear scars of simulator-tested polyethylene (PE) inserts compare with retrieved PE inserts from total knee replacement (TKR). By means of a nonparametric self-organizing feature map (SOFM), wear scar images of 21 postmortem- and 54 revision-retrieved components were compared with six simulator-tested components that were tested either in displacement or in load control according to ISO protocols. The SOFM network was then trained with the wear scar images of postmortem-retrieved components since those are considered well-functioning at the time of retrieval. Based on this training process, eleven clusters were established, suggesting considerable variability among wear scars despite an uncomplicated loading history inside their hosts. The remaining components (revision-retrieved and simulator-tested) were then assigned to these established clusters. Five out of six simulator components were clustered together, suggesting that the network was able to identify similarities in loading history. However, the simulator-tested components ended up in a cluster at the fringe of the map containing only 10.8% of retrieved components. This may suggest that current ISO testing protocols were not fully representative of this TKR population, and protocols that better resemble patients' gait after TKR containing activities other than walking may be warranted. PMID:27597955

  15. Recognition and Matching of Clustered Mature Litchi Fruits Using Binocular Charge-Coupled Device (CCD) Color Cameras

    PubMed Central

    Wang, Chenglin; Tang, Yunchao; Zou, Xiangjun; Luo, Lufeng; Chen, Xiong

    2017-01-01

    Recognition and matching of litchi fruits are critical steps for litchi harvesting robots to successfully grasp litchi. However, due to the randomness of litchi growth, such as clustered growth with uncertain number of fruits and random occlusion by leaves, branches and other fruits, the recognition and matching of the fruit become a challenge. Therefore, this study firstly defined mature litchi fruit as three clustered categories. Then an approach for recognition and matching of clustered mature litchi fruit was developed based on litchi color images acquired by binocular charge-coupled device (CCD) color cameras. The approach mainly included three steps: (1) calibration of binocular color cameras and litchi image acquisition; (2) segmentation of litchi fruits using four kinds of supervised classifiers, and recognition of the pre-defined categories of clustered litchi fruit using a pixel threshold method; and (3) matching the recognized clustered fruit using a geometric center-based matching method. The experimental results showed that the proposed recognition method could be robust against the influences of varying illumination and occlusion conditions, and precisely recognize clustered litchi fruit. In the tested 432 clustered litchi fruits, the highest and lowest average recognition rates were 94.17% and 92.00% under sunny back-lighting and partial occlusion, and sunny front-lighting and non-occlusion conditions, respectively. From 50 pairs of tested images, the highest and lowest matching success rates were 97.37% and 91.96% under sunny back-lighting and non-occlusion, and sunny front-lighting and partial occlusion conditions, respectively. PMID:29112177

  16. A Computational Linguistic Measure of Clustering Behavior on Semantic Verbal Fluency Task Predicts Risk of Future Dementia in the Nun Study

    PubMed Central

    Pakhomov, Serguei V.S.; Hemmy, Laura S.

    2014-01-01

    Generative semantic verbal fluency (SVF) tests show early and disproportionate decline relative to other abilities in individuals developing Alzheimer’s disease. Optimal performance on SVF tests depends on the efficiency of using clustered organization of semantically related items and the ability to switch between clusters. Traditional approaches to clustering and switching have relied on manual determination of clusters. We evaluated a novel automated computational linguistic approach for quantifying clustering behavior. Our approach is based on Latent Semantic Analysis (LSA) for computing strength of semantic relatedness between pairs of words produced in response to SVF test. The mean size of semantic clusters (MCS) and semantic chains (MChS) are calculated based on pairwise relatedness values between words. We evaluated the predictive validity of these measures on a set of 239 participants in the Nun Study, a longitudinal study of aging. All were cognitively intact at baseline assessment, measured with the CERAD battery, and were followed in 18 month waves for up to 20 years. The onset of either dementia or memory impairment were used as outcomes in Cox proportional hazards models adjusted for age and education and censored at follow up waves 5 (6.3 years) and 13 (16.96 years). Higher MCS was associated with 38% reduction in dementia risk at wave 5 and 26% reduction at wave 13, but not with the onset of memory impairment. Higher (+1 SD) MChS was associated with 39% dementia risk reduction at wave 5 but not wave 13, and association with memory impairment was not significant. Higher traditional SVF scores were associated with 22–29% memory impairment and 35–40% dementia risk reduction. SVF scores were not correlated with either MCS or MChS. Our study suggests that an automated approach to measuring clustering behavior can be used to estimate dementia risk in cognitively normal individuals. PMID:23845236

  17. A computational linguistic measure of clustering behavior on semantic verbal fluency task predicts risk of future dementia in the nun study.

    PubMed

    Pakhomov, Serguei V S; Hemmy, Laura S

    2014-06-01

    Generative semantic verbal fluency (SVF) tests show early and disproportionate decline relative to other abilities in individuals developing Alzheimer's disease. Optimal performance on SVF tests depends on the efficiency of using clustered organization of semantically related items and the ability to switch between clusters. Traditional approaches to clustering and switching have relied on manual determination of clusters. We evaluated a novel automated computational linguistic approach for quantifying clustering behavior. Our approach is based on Latent Semantic Analysis (LSA) for computing strength of semantic relatedness between pairs of words produced in response to SVF test. The mean size of semantic clusters (MCS) and semantic chains (MChS) are calculated based on pairwise relatedness values between words. We evaluated the predictive validity of these measures on a set of 239 participants in the Nun Study, a longitudinal study of aging. All were cognitively intact at baseline assessment, measured with the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) battery, and were followed in 18-month waves for up to 20 years. The onset of either dementia or memory impairment were used as outcomes in Cox proportional hazards models adjusted for age and education and censored at follow-up waves 5 (6.3 years) and 13 (16.96 years). Higher MCS was associated with 38% reduction in dementia risk at wave 5 and 26% reduction at wave 13, but not with the onset of memory impairment. Higher [+1 standard deviation (SD)] MChS was associated with 39% dementia risk reduction at wave 5 but not wave 13, and association with memory impairment was not significant. Higher traditional SVF scores were associated with 22-29% memory impairment and 35-40% dementia risk reduction. SVF scores were not correlated with either MCS or MChS. Our study suggests that an automated approach to measuring clustering behavior can be used to estimate dementia risk in cognitively normal individuals. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. A time-series approach for clustering farms based on slaughterhouse health aberration data.

    PubMed

    Hulsegge, B; de Greef, K H

    2018-05-01

    A large amount of data is collected routinely in meat inspection in pig slaughterhouses. A time series clustering approach is presented and applied that groups farms based on similar statistical characteristics of meat inspection data over time. A three-step characteristic-based clustering approach was used, based on the idea that the data contain more information than the incidence figures alone. A stratified subset containing 511,645 pigs was derived as a study set from 3.5 years of meat inspection data. The monthly averages of incidence of pleuritis and of pneumonia of 44 Dutch farms (delivering 5149 batches to 2 pig slaughterhouses) were subjected to 1) derivation of farm level data characteristics, 2) factor analysis, and 3) clustering into groups of farms. The characteristic-based clustering was able to cluster farms for both lung aberrations. Three groups of data characteristics were informative, describing incidence, time pattern and degree of autocorrelation. The consistency of clustering similar farms was confirmed by repetition of the analysis in a larger dataset. The robustness of the clustering was tested on a substantially extended dataset. This confirmed the earlier results: three data distribution aspects make up the majority of the distinction between groups of farms, and in these groups (clusters) the majority of the farms were allocated comparably to the earlier allocation (75% and 62% for pleuritis and pneumonia, respectively). The difference between pleuritis and pneumonia in their seasonal dependency was confirmed, supporting the biological relevance of the clustering. Comparison of the identified clusters of statistically comparable farms can be used to detect farm level risk factors causing the health aberrations beyond comparison on disease incidence and trend alone. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Assembly and features of secondary metabolite biosynthetic gene clusters in Streptomyces ansochromogenes.

    PubMed

    Zhong, Xingyu; Tian, Yuqing; Niu, Guoqing; Tan, Huarong

    2013-07-01

    A draft genome sequence of Streptomyces ansochromogenes 7100 was generated using 454 sequencing technology. In combination with local BLAST searches and gap filling techniques, a comprehensive antiSMASH-based method was adopted to assemble the secondary metabolite biosynthetic gene clusters in the draft genome of S. ansochromogenes. A total of at least 35 putative gene clusters were identified and assembled. Transcriptional analysis showed that 20 of the 35 gene clusters were expressed in either or all of the three different media tested, whereas the other 15 gene clusters were silent in all three different media. This study provides a comprehensive method to identify and assemble secondary metabolite biosynthetic gene clusters in draft genomes of Streptomyces, and will significantly promote functional studies of these secondary metabolite biosynthetic gene clusters.

  20. ClueNet: Clustering a temporal network based on topological similarity rather than denseness.

    PubMed

    Crawford, Joseph; Milenković, Tijana

    2018-01-01

    Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data: their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.

  1. Central tracker for BM@N experiment based on double side Si-microstrip detectors

    NASA Astrophysics Data System (ADS)

    Kovalev, Yu.; Kapishin, M.; Khabarov, S.; Shafronovskaia, A.; Tarasov, O.; Makankin, A.; Zamiatin, N.; Zubarev, E.

    2017-07-01

    The design of a central tracker system based on Double-Sided Silicon Detectors (DSSD) for the BM@N experiment is described. A coordinate plane with 10240 measuring channels, a pitch adapter, and readout electronics was developed. Each element was tested and assembled into a coordinate plane. The first tests of the plane with a 106Ru source were carried out before installation for the BM@N experiment. The results of the study indicate that noisy channels and inefficient channels amount to less than 3%. In general, single clusters (one group of consecutive strips per module) account for 87%, and 75% of clusters have a width equal to one strip.

  2. HPC in a HEP lab: lessons learned from setting up cost-effective HPC clusters

    NASA Astrophysics Data System (ADS)

    Husejko, Michal; Agtzidis, Ioannis; Baehler, Pierre; Dul, Tadeusz; Evans, John; Himyr, Nils; Meinhard, Helge

    2015-12-01

    In this paper we present our findings gathered during the evaluation and testing of Windows Server High-Performance Computing (Windows HPC) in view of potentially using it as a production HPC system for engineering applications. The Windows HPC package, an extension of Microsoft's Windows Server product, provides all essential interfaces, utilities and management functionality for creating, operating and monitoring a Windows-based HPC cluster infrastructure. The evaluation and test phase was focused on verifying the functionalities of Windows HPC, its performance, support of commercial tools and the integration with the users' work environment. We describe constraints imposed by the way the CERN Data Centre is operated, licensing for engineering tools, and the scalability and behaviour of the HPC engineering applications used at CERN. We will present an initial set of requirements, which were created based on the above constraints and requests from the CERN engineering user community. We will explain how we have configured Windows HPC clusters to provide job scheduling functionalities required to support the CERN engineering user community, quality of service, user- and project-based priorities, and fair access to limited resources. Finally, we will present several performance tests we carried out to verify Windows HPC performance and scalability.

  3. A dynamic scheduling algorithm for single-arm two-cluster tools with flexible processing times

    NASA Astrophysics Data System (ADS)

    Li, Xin; Fung, Richard Y. K.

    2018-02-01

    This article presents a dynamic algorithm for job scheduling in two-cluster tools producing multi-type wafers with flexible processing times. Flexible processing times mean that the actual times for processing wafers should be within given time intervals. The objective of the work is to minimize the completion time of the newly inserted wafer. To deal with this issue, a two-cluster tool is decomposed into three reduced single-cluster tools (RCTs) in a series based on a decomposition approach proposed in this article. For each single-cluster tool, a dynamic scheduling algorithm based on temporal constraints is developed to schedule the newly inserted wafer. Three experiments have been carried out to test the proposed dynamic scheduling algorithm, and the results are compared with those of the 'earliest starting time' (EST) heuristic adopted in previous literature. The results show that the dynamic algorithm proposed in this article is effective and practical.

  4. A data-driven feature extraction framework for predicting the severity of condition of congestive heart failure patients.

    PubMed

    Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid

    2015-01-01

    In this paper, we propose a novel methodology for utilizing disease diagnostic information to predict severity of condition for Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based, feature extraction framework using disease diagnostic information. To reduce the dimensionality, we identify disease clusters using co-occurrence frequencies. We then utilize these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), which contains 7 million discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with another that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in the predictability of the severity of condition.

  5. Random Walk Quantum Clustering Algorithm Based on Space

    NASA Astrophysics Data System (ADS)

    Xiao, Shufen; Dong, Yumin; Ma, Hongyang

    2018-01-01

    In the random quantum walk, which is a quantum simulation of the classical walk, data points interacted when selecting the appropriate walk strategy by taking advantage of quantum-entanglement features; thus, the results obtained when the quantum walk is used are different from those when the classical walk is adopted. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and is proved superior to existing clustering algorithms, namely, Kmeans, PCA + Kmeans, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed. Specific suggestions are provided.

  6. Space-time analysis of Down syndrome: results consistent with transient pre-disposing contagious agent.

    PubMed

    McNally, Richard J Q; Rankin, Judith; Shirley, Mark D F; Rushton, Stephen P; Pless-Mulloli, Tanja

    2008-10-01

    Whilst maternal age is an established risk factor for Patau syndrome (trisomy 13), Edwards syndrome (trisomy 18) and Down syndrome (trisomy 21), the aetiology and contribution of genetic and environmental factors remains unclear. We analysed for space-time clustering using high quality fully population-based data from a geographically defined region. The study included all cases of Patau, Edwards and Down syndrome, delivered during 1985-2003 and resident in the former Northern Region of England, including terminations of pregnancy for fetal anomaly. We applied the K-function test for space-time clustering with fixed thresholds of close in space and time using residential addresses at time of delivery. The Knox test was used to indicate the range over which the clustering effect occurred. Tests were repeated using nearest neighbour (NN) thresholds to adjust for variable population density. The study analysed 116 cases of Patau syndrome, 240 cases of Edwards syndrome and 1084 cases of Down syndrome. There was evidence of space-time clustering for Down syndrome (fixed threshold of close in space: P = 0.01, NN threshold: P = 0.02), but little or no clustering for Patau (P = 0.57, P = 0.19) or Edwards (P = 0.37, P = 0.06) syndromes. Clustering of Down syndrome was associated with cases from more densely populated areas and evidence of clustering persisted when cases were restricted to maternal age <40 years. The highly novel space-time clustering for Down syndrome suggests an aetiological role for transient environmental factors, such as infections.
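
    The Knox test mentioned in the abstract above can be illustrated with a minimal sketch. It counts pairs of cases that are close in both space and time and compares that count with counts obtained after randomly permuting the event times. The function names, thresholds, permutation count and toy coordinates are assumptions for illustration only; this is not the authors' analysis, which also used the K-function and nearest-neighbour thresholds.

```python
import math
import random


def knox_statistic(cases, space_limit, time_limit):
    """Count pairs of cases that are close in both space and time.

    Each case is a tuple (x, y, t); units are whatever the caller uses.
    """
    count = 0
    for i in range(len(cases)):
        x1, y1, t1 = cases[i]
        for j in range(i + 1, len(cases)):
            x2, y2, t2 = cases[j]
            close_space = math.hypot(x1 - x2, y1 - y2) <= space_limit
            close_time = abs(t1 - t2) <= time_limit
            if close_space and close_time:
                count += 1
    return count


def knox_test(cases, space_limit, time_limit, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle times across locations and recount."""
    rng = random.Random(seed)
    observed = knox_statistic(cases, space_limit, time_limit)
    times = [t for _, _, t in cases]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(times)
        permuted = [(x, y, t) for (x, y, _), t in zip(cases, times)]
        if knox_statistic(permuted, space_limit, time_limit) >= observed:
            at_least += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical toy data: two tight space-time groups.
toy_cases = [(0.0, 0.0, 0.0), (0.1, 0.0, 1.0), (10.0, 10.0, 100.0), (10.1, 10.0, 101.0)]
observed_pairs, p_value = knox_test(toy_cases, space_limit=1.0, time_limit=2.0)
```

    Permuting only the times keeps the spatial pattern and the temporal pattern fixed, so the p-value reflects space-time interaction rather than purely spatial or purely temporal clustering.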

  7. Beverage consumption patterns of Canadian adults aged 19 to 65 years.

    PubMed

    Nikpartow, Nooshin; Danyliw, Adrienne D; Whiting, Susan J; Lim, Hyun J; Vatanparast, Hassanali

    2012-12-01

    To investigate the beverage intake patterns of Canadian adults and explore characteristics of participants in different beverage clusters. Analyses of nationally representative data with cross-sectional complex stratified design. Canadian Community Health Survey, Cycle 2.2 (2004). A total of 14 277 participants aged 19-65 years, in whom dietary intake was assessed using a single 24 h recall, were included in the study. After determining total intake and the contribution of beverages to total energy intake among age/sex groups, cluster analysis (K-means method) was used to classify males and females into distinct clusters based on the dominant pattern of beverage intakes. To test differences across clusters, χ2 tests and 95 % confidence intervals of the mean intakes were used. Six beverage clusters in women and seven beverage clusters in men were identified. 'Sugar-sweetened' beverage clusters - regular soft drinks and fruit drinks - as well as a 'beer' cluster, appeared for both men and women. No 'milk' cluster appeared among women. The mean consumption of the dominant beverage in each cluster was higher among men than women. The 'soft drink' cluster in men had the lowest proportion of the higher levels of education, and in women the highest proportion of inactivity, compared with other beverage clusters. Patterns of beverage intake in Canadian women indicate high consumption of sugar-sweetened beverages particularly fruit drinks, low intake of milk and high intake of beer. These patterns in women have implications for poor bone health, risk of obesity and other morbidities.
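
    As a rough illustration of the K-means step named in the abstract above, the following sketch implements Lloyd's algorithm in plain Python. The function name, the random initialisation, and the toy two-dimensional data are assumptions; the study's actual variables, software and number of clusters are not reproduced here.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of intake profiles.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

    K-means requires the number of clusters to be chosen in advance and is sensitive to variable scaling, so in practice the inputs are usually standardised first.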

  8. Scaffold Architecture Controls Insulinoma Clustering, Viability, and Insulin Production

    PubMed Central

    Blackstone, Britani N.; Palmer, Andre F.; Rilo, Horacio R.

    2014-01-01

    Recently, in vitro diagnostic tools have shifted focus toward personalized medicine by incorporating patient cells into traditional test beds. These cell-based platforms commonly utilize two-dimensional substrates that lack the ability to support three-dimensional cell structures seen in vivo. As monolayer cell cultures have previously been shown to function differently than cells in vivo, the results of such in vitro tests may not accurately reflect cell response in vivo. It is therefore of interest to determine the relationships between substrate architecture, cell structure, and cell function in 3D cell-based platforms. To investigate the effect of substrate architecture on insulinoma organization and function, insulinomas were seeded onto 2D gelatin substrates and 3D fibrous gelatin scaffolds with three distinct fiber diameters and fiber densities. Cell viability and clustering were assessed at culture days 3, 5, and 7, with baseline insulin secretion and glucose-stimulated insulin production measured at day 7. Small, closely spaced gelatin fibers promoted the formation of large, rounded insulinoma clusters, whereas monolayer organization and large fibers prevented cell clustering and reduced glucose-stimulated insulin production. Taken together, these data show that scaffold properties can be used to control the organization and function of insulin-producing cells and may be useful as a 3D test bed for diabetes drug development. PMID:24410263

  9. Joint fMRI analysis and subject clustering using sparse dictionary learning

    NASA Astrophysics Data System (ADS)

    Kim, Seung-Jun; Dontaraju, Krishna K.

    2017-08-01

    Multi-subject fMRI data analysis methods based on sparse dictionary learning are proposed. In addition to identifying the component spatial maps by exploiting the sparsity of the maps, clusters of the subjects are learned by postulating that the fMRI volumes admit a subspace clustering structure. Furthermore, in order to tune the associated hyper-parameters systematically, a cross-validation strategy is developed based on entry-wise sampling of the fMRI dataset. Efficient algorithms for solving the proposed constrained dictionary learning formulations are developed. Numerical tests performed on synthetic fMRI data show promising results and provide insights into the proposed technique.

  10. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

    PubMed Central

    Craig, Hugh; Berretta, Regina; Moscato, Pablo

    2016-01-01

    In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
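
    The Jensen-Shannon distance named in the abstract above can be sketched for two word-frequency tables as follows. It is the square root of the Jensen-Shannon divergence; with base-2 logarithms it ranges from 0 (identical distributions) to 1 (no shared words). The function name and the toy word counts are assumptions, and the sketch does not reproduce the paper's proximity-graph or memetic-algorithm steps.

```python
import math


def jensen_shannon_distance(counts_p, counts_q):
    """Jensen-Shannon distance between two word-count dictionaries."""
    total_p = sum(counts_p.values())
    total_q = sum(counts_q.values())
    vocabulary = set(counts_p) | set(counts_q)
    # Mixture distribution M = (P + Q) / 2 over the joint vocabulary.
    mixture = {
        w: 0.5 * (counts_p.get(w, 0) / total_p + counts_q.get(w, 0) / total_q)
        for w in vocabulary
    }

    def kl_to_mixture(counts, total):
        # Kullback-Leibler divergence KL(P || M) in bits; zero counts contribute 0.
        return sum(
            (c / total) * math.log2((c / total) / mixture[w])
            for w, c in counts.items()
            if c > 0
        )

    divergence = 0.5 * kl_to_mixture(counts_p, total_p) + 0.5 * kl_to_mixture(counts_q, total_q)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Hypothetical toy word counts for three texts.
play_a = {"thou": 4, "love": 2, "king": 1}
play_b = {"thou": 3, "love": 3, "crown": 1}
play_c = {"galaxy": 5}
```

    Unlike the raw divergence, the square root satisfies the triangle inequality, which makes it usable as a distance when building a proximity graph.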

  11. Galaxy clusters in the SDSS Stripe 82 based on photometric redshifts

    DOE PAGES

    Durret, F.; Adami, C.; Bertin, E.; ...

    2015-06-10

    Based on a recent photometric redshift galaxy catalogue, we have searched for galaxy clusters in the Stripe 82 region of the Sloan Digital Sky Survey by applying the Adami & MAzure Cluster FInder (AMACFI). Extensive tests were made to fine-tune the AMACFI parameters and make the cluster detection as reliable as possible. The same method was applied to the Millennium simulation to estimate our detection efficiency and the approximate masses of the detected clusters. Considering all the cluster galaxies (i.e. within a 1 Mpc radius of the cluster to which they belong and with a photo-z differing by less than 0.05 from that of the cluster), we stacked clusters in various redshift bins to derive colour-magnitude diagrams and galaxy luminosity functions (GLFs). For each galaxy with absolute magnitude brighter than -19.0 in the r band, we computed the disk and spheroid components by applying SExtractor, and by stacking clusters we determined how the disk-to-spheroid flux ratio varies with cluster redshift and mass. We also detected 3663 clusters in the redshift range 0.1513 and a few 10^14 solar masses. Furthermore, by stacking the cluster galaxies in various redshift bins, we find a clear red sequence in the (g'-r') versus r' colour-magnitude diagrams, and the GLFs are typical of clusters, though with a possible contamination from field galaxies. The morphological analysis of the cluster galaxies shows that the fraction of late-type to early-type galaxies shows an increase with redshift (particularly in high mass clusters) and a decrease with detection level, i.e. cluster mass. From the properties of the cluster galaxies, the majority of the candidate clusters detected here seem to be real clusters with typical cluster properties.

  12. Galaxy clusters in the SDSS Stripe 82 based on photometric redshifts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Durret, F.; Adami, C.; Bertin, E.

    Based on a recent photometric redshift galaxy catalogue, we have searched for galaxy clusters in the Stripe 82 region of the Sloan Digital Sky Survey by applying the Adami & MAzure Cluster FInder (AMACFI). Extensive tests were made to fine-tune the AMACFI parameters and make the cluster detection as reliable as possible. The same method was applied to the Millennium simulation to estimate our detection efficiency and the approximate masses of the detected clusters. Considering all the cluster galaxies (i.e. within a 1 Mpc radius of the cluster to which they belong and with a photo-z differing by less than 0.05 from that of the cluster), we stacked clusters in various redshift bins to derive colour-magnitude diagrams and galaxy luminosity functions (GLFs). For each galaxy with absolute magnitude brighter than -19.0 in the r band, we computed the disk and spheroid components by applying SExtractor, and by stacking clusters we determined how the disk-to-spheroid flux ratio varies with cluster redshift and mass. We also detected 3663 clusters in the redshift range 0.1513 and a few 10^14 solar masses. Furthermore, by stacking the cluster galaxies in various redshift bins, we find a clear red sequence in the (g'-r') versus r' colour-magnitude diagrams, and the GLFs are typical of clusters, though with a possible contamination from field galaxies. The morphological analysis of the cluster galaxies shows that the fraction of late-type to early-type galaxies shows an increase with redshift (particularly in high mass clusters) and a decrease with detection level, i.e. cluster mass. From the properties of the cluster galaxies, the majority of the candidate clusters detected here seem to be real clusters with typical cluster properties.

  13. Uptake of Home-Based HIV Testing, Linkage to Care, and Community Attitudes about ART in Rural KwaZulu-Natal, South Africa: Descriptive Results from the First Phase of the ANRS 12249 TasP Cluster-Randomised Trial

    PubMed Central

    Okesola, Nonhlanhla; Tanser, Frank; Thiebaut, Rodolphe; Rekacewicz, Claire; Newell, Marie-Louise

    2016-01-01

    Background The 2015 WHO recommendation of antiretroviral therapy (ART) for all immediately following HIV diagnosis is partially based on the anticipated impact on HIV incidence in the surrounding population. We investigated this approach in a cluster-randomised trial in a high HIV prevalence setting in rural KwaZulu-Natal. We present findings from the first phase of the trial and report on uptake of home-based HIV testing, linkage to care, uptake of ART, and community attitudes about ART. Methods and Findings Between 9 March 2012 and 22 May 2014, five clusters in the intervention arm (immediate ART offered to all HIV-positive adults) and five clusters in the control arm (ART offered according to national guidelines, i.e., CD4 count ≤ 350 cells/μl) contributed to the first phase of the trial. Households were visited every 6 mo. Following informed consent and administration of a study questionnaire, each resident adult (≥16 y) was asked for a finger-prick blood sample, which was used to estimate HIV prevalence, and offered a rapid HIV test using a serial HIV testing algorithm. All HIV-positive adults were referred to the trial clinic in their cluster. Those not linked to care 3 mo after identification were contacted by a linkage-to-care team. Study procedures were not blinded. In all, 12,894 adults were registered as eligible for participation (5,790 in intervention arm; 7,104 in control arm), of whom 9,927 (77.0%) were contacted at least once during household visits. HIV status was ever ascertained for a total of 8,233/9,927 (82.9%), including 2,569 ascertained as HIV-positive (942 tested HIV-positive and 1,627 reported a known HIV-positive status). Of the 1,177 HIV-positive individuals not previously in care and followed for at least 6 mo in the trial, 559 (47.5%) visited their cluster trial clinic within 6 mo. In the intervention arm, 89% (194/218) initiated ART within 3 mo of their first clinic visit. 
In the control arm, 42.3% (83/196) had a CD4 count ≤ 350 cells/μl at first visit, of whom 92.8% initiated ART within 3 mo. Regarding attitudes about ART, 93% (8,802/9,460) of participants agreed with the statement that they would want to start ART as soon as possible if HIV-positive. Estimated baseline HIV prevalence was 30.5% (2,028/6,656) (95% CI 25.0%, 37.0%). HIV prevalence, uptake of home-based HIV testing, linkage to care within 6 mo, and initiation of ART within 3 mo in those with CD4 count ≤ 350 cells/μl did not differ significantly between the intervention and control clusters. Selection bias related to noncontact could not be entirely excluded. Conclusions Home-based HIV testing was well received in this rural population, although men were less easily contactable at home; immediate ART was acceptable, with good viral suppression and retention. However, only about half of HIV-positive people accessed care within 6 mo of being identified, with nearly two-thirds accessing care by 12 mo. The observed delay in linkage to care would limit the individual and public health ART benefits of universal testing and treatment in this population. Trial registration ClinicalTrials.gov NCT01509508 PMID:27504637

  14. Uptake of Home-Based HIV Testing, Linkage to Care, and Community Attitudes about ART in Rural KwaZulu-Natal, South Africa: Descriptive Results from the First Phase of the ANRS 12249 TasP Cluster-Randomised Trial.

    PubMed

    Iwuji, Collins C; Orne-Gliemann, Joanna; Larmarange, Joseph; Okesola, Nonhlanhla; Tanser, Frank; Thiebaut, Rodolphe; Rekacewicz, Claire; Newell, Marie-Louise; Dabis, Francois

    2016-08-01

    The 2015 WHO recommendation of antiretroviral therapy (ART) for all immediately following HIV diagnosis is partially based on the anticipated impact on HIV incidence in the surrounding population. We investigated this approach in a cluster-randomised trial in a high HIV prevalence setting in rural KwaZulu-Natal. We present findings from the first phase of the trial and report on uptake of home-based HIV testing, linkage to care, uptake of ART, and community attitudes about ART. Between 9 March 2012 and 22 May 2014, five clusters in the intervention arm (immediate ART offered to all HIV-positive adults) and five clusters in the control arm (ART offered according to national guidelines, i.e., CD4 count ≤ 350 cells/μl) contributed to the first phase of the trial. Households were visited every 6 mo. Following informed consent and administration of a study questionnaire, each resident adult (≥16 y) was asked for a finger-prick blood sample, which was used to estimate HIV prevalence, and offered a rapid HIV test using a serial HIV testing algorithm. All HIV-positive adults were referred to the trial clinic in their cluster. Those not linked to care 3 mo after identification were contacted by a linkage-to-care team. Study procedures were not blinded. In all, 12,894 adults were registered as eligible for participation (5,790 in intervention arm; 7,104 in control arm), of whom 9,927 (77.0%) were contacted at least once during household visits. HIV status was ever ascertained for a total of 8,233/9,927 (82.9%), including 2,569 ascertained as HIV-positive (942 tested HIV-positive and 1,627 reported a known HIV-positive status). Of the 1,177 HIV-positive individuals not previously in care and followed for at least 6 mo in the trial, 559 (47.5%) visited their cluster trial clinic within 6 mo. In the intervention arm, 89% (194/218) initiated ART within 3 mo of their first clinic visit. 
In the control arm, 42.3% (83/196) had a CD4 count ≤ 350 cells/μl at first visit, of whom 92.8% initiated ART within 3 mo. Regarding attitudes about ART, 93% (8,802/9,460) of participants agreed with the statement that they would want to start ART as soon as possible if HIV-positive. Estimated baseline HIV prevalence was 30.5% (2,028/6,656) (95% CI 25.0%, 37.0%). HIV prevalence, uptake of home-based HIV testing, linkage to care within 6 mo, and initiation of ART within 3 mo in those with CD4 count ≤ 350 cells/μl did not differ significantly between the intervention and control clusters. Selection bias related to noncontact could not be entirely excluded. Home-based HIV testing was well received in this rural population, although men were less easily contactable at home; immediate ART was acceptable, with good viral suppression and retention. However, only about half of HIV-positive people accessed care within 6 mo of being identified, with nearly two-thirds accessing care by 12 mo. The observed delay in linkage to care would limit the individual and public health ART benefits of universal testing and treatment in this population. ClinicalTrials.gov NCT01509508.
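
    The percentages quoted in the abstract above can be cross-checked against the accompanying counts with a few lines of arithmetic. The counts and percentages below are taken from the abstract; the labels and the helper name are assumptions added for readability.

```python
# (label, numerator, denominator, percentage reported in the abstract)
reported = [
    ("contacted at least once", 9927, 12894, 77.0),
    ("HIV status ever ascertained", 8233, 9927, 82.9),
    ("visited clinic within 6 mo", 559, 1177, 47.5),
    ("intervention arm: ART within 3 mo", 194, 218, 89.0),
    ("control arm: CD4 <= 350 at first visit", 83, 196, 42.3),
    ("would start ART as soon as possible", 8802, 9460, 93.0),
    ("estimated baseline HIV prevalence", 2028, 6656, 30.5),
]


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Pair each label with the recomputed value and whether it matches the abstract.
checks = [(label, percent(n, d), percent(n, d) == pct) for label, n, d, pct in reported]
for label, value, matches in checks:
    print(f"{label}: {value}% (matches reported value: {matches})")
```

    Each recomputed value agrees with the reported figure once rounded to one decimal place.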

  15. A Symmetric Time-Varying Cluster Rate of Descent Model

    NASA Technical Reports Server (NTRS)

    Ray, Eric S.

    2015-01-01

    A model of the time-varying rate of descent of the Orion vehicle was developed based on the observed correlation between canopy projected area and drag coefficient. This initial version of the model assumes cluster symmetry and only varies the vertical component of velocity. The cluster fly-out angle is modeled as a series of sine waves based on flight test data. The projected area of each canopy is synchronized with the primary fly-out angle mode. The sudden loss of projected area during canopy collisions is modeled at minimum fly-out angles, leading to brief increases in rate of descent. The cluster geometry is converted to drag coefficient using empirically derived constants. A more complete model is under development, which computes the aerodynamic response of each canopy to its local incidence angle.

  16. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    PubMed

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ2 analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

    PubMed

    Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B

    2016-01-01

    Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. 
Our analysis suggests that ADHF classification using simultaneous considerations of etiology, comorbid conditions, and biomarker levels, may be superior to bedside classifications.

  18. Towards Accurate Modelling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-04-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  19. Automatic detection of multiple UXO-like targets using magnetic anomaly inversion and self-adaptive fuzzy c-means clustering

    NASA Astrophysics Data System (ADS)

    Yin, Gang; Zhang, Yingtang; Fan, Hongbo; Ren, Guoquan; Li, Zhining

    2017-12-01

    We have developed a method for automatically detecting UXO-like targets based on magnetic anomaly inversion and self-adaptive fuzzy c-means clustering. Magnetic anomaly inversion methods are used to estimate the initial locations of multiple UXO-like sources. Although these initial locations have some errors with respect to the real positions, they form dense clouds around the actual positions of the magnetic sources. Then we use the self-adaptive fuzzy c-means clustering algorithm to cluster these initial locations. The estimated number of cluster centroids represents the number of targets and the cluster centroids are regarded as the locations of magnetic targets. Effectiveness of the method has been demonstrated using synthetic datasets. Computational results show that the proposed method can be applied to the case of several UXO-like targets that are randomly scattered within a confined, shallow subsurface volume. A field test was carried out to test the validity of the proposed method and the experimental results show that the prearranged magnets can be detected unambiguously and located precisely.
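
    The clustering step described above can be illustrated with a sketch of standard fuzzy c-means. This is the textbook algorithm with a fixed number of clusters, not the self-adaptive variant of the paper, which also estimates how many clusters there are. The function name, the fuzzifier m = 2, the optional initial centres, and the toy location estimates are all assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6, init_centers=None, seed=0):
    """Standard fuzzy c-means with a fixed cluster count c and fuzzifier m > 1."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in (init_centers or rng.sample(points, c))]
    dims = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Centre update: mean of all points weighted by u_ij ** m.
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum for d in range(dims)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Memberships are those computed just before the final centre update.
    return centers, memberships


# Hypothetical inverted source locations forming two dense clouds.
estimates = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.1), (10.0, 10.0), (10.1, 9.9), (9.9, 10.2)]
centers, memberships = fuzzy_c_means(estimates, c=2, init_centers=[(0.0, 0.0), (10.0, 10.0)])
```

    Each point receives a graded membership in every cluster, so scattered location estimates near a cloud boundary influence both centroids instead of being forced into one.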

  20. Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

    PubMed Central

    Boyack, Kevin W.; Newman, David; Duhon, Russell J.; Klavans, Richard; Patek, Michael; Biberstine, Joseph R.; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy

    2011-01-01

    Background We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. Methodology We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. 
Conclusions PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts. PMID:21437291
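
    The tf-idf cosine approach and the top-n filtering step described above can be illustrated with a small sketch. It is not the authors' pipeline: the tokenized toy documents, the function names and the `n_keep` parameter are all hypothetical, and the idf weighting used here (log of corpus size over document frequency) is one common variant chosen for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_n_similarities(vectors, n_keep):
    """For each document keep only its n_keep most similar other documents."""
    result = {}
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        # Highest similarity first; ties broken by document index.
        sims.sort(key=lambda s: (-s[0], s[1]))
        result[i] = [(j, s) for s, j in sims[:n_keep]]
    return result

# Hypothetical toy corpus (already tokenized).
docs = [
    "protein sequence clustering homology".split(),
    "protein homology inference clustering".split(),
    "galaxy globular cluster photometry".split(),
]
neighbours = top_n_similarities(tfidf_vectors(docs), n_keep=1)
```

    At the scale of the study (over two million documents) an all-pairs loop like this would be impractical; the sketch only shows what the filtered similarity lists contain.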

  1. Cluster-randomised non-inferiority trial comparing DVD-assisted and traditional genetic counselling in systematic population testing for BRCA1/2 mutations.

    PubMed

    Manchanda, Ranjit; Burnell, Matthew; Loggenberg, Kelly; Desai, Rakshit; Wardle, Jane; Sanderson, Saskia C; Gessler, Sue; Side, Lucy; Balogun, Nyala; Kumar, Ajith; Dorkins, Huw; Wallis, Yvonne; Chapman, Cyril; Tomlinson, Ian; Taylor, Rohan; Jacobs, Chris; Legood, Rosa; Raikou, Maria; McGuire, Alistair; Beller, Uziel; Menon, Usha; Jacobs, Ian

    2016-07-01

    Newer approaches to genetic counselling are required for population-based testing. We compare traditional face-to-face genetic counselling with a DVD-assisted approach for population-based BRCA1/2 testing. A cluster-randomised non-inferiority trial in the London Ashkenazi Jewish population. Ashkenazi Jewish men/women >18 years; exclusion criteria: (a) known BRCA1/2 mutation, (b) previous BRCA1/2 testing and (c) first-degree relative of BRCA1/2 carrier. Ashkenazi Jewish men/women underwent pre-test genetic counselling prior to BRCA1/2 testing in the Genetic Cancer Prediction through Population Screening trial (ISRCTN73338115). Genetic counselling clinics (clusters) were randomised to traditional counselling (TC) and DVD-based counselling (DVD-C) approaches. DVD-C involved a DVD presentation followed by shorter face-to-face genetic counselling. Outcome measures included genetic testing uptake, cancer risk perception, increase in knowledge, counselling time and satisfaction (Genetic Counselling Satisfaction Scale). Random-effects models adjusted for covariates compared outcomes between TC and DVD-C groups. One-sided 97.5% CI was used to determine non-inferiority. Secondary outcomes: relevance, satisfaction, adequacy, emotional impact and improved understanding with the DVD; cost-minimisation analysis for TC and DVD-C approaches. 936 individuals (clusters=256, mean-size=3.6) were randomised to TC (n=527, clusters=134) and DVD-C (n=409, clusters=122) approaches. Groups were similar at baseline, mean age=53.9 (SD=15) years, women=66.8%, men=33.2%. DVD-C was non-inferior to TC for increase in knowledge (d=-0.07; lower 97.5% CI=-0.41), counselling satisfaction (d=-0.38, 97.5% CI=1.2) and risk perception (d=0.08; upper 97.5% CI=3.1). Group differences and CIs did not cross non-inferiority margins. DVD-C was equivalent to TC for uptake of genetic testing (d=-3%; lower/upper 97.5% CI -7.9%/1.7%) and superior for counselling time (20.4 (CI 18.7 to 22.2) min reduction (p<0.005)).
98% people found the DVD length and information satisfactory. 85-89% felt it improved their understanding of risks/benefits/implications/purpose of genetic testing. 95% would recommend it to others. The cost of genetic counselling for DVD-C=£7787 and TC=£17 307. DVD-C resulted in cost savings=£9520 (£14/volunteer). DVD-C is an effective, acceptable, non-inferior, time-saving and cost-efficient alternative to TC. ISRCTN 73338115. Published by the BMJ Publishing Group Limited.

  2. Bayesian multivariate hierarchical transformation models for ROC analysis.

    PubMed

    O'Malley, A James; Zou, Kelly H

    2006-02-15

    A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.

  3. Bayesian multivariate hierarchical transformation models for ROC analysis

    PubMed Central

    O'Malley, A. James; Zou, Kelly H.

    2006-01-01

    SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836

  4. An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE).

    PubMed

    Baker, David M; Valleron, Alain-Jacques

    2014-10-30

    Examining whether disease cases are clustered in space is an important part of epidemiological research. Another important part of spatial epidemiology is testing whether patients suffering from a disease are more, or less, exposed to environmental factors of interest than adequately defined controls. Both approaches involve determining the number of cases and controls (or population at risk) in specific zones. For cluster searches, this often must be done for millions of different zones. Doing this by calculating distances can lead to very lengthy computations. In this work we discuss the computational advantages of geographical grid-based methods, and introduce an open source software (FGBASE) which we have created for this purpose. Geographical grids based on the Lambert Azimuthal Equal Area projection are well suited for spatial epidemiology because they preserve area: each cell of the grid has the same area. We describe how data is projected onto such a grid, as well as grid-based algorithms for spatial epidemiological data-mining. The software program (FGBASE), that we have developed, implements these grid-based methods. The grid based algorithms perform extremely fast. This is particularly the case for cluster searches. When applied to a cohort of French Type 1 Diabetes (T1D) patients, as an example, the grid based algorithms detected potential clusters in a few seconds on a modern laptop. This compares very favorably to an equivalent cluster search using distance calculations instead of a grid, which took over 4 hours on the same computer. In the case study we discovered 4 potential clusters of T1D cases near the cities of Le Havre, Dunkerque, Toulouse and Nantes. One example of environmental analysis with our software was to study whether a significant association could be found between distance to vineyards with heavy pesticide. None was found. In both examples, the software facilitates the rapid testing of hypotheses. 
Grid-based algorithms for mining spatial epidemiological data provide advantages in terms of computational complexity thus improving the speed of computations. We believe that these methods and this software tool (FGBASE) will lower the computational barriers to entry for those performing epidemiological research.
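
    The computational idea behind grid-based counting can be sketched briefly: once points are projected onto an equal-area grid, counting cases and controls per zone becomes a dictionary lookup rather than a distance calculation. The sketch below is not FGBASE code. It assumes the coordinates are already projected to metres, and the function names and cell size are hypothetical.

```python
from collections import defaultdict

def cell_of(x, y, cell_size):
    """Map a projected coordinate (in metres) to its integer grid cell."""
    return (int(x // cell_size), int(y // cell_size))

def count_per_cell(points, cell_size):
    """Count how many points fall into each grid cell."""
    counts = defaultdict(int)
    for x, y in points:
        counts[cell_of(x, y, cell_size)] += 1
    return counts

def cases_and_controls_by_cell(cases, controls, cell_size):
    """Return {cell: (number of cases, number of controls)} for occupied cells."""
    case_counts = count_per_cell(cases, cell_size)
    control_counts = count_per_cell(controls, cell_size)
    cells = set(case_counts) | set(control_counts)
    return {c: (case_counts.get(c, 0), control_counts.get(c, 0)) for c in cells}
```

    Because every cell covers the same area on an equal-area grid, the per-cell counts are directly comparable, and a candidate zone built from several cells is scored by summing its cells' counts.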

  5. Biomarker clusters are differentially associated with longitudinal cognitive decline in late midlife

    PubMed Central

    Racine, Annie M.; Koscik, Rebecca L.; Berman, Sara E.; Nicholas, Christopher R.; Clark, Lindsay R.; Okonkwo, Ozioma C.; Rowley, Howard A.; Asthana, Sanjay; Bendlin, Barbara B.; Blennow, Kaj; Zetterberg, Henrik; Gleason, Carey E.; Carlsson, Cynthia M.

    2016-01-01

    The ability to detect preclinical Alzheimer’s disease is of great importance, as this stage of the Alzheimer’s continuum is believed to provide a key window for intervention and prevention. As Alzheimer’s disease is characterized by multiple pathological changes, a biomarker panel reflecting co-occurring pathology will likely be most useful for early detection. Towards this end, 175 late middle-aged participants (mean age 55.9 ± 5.7 years at first cognitive assessment, 70% female) were recruited from two longitudinally followed cohorts to undergo magnetic resonance imaging and lumbar puncture. Cluster analysis was used to group individuals based on biomarkers of amyloid pathology (cerebrospinal fluid amyloid-β42/amyloid-β40 assay levels), magnetic resonance imaging-derived measures of neurodegeneration/atrophy (cerebrospinal fluid-to-brain volume ratio, and hippocampal volume), neurofibrillary tangles (cerebrospinal fluid phosphorylated tau181 assay levels), and a brain-based marker of vascular risk (total white matter hyperintensity lesion volume). Four biomarker clusters emerged consistent with preclinical features of (i) Alzheimer’s disease; (ii) mixed Alzheimer’s disease and vascular aetiology; (iii) suspected non-Alzheimer’s disease aetiology; and (iv) healthy ageing. Cognitive decline was then analysed between clusters using longitudinal assessments of episodic memory, semantic memory, executive function, and global cognitive function with linear mixed effects modelling. Cluster 1 exhibited a higher intercept and greater rates of decline on tests of episodic memory. Cluster 2 had a lower intercept on a test of semantic memory and both Cluster 2 and Cluster 3 had steeper rates of decline on a test of global cognition. Additional analyses on Cluster 3, which had the smallest hippocampal volume, suggest that its biomarker profile is more likely due to hippocampal vulnerability and not to detectable specific volume loss exceeding the rate of normal ageing. 
Our results demonstrate that pathology, as indicated by biomarkers, in a preclinical timeframe is related to patterns of longitudinal cognitive decline. Such biomarker patterns may be useful for identifying at-risk populations to recruit for clinical trials. PMID:27324877

  6. Biomarker clusters are differentially associated with longitudinal cognitive decline in late midlife.

    PubMed

    Racine, Annie M; Koscik, Rebecca L; Berman, Sara E; Nicholas, Christopher R; Clark, Lindsay R; Okonkwo, Ozioma C; Rowley, Howard A; Asthana, Sanjay; Bendlin, Barbara B; Blennow, Kaj; Zetterberg, Henrik; Gleason, Carey E; Carlsson, Cynthia M; Johnson, Sterling C

    2016-08-01

    The ability to detect preclinical Alzheimer's disease is of great importance, as this stage of the Alzheimer's continuum is believed to provide a key window for intervention and prevention. As Alzheimer's disease is characterized by multiple pathological changes, a biomarker panel reflecting co-occurring pathology will likely be most useful for early detection. Towards this end, 175 late middle-aged participants (mean age 55.9 ± 5.7 years at first cognitive assessment, 70% female) were recruited from two longitudinally followed cohorts to undergo magnetic resonance imaging and lumbar puncture. Cluster analysis was used to group individuals based on biomarkers of amyloid pathology (cerebrospinal fluid amyloid-β42/amyloid-β40 assay levels), magnetic resonance imaging-derived measures of neurodegeneration/atrophy (cerebrospinal fluid-to-brain volume ratio, and hippocampal volume), neurofibrillary tangles (cerebrospinal fluid phosphorylated tau181 assay levels), and a brain-based marker of vascular risk (total white matter hyperintensity lesion volume). Four biomarker clusters emerged consistent with preclinical features of (i) Alzheimer's disease; (ii) mixed Alzheimer's disease and vascular aetiology; (iii) suspected non-Alzheimer's disease aetiology; and (iv) healthy ageing. Cognitive decline was then analysed between clusters using longitudinal assessments of episodic memory, semantic memory, executive function, and global cognitive function with linear mixed effects modelling. Cluster 1 exhibited a higher intercept and greater rates of decline on tests of episodic memory. Cluster 2 had a lower intercept on a test of semantic memory and both Cluster 2 and Cluster 3 had steeper rates of decline on a test of global cognition. Additional analyses on Cluster 3, which had the smallest hippocampal volume, suggest that its biomarker profile is more likely due to hippocampal vulnerability and not to detectable specific volume loss exceeding the rate of normal ageing. 
Our results demonstrate that pathology, as indicated by biomarkers, in a preclinical timeframe is related to patterns of longitudinal cognitive decline. Such biomarker patterns may be useful for identifying at-risk populations to recruit for clinical trials. © The Author (2016). Published by Oxford University Press on behalf of the Guarantors of Brain.

  7. Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images

    NASA Astrophysics Data System (ADS)

    Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang

    2016-10-01

    Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.

  8. Investigation of correlation classification techniques

    NASA Technical Reports Server (NTRS)

    Haskell, R. E.

    1975-01-01

    A two-step classification algorithm for processing multispectral scanner data was developed and tested. The first step is a single pass clustering algorithm that assigns each pixel, based on its spectral signature, to a particular cluster. The output of that step is a cluster tape in which a single integer is associated with each pixel. The cluster tape is used as the input to the second step, where ground truth information is used to classify each cluster using an iterative method of potentials. Once the clusters have been assigned to classes the cluster tape is read pixel-by-pixel and an output tape is produced in which each pixel is assigned to its proper class. In addition to the digital classification programs, a method of using correlation clustering to process multispectral scanner data in real time by means of an interactive color video display is also described.
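
    A single-pass clustering step of the kind mentioned above can be sketched as follows. The report's abstract does not give the assignment rule, so this is an assumption: each pixel joins the nearest existing cluster if it lies within a distance threshold, and otherwise starts a new cluster. The function name, the threshold and the running-mean update are all hypothetical.

```python
import math

def single_pass_cluster(pixels, threshold):
    """Assign each spectral vector to a cluster in one pass over the data.

    Assumed rule: join the nearest cluster centre if it is within
    `threshold`, otherwise open a new cluster.
    Returns (labels, centres); labels is the "cluster tape" of integers.
    """
    centres = []  # running mean of each cluster
    sizes = []    # number of pixels in each cluster
    labels = []
    for p in pixels:
        best, best_d = None, None
        for idx, c in enumerate(centres):
            d = math.dist(p, c)
            if best_d is None or d < best_d:
                best, best_d = idx, d
        if best is None or best_d > threshold:
            centres.append([float(x) for x in p])
            sizes.append(1)
            labels.append(len(centres) - 1)
        else:
            sizes[best] += 1
            n = sizes[best]
            # Incremental update of the cluster mean.
            centres[best] = [c + (x - c) / n for c, x in zip(centres[best], p)]
            labels.append(best)
    return labels, centres
```

    The list of integer labels plays the role of the cluster tape: one integer per pixel, which a second step could then map to ground-truth classes.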

  9. The implementation of two stages clustering (k-means clustering and adaptive neuro fuzzy inference system) for prediction of medicine need based on medical data

    NASA Astrophysics Data System (ADS)

    Husein, A. M.; Harahap, M.; Aisyah, S.; Purba, W.; Muhazir, A.

    2018-03-01

    Medication planning aims to determine the types and amounts of medicine required, and to avoid stock-outs, based on patterns of disease. In practice, planning still relies on the ability and experience of management; it is time-consuming, requires skill, depends on disease data that are difficult to obtain, needs good record keeping and reporting, and is constrained by the budget. As a result, planning works poorly and leads to frequent shortages and surpluses of medicines. In this research, we propose the Adaptive Neuro Fuzzy Inference System (ANFIS) method to predict medication needs in 2016 and 2017 based on medical data from 2015 and 2016 from two hospital sources. The analysis framework uses two approaches. In the first, ANFIS is applied directly to a data source; in the second, ANFIS is applied after clustering with the K-Means algorithm. For both approaches, the Root Mean Square Error (RMSE) is calculated for training and testing. The test results show that the proposed method gives better prediction rates than existing systems according to quantitative and qualitative evaluation; however, applying the K-Means algorithm before ANFIS affects the duration of the training process and provides significantly better classification accuracy than ANFIS without clustering.
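
    The RMSE used to compare the two approaches is the standard root mean square error. A minimal sketch follows; the function name and the example numbers in the usage note are hypothetical and are not taken from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

    For example, `rmse([10, 20, 30], [12, 18, 33])` averages the squared errors 4, 4 and 9 and takes the square root of the result; a lower value on the test set indicates better predictions.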

  10. Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm.

    PubMed

    Gibbons, Theodore R; Mount, Stephen M; Cooper, Endymion D; Delwiche, Charles F

    2015-07-10

    Clustering protein sequences according to inferred homology is a fundamental step in the analysis of many large data sets. Since the publication of the Markov Clustering (MCL) algorithm in 2002, it has been the centerpiece of several popular applications. Each of these approaches generates an undirected graph that represents sequences as nodes connected to each other by edges weighted with a BLAST-based metric. MCL is then used to infer clusters of homologous proteins by analyzing these graphs. The various approaches differ only by how they weight the edges, yet there has been very little direct examination of the relative performance of alternative edge-weighting metrics. This study compares the performance of four BLAST-based edge-weighting metrics: the bit score, bit score ratio (BSR), bit score over anchored length (BAL), and negative common log of the expectation value (NLE). Performance is tested using the Extended CEGMA KOGs (ECK) database, which we introduce here. All metrics performed similarly when analyzing full-length sequences, but dramatic differences emerged as progressively larger fractions of the test sequences were split into fragments. The BSR and BAL successfully rescued subsets of clusters by strengthening certain types of alignments between fragmented sequences, but also shifted the largest correct scores down near the range of scores generated from spurious alignments. This penalty outweighed the benefits in most test cases, and was greatly exacerbated by increasing the MCL inflation parameter, making these metrics less robust than the bit score or the more popular NLE. Notably, the bit score performed as well or better than the other three metrics in all scenarios. The results provide a strong case for use of the bit score, which appears to offer equivalent or superior performance to the more popular NLE. 
The insight that MCL-based clustering methods can be improved using a more tractable edge-weighting metric will greatly simplify future implementations. We demonstrate this with our own minimalist Python implementation: Porthos, which uses only standard libraries and can process a graph with 25 M+ edges connecting the 60 k+ KOG sequences in half a minute using less than half a gigabyte of memory.
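
    Two of the edge weights named above can be sketched for illustration. The NLE follows directly from its name (negative base-10 logarithm of the expectation value). The abstract does not define the normalization used for the bit score ratio, so dividing by a caller-supplied reference score is an assumption here, as are the function names and the cap applied to E-values reported as zero.

```python
import math

def nle(evalue, cap=300.0):
    """Negative common log of the expectation value.

    E-values reported as 0.0 have no finite logarithm, so they are mapped
    to `cap` (a hypothetical ceiling chosen for this sketch).
    """
    if evalue <= 0.0:
        return cap
    return min(-math.log10(evalue), cap)

def bit_score_ratio(bit_score, reference_score):
    """Bit score divided by a reference score (assumed normalization)."""
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    return bit_score / reference_score
```

    The raw bit score needs no transformation at all, which is part of what makes it the more tractable edge weight.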

  11. Scoring clustering solutions by their biological relevance.

    PubMed

    Gat-Viks, I; Sharan, R; Shamir, R

    2003-12-12

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
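
    The quantity being maximized, the ratio of between-group to within-group variance of the projected data, can be sketched for values already projected onto the real line. The projection itself is not reproduced here. The estimators follow the usual one-way analysis-of-variance form, which is an assumption about the exact definition the authors used, and the function name is hypothetical.

```python
def variance_ratio(groups):
    """Ratio of between-group to within-group variance for 1-D values.

    `groups` is a list of clusters, each a list of projected values.
    Assumed estimators: between-group sum of squares over (k - 1),
    within-group sum of squares over (N - k).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if within == 0:
        raise ValueError("within-group variance is zero")
    return (between / (k - 1)) / (within / (n_total - k))
```

    A clustering whose groups are well separated along the projected attribute gives a large ratio; groups that overlap heavily give a ratio near or below one.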

  12. Comparison of a non-stationary voxelation-corrected cluster-size test with TFCE for group-Level MRI inference.

    PubMed

    Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong

    2017-03-01

    Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal to noise ratios. Our results suggest that, both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.

  13. Consensus-Based Sorting of Neuronal Spike Waveforms

    PubMed Central

    Fournier, Julien; Mueller, Christian M.; Shein-Idelson, Mark; Hemberger, Mike

    2016-01-01

    Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained “ground truth” data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data. PMID:27536990
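
    The core of the consensus idea, estimating how often two spikes are clustered together across repeated runs, can be sketched with a co-assignment table. This is an illustrative simplification rather than the authors' algorithm: the function name is hypothetical, and the input is assumed to be a list of label sequences, one per clustering run, all covering the same spikes.

```python
from itertools import combinations

def coassignment_probability(partitions):
    """Fraction of clustering runs in which each pair of items shares a cluster.

    `partitions` is a list of label sequences, one per run. Label values need
    not match between runs; only co-membership within a run matters.
    """
    n_items = len(partitions[0])
    if any(len(labels) != n_items for labels in partitions):
        raise ValueError("all runs must label the same items")
    n_runs = len(partitions)
    prob = {}
    for i, j in combinations(range(n_items), 2):
        together = sum(1 for labels in partitions if labels[i] == labels[j])
        prob[(i, j)] = together / n_runs
    return prob
```

    Pairs with a probability near one are consistently grouped and are candidates for the same unit; intermediate values flag spikes whose assignment is unstable.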

  14. Consensus-Based Sorting of Neuronal Spike Waveforms.

    PubMed

    Fournier, Julien; Mueller, Christian M; Shein-Idelson, Mark; Hemberger, Mike; Laurent, Gilles

    2016-01-01

    Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained "ground truth" data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data.

  15. The Flies and Eyes project: design and methods of a cluster-randomised intervention study to confirm the importance of flies as trachoma vectors in The Gambia and to test a sustainable method of fly control using pit latrines.

    PubMed

    Emerson, Paul M; Lindsay, Steve W; Walraven, Gijs E L; Dibba, Sheikh Mafuji; Lowe, Kebba O; Bailey, Robin L

    2002-04-01

    The Flies and Eyes project is a community-based, cluster-randomised, intervention trial based in a rural area of The Gambia. It was designed to prove whether flies are mechanical vectors of trachoma; to quantify the relative importance of flies as vectors of trachoma and to test the effectiveness of insecticide spraying and the provision of latrines in trachoma control. A total of 21 clusters, each composed of 300-550 people, are to be recruited in groups of three. One cluster from each group is randomly allocated to receive insecticide spraying, one to receive pit latrines and the remaining to act as a control. The seven groups of clusters are recruited on a step-wise basis separated by two months to aid logistics and allow all seasons to be covered. Standardised, validated trachoma surveys are conducted for people of all ages and both sexes at baseline and six months post intervention. The Muscid fly population is monitored using standard traps and fly-eye contact is measured with catches of flies direct from children's faces. The Flies and Eyes project has been designed to strengthen the evidence base for the 'E' component of the SAFE strategy for trachoma control. The results will assist programme planners and country co-ordinators to make informed decisions on the environmental aspects of trachoma control.
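
    The allocation scheme described above (seven groups of three clusters, with one cluster per group assigned to each arm) can be sketched as a restricted randomisation. The cluster identifiers, the seed and the function name are hypothetical, and the trial's actual randomisation procedure is not described in the abstract.

```python
import random

ARMS = ("insecticide spraying", "pit latrines", "control")

def allocate(groups, seed=None):
    """Randomly assign the three arms within each group of three clusters."""
    rng = random.Random(seed)
    allocation = {}
    for group in groups:
        if len(group) != len(ARMS):
            raise ValueError("each group must contain exactly three clusters")
        arms = list(ARMS)
        rng.shuffle(arms)  # random order of arms for this group
        for cluster, arm in zip(group, arms):
            allocation[cluster] = arm
    return allocation

# Hypothetical identifiers for 7 groups x 3 clusters = 21 clusters.
groups = [[f"G{g}-C{c}" for c in range(1, 4)] for g in range(1, 8)]
allocation = allocate(groups, seed=2002)
```

    Randomising within groups keeps the three arms balanced across the step-wise recruitment periods, so each season contributes equally to every arm.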

  16. A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

    PubMed

    Brusco, Michael J; Shireman, Emilie; Steinley, Douglas

    2017-09-01

    The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data.
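
    One reason K-median may suit dichotomous data can be seen from the cluster centres alone: a K-means centre is a vector of proportions, whereas a K-median centre stays binary. The sketch below computes both centres for a single cluster; the function names and toy data are hypothetical, and `median_low` is used so that ties in even-sized clusters resolve to an observed value.

```python
from statistics import mean, median_low

def mean_centre(cluster):
    """K-means style centre: per-variable mean (a proportion for 0/1 data)."""
    return [mean(column) for column in zip(*cluster)]

def median_centre(cluster):
    """K-median style centre: per-variable median (stays 0 or 1 for 0/1 data)."""
    return [median_low(column) for column in zip(*cluster)]

def manhattan(a, b):
    """L1 distance, the criterion paired with the median centre."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical cluster of three objects measured on three dichotomous variables.
cluster = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]
```

    For this cluster the median centre is the binary profile held by the majority on each variable, which can be read directly as a prototype response pattern.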

  17. TESTING STELLAR POPULATION SYNTHESIS MODELS WITH SLOAN DIGITAL SKY SURVEY COLORS OF M31's GLOBULAR CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peacock, Mark B.; Zepf, Stephen E.; Maccarone, Thomas J.

    2011-08-10

    Accurate stellar population synthesis models are vital in understanding the properties and formation histories of galaxies. In order to calibrate and test the reliability of these models, they are often compared with observations of star clusters. However, relatively little work has compared these models in the ugriz filters, despite the recent widespread use of this filter set. In this paper, we compare the integrated colors of globular clusters in the Sloan Digital Sky Survey (SDSS) with those predicted from commonly used simple stellar population (SSP) models. The colors are based on SDSS observations of M31's clusters and provide the largest population of star clusters with accurate photometry available from the survey. As such, it is a unique sample with which to compare SSP models with SDSS observations. From this work, we identify a significant offset between the SSP models and the clusters' g - r colors, with the models predicting colors which are too red by g - r ≈ 0.1. This finding is consistent with previous observations of luminous red galaxies in the SDSS, which show a similar discrepancy. The identification of this offset in globular clusters suggests that it is very unlikely to be due to a minority population of young stars. The recently updated SSP model of Maraston and Strömbäck better represents the observed g - r colors. This model is based on the empirical MILES stellar library, rather than theoretical libraries, suggesting an explanation for the g - r discrepancy.

  18. Spatial cluster detection using dynamic programming.

    PubMed

    Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

    2012-03-25

    The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm.

  19. Spatial cluster detection using dynamic programming

    PubMed Central

    2012-01-01

    Background: The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods: We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results: When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions: We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103

  20. Inferring HIV-1 Transmission Dynamics in Germany From Recently Transmitted Viruses.

    PubMed

    Pouran Yousef, Kaveh; Meixenberger, Karolin; Smith, Maureen R; Somogyi, Sybille; Gromöller, Silvana; Schmidt, Daniel; Gunsenheimer-Bartmeyer, Barbara; Hamouda, Osamah; Kücherer, Claudia; von Kleist, Max

    2016-11-01

    Although HIV continues to spread globally, novel intervention strategies such as treatment as prevention (TasP) may bring the epidemic to a halt. However, their effective implementation requires a profound understanding of the underlying transmission dynamics. We analyzed parameters of the German HIV epidemic based on phylogenetic clustering of viral sequences from recently infected seroconverters with known infection dates. Viral baseline and follow-up pol sequences (n = 1943) from 1159 drug-naïve individuals were selected from a nationwide long-term observational study initiated in 1997. Putative transmission clusters were computed based on a maximum likelihood phylogeny. Using individual follow-up sequences, we optimized our clustering threshold to maximize the likelihood of co-clustering individuals connected by direct transmission. The sizes of putative transmission clusters scaled inversely with their abundance and their distribution exhibited a heavy tail. Clusters based on the optimal clustering threshold were significantly more likely to contain members of the same or bordering German federal states. Interinfection times between co-clustered individuals were significantly shorter (26 weeks; interquartile range: 13-83) than in a null model. Viral intraindividual evolution may be used to select criteria that maximize co-clustering of transmission pairs in the absence of strong adaptive selection pressure. Interinfection times of co-clustered individuals may then be an indicator of the typical time to onward transmission. Our analysis suggests that onward transmission may have occurred early after infection, when individuals are typically unaware of their serological status. The latter argues that TasP should be combined with HIV testing campaigns to reduce the possibility of transmission before TasP initiation.

  1. Distant star clusters of the Milky Way in MOND

    NASA Astrophysics Data System (ADS)

    Haghi, H.; Baumgardt, H.; Kroupa, P.

    2011-03-01

    We determine the mean velocity dispersion of six Galactic outer halo globular clusters, AM 1, Eridanus, Pal 3, Pal 4, Pal 15, and Arp 2 in the weak acceleration regime to test classical vs. modified Newtonian dynamics (MOND). Owing to the nonlinearity of MOND's Poisson equation, beyond tidal effects, the internal dynamics of clusters is affected by the external field in which they are immersed. For the studied clusters, particle accelerations are much lower than the critical acceleration a0 of MOND, but the motion of stars is neither dominated by internal accelerations (ai ≫ ae) nor external accelerations (ae ≫ ai). We use the N-body code N-MODY in our analysis, which is a particle-mesh-based code with a numerical MOND potential solver developed by Ciotti et al. (2006, ApJ, 640, 741) to derive the line-of-sight velocity dispersion by adding the external field effect. We show that Newtonian dynamics predicts a low-velocity dispersion for each cluster, while in modified Newtonian dynamics the velocity dispersion is much higher. We calculate the minimum number of measured stars necessary to distinguish between Newtonian gravity and MOND with the Kolmogorov-Smirnov test. We also show that for most clusters it is necessary to measure the velocities of between 30 and 80 stars to distinguish between the two cases. Therefore the observational measurement of the line-of-sight velocity dispersion of these clusters will provide a test for MOND.

  2. Do pig farmers preferences bias consumer choice for pork? Response to critique of the pork preference studies.

    PubMed

    Ngapo, T M; Fortin, J; Martin, J-F

    2010-08-01

    Québec consumers and pig farmers selected their preferred chop from 16 images that had been modified to give 16 treatments: two levels each of fat cover, colour, marbling and drip. The selection process was repeated eight times from different groups of chops. Fat cover (47% preferred lean) and colour (44%, light red) were the most frequently chosen characteristics. No significant differences were observed between farmers' and consumers' preferences (χ² test, P<0.05). Two preference-based clusters were found; 41% preferring dark red, lean meat and 59%, light red, lean meat, without marbling or drip. Choice-based clusters showed no significant links with either individual socio-demographic items, including pig farmer as occupation, or the three socio-demographic-based clusters observed (χ² test, P<0.05). No evidence was found to suggest that the choices of pig farmers differed from those of consumers and, therefore, inclusion of pig farmers in consumer panels would not bias consumer choice for pork. Crown Copyright (c) 2010. Published by Elsevier Ltd. All rights reserved.

  3. Clustering of change patterns using Fourier coefficients.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2008-01-15

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon request.

  4. A Test for Cluster Bias: Detecting Violations of Measurement Invariance across Clusters in Multilevel Data

    ERIC Educational Resources Information Center

    Jak, Suzanne; Oort, Frans J.; Dolan, Conor V.

    2013-01-01

    We present a test for cluster bias, which can be used to detect violations of measurement invariance across clusters in 2-level data. We show how measurement invariance assumptions across clusters imply measurement invariance across levels in a 2-level factor model. Cluster bias is investigated by testing whether the within-level factor loadings…

  5. Towards accurate modelling of galaxy clustering on small scales: testing the standard ΛCDM + halo model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-07-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the 'accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the 'standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  6. Algorithms of maximum likelihood data clustering with applications

    NASA Astrophysics Data System (ADS)

    Giada, Lorenzo; Marsili, Matteo

    2002-12-01

    We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on the maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.

  7. Clustering PPI data by combining FA and SHC method.

    PubMed

    Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin

    2015-01-01

    Clustering is one of the main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects, but it is very difficult for the hierarchical search to find the optimal neighborhood radius of synchronization, and its efficiency is not high. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value.

  8. Clustering PPI data by combining FA and SHC method

    PubMed Central

    2015-01-01

    Clustering is one of the main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless, traditional clustering methods may not be effective for clustering PPI data. In this paper, we propose a novel method for clustering PPI data by combining the firefly algorithm (FA) and the synchronization-based hierarchical clustering (SHC) algorithm. First, the PPI data are preprocessed via spectral clustering (SC), which transforms the high-dimensional similarity matrix into a low-dimensional matrix. Then the SHC algorithm is used to perform clustering. In the SHC algorithm, hierarchical clustering is achieved by continuously enlarging the neighborhood radius of synchronized objects, but it is very difficult for the hierarchical search to find the optimal neighborhood radius of synchronization, and its efficiency is not high. We therefore adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632

  9. Implementation of K-Means Clustering Method for Electronic Learning Model

    NASA Astrophysics Data System (ADS)

    Latipa Sari, Herlina; Suranti Mrs., Dewi; Natalia Zulita, Leni

    2017-12-01

    The teaching and learning process at SMK Negeri 2 Bengkulu Tengah uses an e-learning system for teachers and students. The e-learning was based on the classification of normative, productive, and adaptive subjects. SMK Negeri 2 Bengkulu Tengah consisted of 394 students and 60 teachers with 16 subjects. The records of the e-learning database were used in this research to observe students’ activity patterns in attending class. The K-Means algorithm was used to classify students’ learning activities in the e-learning system, in order to obtain clusters of students’ activity and of improvement in students’ ability. The K-Means clustering method for the electronic learning model at SMK Negeri 2 Bengkulu Tengah was implemented by observing 10 student activities, namely participation of students in the classroom, submit assignment, view assignment, add discussion, view discussion, add comment, download course materials, view article, view test, and submit test. In the e-learning model, testing was conducted on 10 students and yielded 2 clusters of membership data (C1 and C2). Cluster 1: membership percentage of 70%, consisting of 6 members, namely 1112438 Anggi Julian, 1112439 Anis Maulita, 1112441 Ardi Febriansyah, 1112452 Berlian Sinurat, 1112460 Dewi Anugrah Anwar and 1112467 Eka Tri Oktavia Sari. Cluster 2: membership percentage of 30%, consisting of 4 members, namely 1112463 Dosita Afriyani, 1112471 Erda Novita, 1112474 Eskardi and 1112477 Fachrur Rozi.

  10. Object-Oriented Image Clustering Method Using UAS Photogrammetric Imagery

    NASA Astrophysics Data System (ADS)

    Lin, Y.; Larson, A.; Schultz-Fellenz, E. S.; Sussman, A. J.; Swanson, E.; Coppersmith, R.

    2016-12-01

    Unmanned Aerial Systems (UAS) have been used widely as an imaging modality to obtain remotely sensed multi-band surface imagery, and are growing in popularity due to their efficiency, ease of use, and affordability. Los Alamos National Laboratory (LANL) has employed UAS for geologic site characterization and change detection studies at a variety of field sites. The deployed UAS are equipped with a standard visible band camera to collect imagery datasets. Based on the imagery collected, we use deep sparse algorithmic processing to detect and discriminate subtle topographic features created or impacted by subsurface activities. In this work, we develop an object-oriented remote sensing imagery clustering method for land cover classification. To improve the clustering and segmentation accuracy, instead of using conventional pixel-based clustering methods, we integrate the spatial information from neighboring regions to create super-pixels to avoid salt-and-pepper noise and subsequent over-segmentation. To further improve robustness of our clustering method, we also incorporate a custom digital elevation model (DEM) dataset generated using a structure-from-motion (SfM) algorithm together with the red, green, and blue (RGB) band data for clustering. In particular, we first employ an agglomerative clustering to create an initial segmentation map, in which every object is treated as a single (new) pixel. Based on the new pixels obtained, we generate new features to implement another level of clustering. We apply our clustering method to the RGB+DEM datasets collected at the field site. Through binary clustering and multi-object clustering tests, we verify that our method can accurately separate vegetation from non-vegetation regions, and is also able to differentiate object features on the surface.

  11. A 10-year population based study of 'opt-out' HIV testing of tuberculosis patients in Alberta, Canada: national implications.

    PubMed

    Long, Richard; Niruban, Selvanayagam; Heffernan, Courtney; Cooper, Ryan; Fisher, Dina; Ahmed, Rabia; Egedahl, Mary Lou; Fur, Rhonda

    2014-01-01

    Compliance with the recommendation that all tuberculosis (TB) patients be tested for human immunodeficiency virus (HIV) has not yet been achieved in Canada or globally. The experience of "opt-out" HIV testing of TB patients in the Province of Alberta, Canada is described over a 10-year period, 2003-2012. Testing rates are reported before and after the introduction of the "opt-out" approach. Risk factors for HIV seropositivity are described and demographic, clinical and laboratory characteristics of TB patients who were newly diagnosed versus previously diagnosed with HIV are compared. Genotypic clusters, defined as groups of two or more cases whose isolates of Mycobacterium tuberculosis had identical DNA fingerprints over the 10-year period or within 2 years of one another, were analyzed for their ability to predict HIV co-infection. HIV testing rates were 26% before and 90% after the introduction of "opt-out" testing. During the "opt-out" testing years those <15 or >64 years of age at diagnosis were less likely to have been tested. In those tested the prevalence of HIV was 5.6%. In the age group 15-64 years, risk factors for HIV were: age (35-64 years), Canadian-born Aboriginal or foreign-born sub-Saharan African origin, and combined respiratory and non-respiratory disease. Compared to TB patients previously known to be HIV positive, TB patients newly discovered to be HIV positive had more advanced HIV disease (lower CD4 counts; higher viral loads) at diagnosis. Large cluster size was associated with Aboriginal ancestry. Cluster size predicted HIV co-infection in Aboriginal peoples when clusters included all cases reported over 10 years but not when clusters included cases reported within 2 years of one another. "Opt-out" HIV testing of TB patients is effective and well received. Universal HIV testing of TB patients (>80% of patients tested) has immediate (patients) and longer-term (TB/HIV program planning) benefits.

  12. Probing the dynamical and X-ray mass proxies of the cluster of galaxies Abell S1101

    NASA Astrophysics Data System (ADS)

    Rabitz, Andreas; Zhang, Yu-Ying; Schwope, Axel; Verdugo, Miguel; Reiprich, Thomas H.; Klein, Matthias

    2017-01-01

    Context. The galaxy cluster Abell S1101 (S1101 hereafter) deviates significantly from the X-ray luminosity versus velocity dispersion relation (L-σ) of galaxy clusters in our previous study. Given reliable X-ray luminosity measurement combining XMM-Newton and ROSAT, this could most likely be caused by the bias in the velocity dispersion due to interlopers and low member statistics in the previous sample of member galaxies, which was solely based on 20 galaxy redshifts drawn from the literature. Aims: We intend to increase the galaxy member statistics to perform precision measurements of the velocity dispersion and dynamical mass of S1101. We aim for a detailed substructure and dynamical state characterization of this cluster, and a comparison of mass estimates derived from (I) the velocity dispersion (Mvir), (II) the caustic mass computation (Mcaustic), and (III) mass proxies from X-ray observations and the Sunyaev-Zel'dovich (SZ) effect. Methods: We carried out new optical spectroscopic observations of the galaxies in this cluster field with VIMOS, obtaining a sample of 60 member galaxies for S1101. We revised the cluster redshift and velocity dispersion measurements based on this sample and also applied the Dressler-Shectman substructure test. Results: The completeness of cluster members within r200 was significantly improved for this cluster. Tests for dynamical substructure do not show evidence of major disturbances or merging activities in S1101. We find good agreement between the dynamical cluster mass measurements and X-ray mass estimates, which confirms the relaxed state of the cluster displayed in the 2D substructure test. The SZ mass proxy is slightly higher than the other estimates. The updated measurement of σ erased the deviation of S1101 in the L-σ relation. We also noticed a background structure in the cluster field of S1101. This structure is a galaxy group that is very close to the cluster S1101 in projection but at almost twice its redshift. However, the mass of this structure is too low to significantly bias the observed bolometric X-ray luminosity of S1101. Hence, we can conclude that the deviation of S1101 in the L-σ relation in our previous study can be explained by low member statistics and galaxy interlopers, which are known to introduce biases in the estimated velocity dispersion. We have made use of VLT/VIMOS observations taken with the ESO Telescope at the Paranal Observatory under programme 087.A-0096.

  13. Finding and testing network communities by lumped Markov chains.

    PubMed

    Piccardi, Carlo

    2011-01-01

    Identifying communities (or clusters), namely groups of nodes with comparatively strong internal connectivity, is a fundamental task for deeply understanding the structure and function of a network. Yet, there is a lack of formal criteria for defining communities and for testing their significance. We propose a sharp definition that is based on a quality threshold. By means of a lumped Markov chain model of a random walker, a quality measure called "persistence probability" is associated to a cluster, which is then defined as an "α-community" if such a probability is not smaller than α. Consistently, a partition composed of α-communities is an "α-partition." These definitions turn out to be very effective for finding and testing communities. If a set of candidate partitions is available, setting the desired α-level allows one to immediately select the α-partition with the finest decomposition. Simultaneously, the persistence probabilities quantify the quality of each single community. Given its ability in individually assessing each single cluster, this approach can also disclose single well-defined communities even in networks that overall do not possess a definite clusterized structure.
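
    The persistence probability described above can be sketched for the simplest case, an unbiased random walk on an undirected weighted graph. Under that assumption the stationary probability of a node is proportional to its strength, so the diagonal entry of the lumped chain for a cluster reduces to (total internal weight, counted in both directions) / (total strength of the cluster's nodes). The function names and the example graph are illustrative and not taken from the paper.

```python
def persistence_probabilities(adj, partition):
    """Diagonal entries u_cc of the lumped Markov chain, one per cluster.

    adj       : symmetric weight matrix (list of lists), assumed undirected
    partition : list of clusters, each a list of node indices
    """
    strength = [sum(row) for row in adj]
    result = []
    for cluster in partition:
        # Weight of links that stay inside the cluster (each edge counted twice).
        internal = sum(adj[i][j] for i in cluster for j in cluster)
        # Total weight leaving or staying, summed over the cluster's nodes.
        total = sum(strength[i] for i in cluster)
        result.append(internal / total)
    return result


def is_alpha_partition(adj, partition, alpha):
    """True if every cluster has persistence probability of at least alpha."""
    return all(u >= alpha for u in persistence_probabilities(adj, partition))


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0] * 6 for _ in range(6)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    parts = [[0, 1, 2], [3, 4, 5]]
    print(persistence_probabilities(adj, parts))
    print(is_alpha_partition(adj, parts, alpha=0.8))
```

    Each triangle has six internal link ends out of seven in total, so both clusters have a persistence probability of 6/7 and the partition qualifies as an α-partition for any α up to that value.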

  14. Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

    NASA Astrophysics Data System (ADS)

    Chang, Bingguo; Chen, Xiaofei

    2018-05-01

    Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor gives the liver indicators and suggests the patient's condition according to the description in the ultrasound report. With the rapid increase in the amount of ultrasound report data, the workload of professional physicians who manually interpret ultrasound results increases significantly. In this paper, we use the spectral clustering method to cluster the descriptions in ultrasound reports and automatically generate the ultrasound diagnosis by machine learning. 110 groups of ultrasound examination reports of chronic liver disease were selected as test samples in this experiment, and the results were validated by spectral clustering and compared with the k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of the k-means clustering algorithm, and that the method provides a powerful ultrasound-assisted diagnosis for patients with chronic liver disease.

  15. A clustering algorithm for determining community structure in complex networks

    NASA Astrophysics Data System (ADS)

    Jin, Hong; Yu, Wei; Li, ShiJun

    2018-02-01

    Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density-based clustering algorithm which has a firm mathematical basis and good clustering properties, allowing for arbitrarily shaped clusters in high-dimensional datasets. However, this method cannot be directly applied to community discovery due to its inability to deal with network data. Moreover, it requires a careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low-dimensional Euclidean space which can preserve node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named Sheather-Jones plug-in is chosen to select the density parameter, which can describe the intrinsic clustering structure accurately. Moreover, every node in the network is meaningful, so there are no noise nodes and, as a result, the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popular density-based clustering algorithms adapted to community detection.

  16. Automated clustering of probe molecules from solvent mapping of protein surfaces: new algorithms applied to hot-spot mapping and structure-based drug design

    NASA Astrophysics Data System (ADS)

    Lerner, Michael G.; Meagher, Kristin L.; Carlson, Heather A.

    2008-10-01

    Use of solvent mapping, based on multiple-copy minimization (MCM) techniques, is common in structure-based drug discovery. The minima of small-molecule probes define locations for complementary interactions within a binding pocket. Here, we present improved methods for MCM. In particular, a Jarvis-Patrick (JP) method is outlined for grouping the final locations of minimized probes into physical clusters. This algorithm has been tested through a study of protein-protein interfaces, showing the process to be robust, deterministic, and fast in the mapping of protein "hot spots." Improvements in the initial placement of probe molecules are also described. A final application to HIV-1 protease shows how our automated technique can be used to partition data too complicated to analyze by hand. These new automated methods may be easily and quickly extended to other protein systems, and our clustering methodology may be readily incorporated into other clustering packages.
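
    For readers unfamiliar with the Jarvis-Patrick (JP) method named above, the sketch below shows the textbook shared-nearest-neighbor rule on generic coordinates: two points join the same cluster when each is in the other's k-nearest-neighbor list and the two lists share at least k_min members. It is not the authors' implementation; the parameter names `k` and `k_min`, the exclusion of a point from its own neighbor list, and the toy coordinates are assumptions made for illustration.

```python
import math


def jarvis_patrick(points, k, k_min):
    """Shared-nearest-neighbor clustering; returns one cluster label per point."""
    n = len(points)
    # k nearest neighbors of each point (the point itself is excluded).
    neighbors = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(set(order[:k]))

    # Union-find structure used to merge points that satisfy the JP rule.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neighbors[i] and i in neighbors[j]
            if mutual and len(neighbors[i] & neighbors[j]) >= k_min:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    probes = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    print(jarvis_patrick(probes, k=3, k_min=2))
```

    The rule involves no random initialization and no iterative refinement, so repeated runs on the same input give the same clusters, which is consistent with the deterministic behavior the abstract reports.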

  17. Minimal spanning tree algorithm for γ-ray source detection in sparse photon images: cluster parameters and selection strategies

    DOE PAGES

    Campana, R.; Bernieri, E.; Massaro, E.; ...

    2013-05-22

    The minimal spanning tree (MST) algorithm is a graph-theoretical cluster-finding method. We previously applied it to γ-ray bidimensional images, showing that it is quite sensitive in finding faint sources. Possible sources are associated with the regions where the photon arrival directions cluster. MST selects clusters starting from a particular “tree” connecting all the points of the image and performing a cut based on the angular distance between photons, with a number of events higher than a given threshold. In this paper, we show how a further filtering, based on some parameters linked to the cluster properties, can be applied to reduce spurious detections. We find that the most efficient parameter for this secondary selection is the magnitude M of a cluster, defined as the product of its number of events by its clustering degree. We test the sensitivity of the method by means of simulated and real Fermi-Large Area Telescope (LAT) fields. Our results show that √M is strongly correlated with other statistical significance parameters, derived from a wavelet based algorithm and maximum likelihood (ML) analysis, and that it can be used as a good estimator of statistical significance of MST detections. Finally, we apply the method to a 2-year LAT image at energies higher than 3 GeV, and we show the presence of new clusters, likely associated with BL Lac objects.
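    The selection procedure in this abstract (build the tree, cut long edges, keep groups with enough events, then rank by magnitude) can be sketched as follows. Several points are assumptions rather than details from the abstract: flat Euclidean distance stands in for angular distance, the clustering degree g is taken to be the mean edge length of the whole tree divided by the mean edge length inside the cluster, and all function and parameter names are hypothetical.

```python
import math


def minimal_spanning_tree(points):
    """Prim's algorithm; returns the tree edges as (length, i, j) tuples."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n      # (distance to the tree, tree node)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        i = min((k for k in range(n) if not in_tree[k]),
                key=lambda k: best[k][0])
        in_tree[i] = True
        if best[i][1] >= 0:
            edges.append((best[i][0], best[i][1], i))
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < best[j][0]:
                    best[j] = (d, i)
    return edges


def mst_clusters(points, cut_length, min_events):
    """Cut edges longer than cut_length; keep groups with >= min_events points."""
    edges = minimal_spanning_tree(points)
    mean_all = sum(e[0] for e in edges) / len(edges)
    kept = [e for e in edges if e[0] <= cut_length]

    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)

    groups = {}
    for idx in range(len(points)):
        groups.setdefault(find(idx), []).append(idx)

    results = []
    for root, members in groups.items():
        if len(members) < max(min_events, 2):
            continue
        lengths = [e[0] for e in kept if find(e[1]) == root]
        # Assumed definition of the clustering degree g (not from the abstract).
        g = mean_all / (sum(lengths) / len(lengths))
        results.append({"members": members, "g": g, "M": len(members) * g})
    return results


# Toy photon positions (made up): one tight group plus three isolated events.
photons = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (10.0, 0.0), (0.0, 10.0)]
found = mst_clusters(photons, cut_length=1.0, min_events=3)
```

    In this toy field only the tight group survives both cuts, and its magnitude exceeds its event count because its internal edges are much shorter than the tree average.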

  18. Mechanism for Collective Cell Alignment in Myxococcus xanthus Bacteria

    PubMed Central

    Balagam, Rajesh; Igoshin, Oleg A.

    2015-01-01

    Myxococcus xanthus cells self-organize into aligned groups, clusters, at various stages of their lifecycle. Formation of these clusters is crucial for the complex dynamic multi-cellular behavior of these bacteria. However, the mechanism underlying the cell alignment and clustering is not fully understood. Motivated by studies of clustering in self-propelled rods, we hypothesized that M. xanthus cells can align and form clusters through pure mechanical interactions among cells and between cells and substrate. We test this hypothesis using an agent-based simulation framework in which each agent is based on the biophysical model of an individual M. xanthus cell. We show that model agents, under realistic cell flexibility values, can align and form cell clusters but only when periodic reversals of cell directions are suppressed. However, by extending our model to introduce the observed ability of cells to deposit and follow slime trails, we show that effective trail-following leads to clusters in reversing cells. Furthermore, we conclude that mechanical cell alignment combined with slime-trail-following is sufficient to explain the distinct clustering behaviors observed for wild-type and non-reversing M. xanthus mutants in recent experiments. Our results are robust to variation in model parameters, match the experimentally observed trends and can be applied to understand surface motility patterns of other bacterial species. PMID:26308508

  19. Ortholog-based screening and identification of genes related to intracellular survival.

    PubMed

    Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin

    2018-04-20

    Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. A total of 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.

  20. Android Malware Classification Using K-Means Clustering Algorithm

    NASA Astrophysics Data System (ADS)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

    Malware is designed to gain access to or damage a computer system without the user's notice. In addition, attackers exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the application of the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results with high accuracy when tested using the Random Forest algorithm.
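    For readers unfamiliar with the clustering step, the following is a minimal sketch of plain K-Means (Lloyd's iterations) with three clusters. It is not the SPSS or WEKA workflow used in the study: the two-dimensional toy feature vectors, the seeding rule (first k points) and the function name are assumptions made purely for illustration.

```python
import math


def k_means(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns (assignment, centroids)."""
    # Naive deterministic seeding with the first k points (an assumption;
    # real tools usually seed randomly or with k-means++).
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break                      # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Made-up 2-D feature vectors standing in for three kinds of app samples.
samples = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0),
           (0.0, 1.0), (10.0, 11.0), (20.0, 1.0)]
assignment, centroids = k_means(samples, k=3)
```

    With real data the nine features would first need comparable scales, since K-Means relies on Euclidean distance.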

  1. ClueNet: Clustering a temporal network based on topological similarity rather than denseness

    PubMed Central

    Milenković, Tijana

    2018-01-01

    Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of “topologically related” nodes, where the resulting topology-based clusters are expected to “correlate” well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, the problem is even more complex, due to an additional dimension of the data—their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in over 83% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance. PMID:29738568

  2. Performance analysis of clustering techniques over microarray data: A case study

    NASA Astrophysics Data System (ADS)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques with different cluster analysis approaches, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. However, the grading approach depends on the characteristics of the dataset as well as on the validity indices, so a two-stage grading approach is implemented. In this study, the grading approach is implemented over five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of the grading approach that a clustering technique is significant is also established by the Nemenyi post-hoc hypothesis test.

  3. The employment of Support Vector Machine to classify high and low performance archers based on bio-physiological variables

    NASA Astrophysics Data System (ADS)

    Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Amirul Abdullah, Muhammad; Hasnun Arif Hassan, Mohd; Khalil, Zubair

    2018-04-01

    The present study employs a machine learning algorithm, namely the support vector machine (SVM), to classify high and low potential archers from a collection of bio-physiological variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ± .056) gathered from various archery programmes completed a one end shooting score test. The bio-physiological variables, namely resting heart rate, resting respiratory rate, resting diastolic blood pressure, resting systolic blood pressure, as well as calorie intake, were measured prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on the variables assessed. SVM models, i.e. linear, quadratic and cubic kernel functions, were trained on the aforementioned variables. The k-means analysis clustered the archers into high (HPA) and low potential archers (LPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy, with a classification accuracy of 94% in comparison with the other tested models. The findings of this investigation can be valuable to coaches and sports managers in recognising high potential athletes from the selected bio-physiological variables examined.

  4. Automated rice leaf disease detection using color image analysis

    NASA Astrophysics Data System (ADS)

    Pugoy, Reinald Adrian D. L.; Mariano, Vladimir Y.

    2011-06-01

    In rice-related institutions such as the International Rice Research Institute, assessing the health condition of a rice plant through its leaves, which is usually done as a manual eyeball exercise, is important to come up with good nutrient and disease management strategies. In this paper, an automated system that can detect diseases present in a rice leaf using color image analysis is presented. In the system, the outlier region is first obtained from a rice leaf image to be tested using histogram intersection between the test and healthy rice leaf images. Upon obtaining the outlier, it is then subjected to a threshold-based K-means clustering algorithm to group related regions into clusters. Then, these clusters are subjected to further analysis to finally determine the suspected diseases of the rice leaf.

  5. From Innovation to Impact at Scale: Lessons Learned from a Cluster of Research-Community Partnerships

    PubMed Central

    Schindler, Holly S.; Fisher, Philip A.; Shonkoff, Jack P.

    2017-01-01

    This paper presents a description of how an interdisciplinary network of academic researchers, community-based programs, parents, and state agencies have joined together to design, test, and scale a suite of innovative intervention strategies rooted in new knowledge about the biology of adversity. Through a process of co-creation, collective pilot-testing, and the support of a measurement and evaluation hub, the Washington State Innovation Cluster is using rapid cycle, iterative learning to elucidate differential impacts of interventions designed to build child and caregiver capacities and address the developmental consequences of socioeconomic disadvantage. Key characteristics of the Innovation Cluster model are described and an example is presented of a video-coaching intervention that has been implemented, adapted, and evaluated through this distinctive, collaborative process. PMID:28777436

  6. Distinct phenotype clusters in childhood inflammatory brain diseases: implications for diagnostic evaluation.

    PubMed

    Cellucci, Tania; Tyrrell, Pascal N; Twilt, Marinka; Sheikh, Shehla; Benseler, Susanne M

    2014-03-01

    To identify distinct clusters of children with inflammatory brain diseases based on clinical, laboratory, and imaging features at presentation, to assess which features contribute strongly to the development of clusters, and to compare additional features between the identified clusters. A single-center cohort study was performed with children who had been diagnosed as having an inflammatory brain disease between June 1, 1989 and December 31, 2010. Demographic, clinical, laboratory, neuroimaging, and histologic data at diagnosis were collected. K-means cluster analysis was performed to identify clusters of patients based on their presenting features. Associations between the clusters and patient variables, such as diagnoses, were determined. A total of 147 children (50% female; median age 8.8 years) were identified: 105 with primary central nervous system (CNS) vasculitis, 11 with secondary CNS vasculitis, 8 with neuronal antibody syndromes, 6 with postinfectious syndromes, and 17 with other inflammatory brain diseases. Three distinct clusters were identified. Paresis and speech deficits were the most common presenting features in cluster 1. Children in cluster 2 were likely to present with behavior changes, cognitive dysfunction, and seizures, while those in cluster 3 experienced ataxia, vision abnormalities, and seizures. Lesions seen on T2/fluid-attenuated inversion recovery sequences of magnetic resonance imaging were common in all clusters, but unilateral ischemic lesions were more prominent in cluster 1. The clusters were associated with specific diagnoses and diagnostic test results. Children with inflammatory brain diseases presented with distinct phenotypical patterns that are associated with specific diagnoses. This information may inform the development of a diagnostic classification of childhood inflammatory brain diseases and suggest that specific pathways of diagnostic evaluation are warranted. 
Copyright © 2014 by the American College of Rheumatology.

  7. Biochemical characterization and phylogenetic analysis based on 16S rRNA sequences for V-factor dependent members of Pasteurellaceae derived from laboratory rats.

    PubMed

    Hayashimoto, Nobuhito; Ueno, Masami; Tkakura, Akira; Itoh, Toshio

    2007-06-01

    Phylogenetic analysis based on 16S rRNA sequences with sequence data of some bacterial species of Pasteurellaceae related to rodents deposited in GenBank was performed along with biochemical characterization for the 20 strains of V-factor dependent members of Pasteurellaceae derived from laboratory rats to obtain basic information and to investigate the taxonomic positions. The results of biochemical tests for all strains were identical except for three tests, the ornithine decarboxylase test, and fermentation tests of D(+) mannose and D(+) xylose. The biochemical properties of 8 of 20 strains that showed negative results for the fermentation test of D(+) xylose agreed with those of Haemophilus parainfluenzae complex. By phylogenetic analysis, the strains were divided into two clusters that agreed with the results of the fermentation test of xylose (group I: negative reaction for xylose, group II: positive reaction for xylose). The clusters were independent of other bacterial species of Pasteurellaceae tested. The sequences of the strains in group I showed 99.7-99.8% similarity and the strains in group II showed 99.3-99.7% similarity. None of the strains in group I had a close relation with Haemophilus parainfluenzae by phylogenetic analysis, although they showed the same biochemical properties. In conclusion, the strains had characteristic biochemical properties and formed two independent groups within the "rodent cluster" of Pasteurellaceae that differed in the results of the fermentation test of xylose. Therefore, they seemed to be hitherto undescribed taxa in Pasteurellaceae.

  8. Reducing Tobacco Use among Low Socio-Economic Status Youth in Delhi, India: Outcomes from Project ACTIVITY, a Cluster Randomized Trial

    ERIC Educational Resources Information Center

    Harrell, Melissa B.; Arora, Monika; Bassi, Shalini; Gupta, Vinay K.; Perry, Cheryl L.; Reddy, K. Srinath

    2016-01-01

    To test the efficacy of an intervention to reduce tobacco use among youth (10-19 years old) in slum communities in Delhi, India. This community-based cluster-randomized trial included 14 slums composed of purposely built resettlement colonies and adjacent inhabitant-built Jhuggi Jhopris. Youth in the intervention received a 2 year…

  9. Cluster-based analysis of multi-model climate ensembles

    NASA Astrophysics Data System (ADS)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) have yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ~ 20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ~ 62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. 
We further demonstrate that clustering can provide a viable and useful framework in which to assess and visualise model spread, offering insight into geographical areas of agreement among models and a measure of diversity across an ensemble. Finally, we discuss caveats of the clustering techniques and note that while we have focused on tropospheric ozone, the principles underlying the cluster-based MMMs are applicable to other prognostic variables from climate models.
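    The idea of a cluster-based multi-model mean at a single location can be sketched as below. This is not the DDC algorithm named in the abstract: a simple one-dimensional gap rule is swapped in as a stand-in, and the function name, the `radius` parameter and the ozone values are all made up for illustration.

```python
def most_populous_cluster_mean(values, radius):
    """Mean of the largest group of model values at one location.

    Stand-in grouping rule (an assumption, not DDC): sort the values and
    start a new group whenever the gap to the previous value exceeds radius.
    """
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= radius:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    largest = max(clusters, key=len)   # ties resolve to the lowest-valued group
    return sum(largest) / len(largest)


# Hypothetical column-ozone values from five models at one grid cell.
models = [30.0, 31.0, 32.0, 33.0, 50.0]
all_model_mean = sum(models) / len(models)
cluster_mean = most_populous_cluster_mean(models, radius=2.0)
```

    Here the outlying model is excluded from the cluster-based mean. As the abstract notes, that is only an improvement where the excluded model is in fact further from the observations, which is not guaranteed.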

  10. A new artefacts resistant method for automatic lineament extraction using Multi-Hillshade Hierarchic Clustering (MHHC)

    NASA Astrophysics Data System (ADS)

    Šilhavý, Jakub; Minár, Jozef; Mentlík, Pavel; Sládek, Ján

    2016-07-01

    This paper presents a new method of automatic lineament extraction which includes the removal of the 'artefacts effect' which is associated with the process of raster based analysis. The core of the proposed Multi-Hillshade Hierarchic Clustering (MHHC) method incorporates a set of variously illuminated and rotated hillshades in combination with hierarchic clustering of derived 'protolineaments'. The algorithm also includes classification into positive and negative lineaments. MHHC was tested in two different territories in Bohemian Forest and Central Western Carpathians. The original vector-based algorithm was developed for comparison of the individual lineaments proximity. Its use confirms the compatibility of manual and automatic extraction and their similar relationships to structural data in the study areas.

  11. Intersection Detection Based on Qualitative Spatial Reasoning on Stopping Point Clusters

    NASA Astrophysics Data System (ADS)

    Zourlidou, S.; Sester, M.

    2016-06-01

    The purpose of this research is to propose and test a method for detecting intersections by analysing collectively acquired trajectories of moving vehicles. Instead of solely relying on the geometric features of the trajectories, such as heading changes, which may indicate turning points and consequently intersections, we extract semantic features of the trajectories in the form of sequences of stops and moves. Under this spatiotemporal prism, the extracted semantic information which indicates where vehicles stop can reveal important locations, such as junctions. The advantage of the proposed approach in comparison with existing turning-point oriented approaches is that it can detect intersections even when not all the crossing road segments are sampled and therefore no turning points are observed in the trajectories. The challenge with this approach is, first, that not all vehicles stop at the same location, so the stop location is blurred along the direction of the road; second, this leads to the effect that nearby junctions can induce similar stop locations. As a first step, a density-based clustering is applied on the layer of stop observations and clusters of stop events are found. Representative points of the clusters are determined (one per cluster), and in a last step the existence of an intersection is clarified based on spatial relational cluster reasoning, with which less informative geospatial clusters, in terms of whether a junction exists and where its centre lies, are transformed into more informative ones. Relational reasoning criteria, based on the relative orientation of the clusters with respect to their adjacent ones, are discussed for making sense of the relation that connects them, and finally for forming groups of stop events that belong to the same junction.
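    The first two steps described above (density-based clustering of stop observations, then one representative point per cluster) can be sketched as follows. The abstract does not name the clustering algorithm, so DBSCAN is used here as one common density-based choice; taking the centroid as the representative point is likewise an assumption, as are the function names, parameters and coordinates.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1             # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is a core point: keep expanding
    return labels


def representatives(points, labels):
    """Centroid of each cluster (an assumed choice of representative point)."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab < 0:
            continue
        sx, sy, count = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, count + 1)
    return {lab: (sx / c, sy / c) for lab, (sx, sy, c) in sums.items()}


# Made-up stop locations: two groups of stops and one isolated stop.
stops = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
         (50.0, 50.0)]
stop_labels = dbscan(stops, eps=1.5, min_pts=3)
centres = representatives(stops, stop_labels)
```

    The relational reasoning step, which decides whether neighbouring stop clusters belong to the same junction, is not covered by this sketch.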

  12. Somatosensory nociceptive characteristics differentiate subgroups in people with chronic low back pain: a cluster analysis.

    PubMed

    Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne

    2015-10-01

    The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking <300 minutes per week of moderate activity was significantly greater in cluster 1 than in clusters 2 and 3. Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups.

  13. Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models.

    PubMed

    Liu, Jingxia; Colditz, Graham A

    2018-05-01

    There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, the cluster sizes are assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyze a set of correlated data is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which the "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the estimator of the treatment effect for equal cluster sizes to that for unequal cluster sizes. We discuss a commonly used structure in CRTs, the exchangeable structure, and derive a simpler formula for RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size to account for efficiency loss. Additionally, we also propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
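    To make the RE definition concrete, the sketch below uses a standard textbook form for a continuous outcome under an exchangeable correlation: a cluster of size n is assumed to contribute information proportional to n / (1 + (n - 1)ρ), and both trial arms are assumed to share the same cluster-size distribution. This is an illustrative assumption, not necessarily the formula derived in the paper, and the function name is hypothetical.

```python
def relative_efficiency(cluster_sizes, rho):
    """Var(equal sizes) / Var(unequal sizes) under the assumed information form.

    Variance is inversely proportional to total information, so the ratio of
    variances equals information(unequal) / information(equal).
    """
    k = len(cluster_sizes)
    mean_size = sum(cluster_sizes) / k
    info_unequal = sum(n / (1 + (n - 1) * rho) for n in cluster_sizes)
    # Same number of clusters and same total sample size, but equal sizes.
    info_equal = k * mean_size / (1 + (mean_size - 1) * rho)
    return info_unequal / info_equal


re_balanced = relative_efficiency([10, 10, 10], rho=0.05)
re_unbalanced = relative_efficiency([2, 18], rho=0.1)
```

    Under this assumed form the per-cluster information is concave in n, so unequal sizes cannot beat equal sizes for a fixed total; with ρ = 0 the clustering has no effect and RE equals 1.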

  14. The cosmological analysis of X-ray cluster surveys. IV. Testing ASpiX with template-based cosmological simulations

    NASA Astrophysics Data System (ADS)

    Valotti, A.; Pierre, M.; Farahi, A.; Evrard, A.; Faccioli, L.; Sauvageot, J.-L.; Clerc, N.; Pacaud, F.

    2018-06-01

    Context. This paper is the fourth of a series evaluating the ASpiX cosmological method, based on X-ray diagrams, which are constructed from simple cluster observable quantities, namely: count rate (CR), hardness ratio (HR), core radius (rc), and redshift. Aims: Following extensive tests on analytical toy catalogues (Paper III), we present the results of a more realistic study over 711 deg² of template-based maps derived from a cosmological simulation. Methods: Dark matter haloes from the Aardvark simulation have been ascribed luminosities, temperatures, and core radii, using local scaling relations and assuming self-similar evolution. The predicted X-ray sky-maps were converted into XMM event lists, using a detailed instrumental simulator. The XXL pipeline, run on the resulting sky images, produces an observed cluster catalogue over which the tests have been performed. This allowed us to investigate the relative power of various combinations of the CR, HR, rc, and redshift information. Two fitting methods were used: a traditional Markov chain Monte Carlo (MCMC) approach and a simple minimisation procedure (Amoeba) whose mean uncertainties are a posteriori evaluated by means of synthetic catalogues. The results were analysed and compared to the predictions from the Fisher analysis (FA). Results: For this particular catalogue realisation, assuming that the scaling relations are perfectly known, the CR-HR combination gives σ8 and Ωm at the 10% level, while CR-HR-rc-z improves this to ≤3%. Adding a second HR improves the results from the CR-HR1-rc combination, but to a lesser extent than when adding the redshift information. When all coefficients of the mass-temperature relation (M-T, including scatter) are also fitted, the cosmological parameters are constrained to within 5-10% and larger for the M-T coefficients (up to a factor of two for the scatter). 
The errors returned by the MCMC, those by Amoeba and the FA predictions are in most cases in excellent agreement and always within a factor of two. We also study the impact of the scatter of the mass-size relation (M-Rc) on the number of detected clusters: for the cluster typical sizes usually assumed, the larger the scatter, the lower the number of detected objects. Conclusions: The present study confirms and extends the trends outlined in our previous analyses, namely the power of X-ray observable diagrams to successfully and easily fit at the same time, the cosmological parameters, cluster physics, and the survey selection, by involving all detected clusters. The accuracy levels quoted should not be considered as definitive. A number of simplifying hypotheses were made for the testing purpose, but this should affect any method in the same way. The next publication will consider in greater detail the impact of cluster shapes (selection and measurements) and of cluster physics on the final error budget by means of hydrodynamical simulations.

  15. Efficient cluster-based catalysts for asymmetric hydrogenation of α-unsaturated carboxylic acids.

    PubMed

    Moberg, Viktor; Duquesne, Robin; Contaldi, Simone; Röhrs, Oliver; Nachtigall, Jonny; Damoense, Llewellyn; Hutton, Alan T; Green, Michael; Monari, Magda; Santelia, Daniela; Haukka, Matti; Nordlander, Ebbe

    2012-09-24

    The new clusters [H(4)Ru(4)(CO)(10)(μ-1,2-P-P)], [H(4)Ru(4)(CO)(10) (1,1-P-P)] and [H(4)Ru(4)(CO)(11)(P-P)] (P-P=chiral diphosphine of the ferrocene-based Josiphos or Walphos ligand families) have been synthesised and characterised. The crystal and molecular structures of eleven clusters reveal that the coordination modes of the diphosphine in the [H(4)Ru(4)(CO)(10)(μ-1,2-P-P)] clusters are different for the Josiphos and the Walphos ligands. The Josiphos ligands bridge a metal-metal bond of the ruthenium tetrahedron in the "conventional" manner, that is, with both phosphine moieties coordinated in equatorial positions relative to a triangular face of the tetrahedron, whereas the phosphine moieties of the Walphos ligands coordinate in one axial and one equatorial position. The differences in the ligand size and the coordination mode between the two types of ligands appear to be reflected in a relative propensity for isomerisation; in solution, the [H(4)Ru(4)(CO)(10)(1,1-Walphos)] clusters isomerise to the corresponding [H(4)Ru(4)(CO)(10)(μ-1,2-Walphos)] clusters, whereas the Josiphos-containing clusters show no tendency to isomerisation in solution. The clusters have been tested as catalysts for asymmetric hydrogenation of four prochiral α-unsaturated carboxylic acids and the prochiral methyl ester (E)-methyl 2-methylbut-2-enoate. High conversion rates (>94%) and selectivities of product formation were observed for almost all catalysts/catalyst precursors. The observed enantioselectivities were low or nonexistent for the Josiphos-containing clusters and catalyst (cluster) recovery was low, suggesting that cluster fragmentation takes place. On the other hand, excellent conversion rates (99-100%), product selectivities (99-100% in most cases) and good enantioselectivities, reaching 90% enantiomeric excess (ee) in certain cases, were observed for the Walphos-containing clusters, and the clusters could be recovered in good yield after completed catalysis. 
    Results from high-pressure NMR and IR studies, catalyst poisoning tests and comparison of catalytic properties of two [H(4)Ru(4)(CO)(10)(μ-1,2-P-P)] clusters (P-P=Walphos ligands) with the analogous mononuclear catalysts [Ru(P-P)(carboxylato)(2)] suggest that these clusters may be the active catalytic species, or direct precursors of an active catalytic cluster species. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. The Xenon Test Chamber Q-SUN® for testing realistic tolerances of fungi exposed to simulated full spectrum solar radiation.

    PubMed

    Dias, Luciana P; Araújo, Claudinéia A S; Pupin, Breno; Ferreira, Paulo C; Braga, Gilberto Ú L; Rangel, Drauzio E N

    2018-06-01

    The low survival of insect-pathogenic fungi when used for insect control in agriculture is mainly due to the deleterious effects of ultraviolet radiation and heat from solar irradiation. In this study, conidia of 15 species of entomopathogenic fungi were exposed to simulated full-spectrum solar radiation emitted by a Xenon Test Chamber Q-SUN XE-3-HC 340S (Q-LAB® Corporation, Westlake, OH, USA), which very closely simulates full-spectrum solar radiation. A dendrogram obtained from cluster analyses, based on lethal time 50% and 90% calculated by Probit analyses, separated the fungi into three clusters. Cluster 3 contains the species with the highest tolerance to simulated full-spectrum solar radiation: Metarhizium acridum, Cladosporium herbarum, and Trichothecium roseum, with LT50 > 200 min of irradiation. Cluster 2 contains eight species with moderate UV tolerance: Aschersonia aleyrodis, Isaria fumosorosea, Mariannaea pruinosa, Metarhizium anisopliae, Metarhizium brunneum, Metarhizium robertsii, Simplicillium lanosoniveum, and Torrubiella homopterorum, with LT50 between 120 and 150 min of irradiation. The four species in cluster 1 had the lowest UV tolerance: Lecanicillium aphanocladii, Beauveria bassiana, Tolypocladium cylindrosporum, and Tolypocladium inflatum, with LT50 < 120 min of irradiation. The QSUN Xenon Test Chamber XE3 is often used by the pharmaceutical and automotive industries to test light stability and weathering, respectively, but it had never before been used to evaluate fungal tolerance to full-spectrum solar radiation. We conclude that the equipment provides an excellent tool for testing realistic tolerances to full-spectrum solar radiation of fungi used as microbial agents for insect biological control in agriculture. Copyright © 2018 British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  17. Improved regional-scale Brazilian cropping systems' mapping based on a semi-automatic object-based clustering approach

    NASA Astrophysics Data System (ADS)

    Bellón, Beatriz; Bégué, Agnès; Lo Seen, Danny; Lebourgeois, Valentine; Evangelista, Balbino Antônio; Simões, Margareth; Demonte Ferraz, Rodrigo Peçanha

    2018-06-01

    Cropping systems' maps at fine scale over large areas provide key information for further agricultural production and environmental impact assessments, and thus represent a valuable tool for effective land-use planning. There is, therefore, a growing interest in mapping cropping systems in an operational manner over large areas, and remote sensing approaches based on vegetation index time series analysis have proven to be an efficient tool. However, supervised pixel-based approaches are commonly adopted, requiring resource-consuming field campaigns to gather training data. In this paper, we present a new object-based unsupervised classification approach tested on an annual MODIS 16-day composite Normalized Difference Vegetation Index time series and a Landsat 8 mosaic of the State of Tocantins, Brazil, for the 2014-2015 growing season. Two variants of the approach are compared: a hyperclustering approach, and a landscape-clustering approach involving a prior stratification of the study area into landscape units on which the clustering is then performed. The main cropping systems of Tocantins, characterized by the crop types and cropping patterns, were efficiently mapped with the landscape-clustering approach. Results show that stratification prior to clustering significantly improves the classification accuracies for underrepresented and sparsely distributed cropping systems. This study illustrates the potential of unsupervised classification for large area cropping systems' mapping and contributes to the development of generic tools for supporting large-scale agricultural monitoring across regions.

  18. Construct validity of tests that measure kick performance for young soccer players based on cluster analysis: exploring the relationship between coaches rating and actual measures.

    PubMed

    Palucci Vieira, Luiz H; de Andrade, Vitor L; Aquino, Rodrigo L; Moraes, Renato; Barbieri, Fabio A; Cunha, Sérgio A; Bedo, Bruno L; Santiago, Paulo R

    2017-12-01

    The main aim of this study was to verify the relationship between coaches' classifications and actual performance in field tests that measure kicking performance in young soccer players, using the K-means clustering technique. Twenty-three U-14 players performed 8 tests to measure their kicking performance. Four experienced coaches rated each player on three parameters (i.e. accuracy, power and ability to put spin on the ball) using the following scale: 1: poor; 2: below average; 3: average; 4: very good; 5: excellent. The score intervals established from the k-means clustering were useful for forming five groups of performance level, since ANOVA revealed significant differences between the generated clusters (P<0.01). Accuracy seems to be moderately predicted by the penalty kick, free kick, kicking the ball rolling and Wall Volley Test (0.44≤r≤0.56), while the ability to put spin on the ball can be measured by the free kick and the corner kick tests (0.52≤r≤0.61). Body measurements, age and PHV did not systematically influence the performance. The Wall Volley Test seems to be a good predictor of other tests. Five tests showed reasonable construct validity and can be used to predict the accuracy (penalty kick, free kick, kicking a rolling ball and Wall Volley Test) and ability to put spin on the ball (free kick and corner kick tests) when kicking in soccer. In contrast, the goal kick, kicking the ball when airborne and the vertical kick tests exhibited low power of discrimination, and their use should be viewed with caution.

  19. Radiomics of CT Features May Be Nonreproducible and Redundant: Influence of CT Acquisition Parameters.

    PubMed

    Berenguer, Roberto; Pastor-Juan, María Del Rosario; Canales-Vázquez, Jesús; Castro-García, Miguel; Villas, María Victoria; Legorburo, Francisco Mansilla; Sabater, Sebastià

    2018-04-24

    Purpose: To identify the reproducible and nonredundant radiomics features (RFs) for computed tomography (CT). Materials and Methods: Two phantoms were used to test RF reproducibility by using test-retest analysis, by changing the CT acquisition parameters (hereafter, intra-CT analysis), and by comparing five different scanners with the same CT parameters (hereafter, inter-CT analysis). Reproducible RFs were selected by using the concordance correlation coefficient (as a measure of the agreement between variables) and the coefficient of variation (defined as the ratio of the standard deviation to the mean). Redundant features were grouped by using hierarchical cluster analysis. Results: A total of 177 RFs including intensity, shape, and texture features were evaluated. The test-retest analysis showed that 91% (161 of 177) of the RFs were reproducible according to concordance correlation coefficient. Reproducibility of intra-CT RFs, based on coefficient of variation, ranged from 89.3% (151 of 177) to 43.1% (76 of 177) where the pitch factor and the reconstruction kernel were modified, respectively. Reproducibility of inter-CT RFs, based on coefficient of variation, also showed large material differences, from 85.3% (151 of 177; wood) to only 15.8% (28 of 177; polyurethane). Ten clusters were identified after the hierarchical cluster analysis and one RF per cluster was chosen as representative. Conclusion: Many RFs were redundant and nonreproducible. If all the CT parameters are fixed except field of view, tube voltage, and milliamperage, then the information provided by the analyzed RFs can be summarized in only 10 RFs (each representing a cluster) because of redundancy. © RSNA, 2018. Online supplemental material is available for this article.
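
    The two reproducibility measures named in this abstract can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the function names are hypothetical, the concordance correlation coefficient is written in Lin's usual population-moment form, and the sample standard deviation is an assumed choice for the coefficient of variation.

```python
from statistics import fmean, stdev


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    mx, my = fmean(x), fmean(y)
    # Population (1/n) variances and covariance.
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    sxy = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    # Penalises both scatter and any systematic shift between x and y.
    return 2.0 * sxy / (vx + vy + (mx - my) ** 2)


def coefficient_of_variation(values):
    """Ratio of the (sample) standard deviation to the mean."""
    return stdev(values) / fmean(values)
```

    A feature measured identically in test and retest scans gives a coefficient of 1; a constant offset between the two scans lowers it even when the correlation stays perfect.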

  20. Modeling the Dark Matter of Galaxy Clusters Using the Tensor-Vector-Scalar Theory of Alternate Gravity

    NASA Astrophysics Data System (ADS)

    Ragozzine, Brett

    The invocation of dark matter in the universe is predicated upon gravitational observations that cannot be explained by the amount of luminous matter that we detect. There is an ongoing debate over which gravitational model is correct. The work herein tests a prescription of gravity theory known as Tensor-Vector-Scalar and is based upon the work of Angus et al. (2007). We build upon this work by extending the sample of galaxy clusters to five and testing the accepted Navarro, Frenk & White (NFW) dark matter potential (Navarro et al., 1996). Our independent implementation of this method includes weak gravitational lensing analysis to determine the amount of dark matter in these galaxy clusters by calculating the gas fraction f_gas = M_gas/M_tot. The ability of the Tensor-Vector-Scalar theory to predict a consistent f_gas across all galaxy clusters is a measure of its likelihood of being the correct gravity model.

  1. Cluster mass inference via random field theory.

    PubMed

    Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D

    2009-01-01

    Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
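
    The contrast between extent, peak intensity and mass can be illustrated on a one-dimensional toy "image". This is a sketch under stated assumptions and not the RFT method of the paper: the function name is hypothetical, clusters are taken as runs of contiguous suprathreshold values, and mass is assumed to be the summed excess over the threshold, which the abstract does not define.

```python
def suprathreshold_clusters(values, threshold):
    """Summarise contiguous runs of values above the threshold (1-D toy case)."""
    clusters, current = [], []
    for v in values:
        if v > threshold:
            current.append(v)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return [
        {
            "extent": len(c),                         # number of voxels in the run
            "peak": max(c),                           # highest voxel intensity
            "mass": sum(v - threshold for v in c),    # summed excess over threshold
        }
        for c in clusters
    ]


stats = suprathreshold_clusters([0.0, 2.5, 3.0, 2.5, 0.0, 5.0, 0.0], threshold=2.0)
```

    In this toy input the first cluster wins on extent (3 voxels against 1) and the second on peak intensity (5.0 against 3.0); the mass statistic combines both sources of evidence in one number per cluster.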

  2. Evaluation of AMOEBA: a spectral-spatial classification method

    USGS Publications Warehouse

    Jenson, Susan K.; Loveland, Thomas R.; Bryant, J.

    1982-01-01

    Multispectral remotely sensed images have been treated as arbitrary multivariate spectral data for purposes of clustering and classifying. However, the spatial properties of image data can also be exploited. AMOEBA is a clustering and classification method that is based on a spatially derived model for image data. In an evaluation test, Landsat data were classified with both AMOEBA and a widely used spectral classifier. The test showed that irrigated crop types can be classified as accurately with the AMOEBA method as with the generally used spectral method ISOCLS; the AMOEBA method, however, requires less computer time.

  3. Application of Artificial Intelligence For Euler Solutions Clustering

    NASA Astrophysics Data System (ADS)

    Mikhailov, V.; Galdeano, A.; Diament, M.; Gvishiani, A.; Agayan, S.; Bogoutdinov, Sh.; Graeva, E.; Sailhac, P.

    Results of Euler deconvolution strongly depend on the selection of viable solutions. Synthetic calculations using multiple causative sources show that Euler solutions cluster in the vicinity of causative bodies even when they do not group densely about the perimeter of the bodies. We have developed a clustering technique to serve as a tool for selecting appropriate solutions. The method RODIN, employed in this study, is based on artificial intelligence and was originally designed for problems of classification of large data sets. It is based on a geometrical approach to study object concentration in a finite metric space of any dimension. The method uses a formal definition of cluster and includes free parameters that facilitate the search for clusters of given properties. Tests on synthetic and real data showed that the clustering technique outlines causative bodies more accurately than other methods of discriminating Euler solutions. In complicated field cases such as the magnetic field in the Gulf of Saint Malo region (Brittany, France), the method provides geologically insightful solutions. Other advantages of applying the clustering method are: (1) Clusters provide solutions associated with particular bodies or parts of bodies, permitting the analysis of different clusters of Euler solutions separately. This may allow computation of average parameters for individual causative bodies. (2) Those measurements of the anomalous field that yield clusters also form dense clusters themselves. The application of the clustering technique thus outlines areas where the influence of different causative sources is more prominent. This allows one to focus on areas for reinterpretation, using different window sizes, structural indices and so on.

  4. Using Cluster Analysis to Compartmentalize a Large Managed Wetland Based on Physical, Biological, and Climatic Geospatial Attributes.

    PubMed

    Hahus, Ian; Migliaccio, Kati; Douglas-Mankin, Kyle; Klarenberg, Geraldine; Muñoz-Carpena, Rafael

    2018-04-27

    Hierarchical and partitional cluster analyses were used to compartmentalize Water Conservation Area 1, a managed wetland within the Arthur R. Marshall Loxahatchee National Wildlife Refuge in southeast Florida, USA, based on physical, biological, and climatic geospatial attributes. Single, complete, average, and Ward's linkages were tested during the hierarchical cluster analyses, with average linkage providing the best results. In general, the partitional method, partitioning around medoids, found clusters that were more evenly sized and more spatially aggregated than those resulting from the hierarchical analyses. However, hierarchical analysis appeared to be better suited to identify outlier regions that were significantly different from other areas. The clusters identified by geospatial attributes were similar to clusters developed for the interior marsh in a separate study using water quality attributes, suggesting that similar factors have influenced variations in both the set of physical, biological, and climatic attributes selected in this study and water quality parameters. However, geospatial data allowed further subdivision of several interior marsh clusters identified from the water quality data, potentially indicating zones with important differences in function. Identification of these zones can be useful to managers and modelers by informing the distribution of monitoring equipment and personnel as well as delineating regions that may respond similarly to future changes in management or climate.

  5. Overlapping Community Detection based on Network Decomposition

    NASA Astrophysics Data System (ADS)

    Ding, Zhuanlian; Zhang, Xingyi; Sun, Dengdi; Luo, Bin

    2016-04-01

    Community detection in complex networks has become a vital step in understanding the structure and dynamics of networks in various fields. However, traditional node clustering and the more recently proposed link clustering methods have inherent drawbacks in discovering overlapping communities. Node clustering is inadequate to capture the pervasive overlaps, while link clustering is often criticized for its high computational cost and ambiguous definition of communities. Overlapping community detection therefore remains a formidable challenge. In this work, we propose a new overlapping community detection algorithm based on network decomposition, called NDOCD. Specifically, NDOCD iteratively splits the network by removing all links in derived link communities, which are identified using a node clustering technique. The network decomposition reduces the computation time, and the elimination of noisy links improves the quality of the obtained communities. In addition, because we employ a node clustering technique rather than a link similarity measure to discover link communities, NDOCD avoids an ambiguous definition of community and becomes less time-consuming. We test our approach on both synthetic and real-world networks. Results demonstrate the superior performance of our approach in both computation time and accuracy compared to state-of-the-art algorithms.

  6. A Comparison of Seventh Grade Thai Students' Reading Comprehension and Motivation to Read English through Applied Instruction Based on the Genre-Based Approach and the Teacher's Manual

    ERIC Educational Resources Information Center

    Sawangsamutchai, Yutthasak; Rattanavich, Saowalak

    2016-01-01

    The objective of this research is to compare the English reading comprehension and motivation to read of seventh grade Thai students taught with applied instruction through the genre-based approach and teachers' manual. A randomized pre-test post-test control group design was used through the cluster random sampling technique. The data were…

  7. U.S. consumer demand for restaurant calorie information: targeting demographic and behavioral segments in labeling initiatives.

    PubMed

    Kolodinsky, Jane; Reynolds, Travis William; Cannella, Mark; Timmons, David; Bromberg, Daniel

    2009-01-01

    To identify different segments of U.S. consumers based on food choices, exercise patterns, and desire for restaurant calorie labeling. Using a stratified (by region) random sample of the U.S. population, trained interviewers collected data for this cross-sectional study through telephone surveys. Center for Rural Studies U.S. national health survey. The final sample included 580 responses (22% response rate); data were weighted to be representative of age and gender characteristics of the U.S. population. Self-reported behaviors related to food choices, exercise patterns, desire for calorie information in restaurants, and sample demographics. Clusters were identified using Schwartz Bayesian criteria. Impacts of demographic characteristics on cluster membership were analyzed using bivariate tests of association and multinomial logit regression. Cluster analysis revealed three clusters based on respondents' food choices, activity levels, and desire for restaurant labeling. Two clusters, comprising three quarters of the sample, desired calorie labeling in restaurants. The remaining cluster opposed restaurant labeling. Demographic variables significantly predicting cluster membership included region of residence (p < .10), income (p < .05), gender (p < .01), and age (p < .10). Though limited by a low response and potential self-reporting bias in the phone survey, this study suggests that several groups are likely to benefit from restaurant calorie labeling. Specific demographic clusters could be targeted through labeling initiatives.

  8. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets.

    PubMed

    Koren, Omry; Knights, Dan; Gonzalez, Antonio; Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.

  9. A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets

    PubMed Central

    Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E.

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’ or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes. PMID:23326225

  10. Critical thinking in higher education: The influence of teaching styles and peer collaboration on science and math learning

    NASA Astrophysics Data System (ADS)

    Quitadamo, Ian Joseph

    Many higher education faculty perceive a deficiency in students' ability to reason, evaluate, and make informed judgments, skills that are deemed necessary for academic and job success in science and math. These skills, often collected within a domain called critical thinking (CT), have been studied and are thought to be influenced by teaching styles (the combination of beliefs, behavior, and attitudes used when teaching) and small group collaborative learning (SGCL). However, no existing studies show teaching styles and SGCL cause changes in student CT performance. This study determined how combinations of teaching styles called clusters and peer-facilitated SGCL (a specific form of SGCL) affect changes in undergraduate student CT performance using a quasi-experimental pre-test/post-test research design and valid and reliable CT performance indicators. Quantitative analyses of three teaching style cluster models (Grasha's cluster model, a weighted cluster model, and a student-centered/teacher-centered cluster model) and peer-facilitated SGCL were performed to evaluate their ability to cause measurable changes in student CT skills. Based on results that indicated weighted teaching style clusters and peer-facilitated SGCL are associated with significant changes in student CT, we conclude that teaching styles and peer-facilitated SGCL influence the development of undergraduate CT in higher education science and math.

  11. ATCA-based ATLAS FTK input interface system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Okumura, Yasuyuki; Liu, Tiehui Ted; Olsen, Jamieson

    The first stage of the ATLAS Fast TracKer (FTK) is an ATCA-based input interface system, where hits from the entire silicon tracker are clustered and organized into overlapping eta-phi trigger towers before being sent to the tracking engines. First, FTK Input Mezzanine cards receive hit data and perform clustering to reduce data volume. Then, the ATCA-based Data Formatter system will organize the trigger tower data, sharing data among boards over full mesh backplanes and optic fibers. The board and system level design concepts and implementation details, as well as the operation experiences from the FTK full-chain testing, will be presented.

  12. Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes.

    PubMed

    Johnson, Jacqueline L; Kreidler, Sarah M; Catellier, Diane J; Murray, David M; Muller, Keith E; Glueck, Deborah H

    2015-11-30

    We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage model uses the cluster specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach. Copyright © 2015 John Wiley & Sons, Ltd.
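
    The two-stage idea described in this abstract can be sketched as follows: each cluster is first reduced to its mean, and the arms are then compared on those means. The sketch is the plain unweighted version with a pooled-variance t statistic; it is not the inverse-variance-weighted variant that the abstract reports as performing best, and the function name and data layout are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def two_stage_t(arm_a, arm_b):
    """Unweighted two-stage comparison of two arms.

    Each arm is a list of clusters; each cluster is a list of observations.
    Returns the pooled-variance t statistic and its degrees of freedom.
    """
    # Stage 1: collapse every cluster to its mean, so clusters become the units.
    means_a = [fmean(cluster) for cluster in arm_a]
    means_b = [fmean(cluster) for cluster in arm_b]
    na, nb = len(means_a), len(means_b)
    # Stage 2: ordinary two-sample t statistic on the cluster means.
    df = na + nb - 2
    pooled = ((na - 1) * variance(means_a) + (nb - 1) * variance(means_b)) / df
    t = (fmean(means_a) - fmean(means_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


t_stat, dof = two_stage_t(
    [[1, 3], [2, 2, 2], [3]],   # three clusters of unequal size in arm A
    [[0], [1, 1], [0, 2]],      # three clusters of unequal size in arm B
)
```

    The degrees of freedom depend on the number of clusters rather than the number of observations, which is why the number of clusters per arm matters so much in the results above.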

  13. Applying the Anderson-Darling test to suicide clusters: evidence of contagion at U. S. universities?

    PubMed

    MacKenzie, Donald W

    2013-01-01

    Suicide clusters at Cornell University and the Massachusetts Institute of Technology (MIT) prompted popular and expert speculation of suicide contagion. However, some clustering is to be expected in any random process. This work tested whether suicide clusters at these two universities differed significantly from those expected under a homogeneous Poisson process, in which suicides occur randomly and independently of one another. Suicide dates were collected for MIT and Cornell for 1990-2012. The Anderson-Darling statistic was used to test the goodness-of-fit of the intervals between suicides to distribution expected under the Poisson process. Suicides at MIT were consistent with the homogeneous Poisson process, while those at Cornell showed clustering inconsistent with such a process (p = .05). The Anderson-Darling test provides a statistically powerful means to identify suicide clustering in small samples. Practitioners can use this method to test for clustering in relevant communities. The difference in clustering behavior between the two institutions suggests that more institutions should be studied to determine the prevalence of suicide clustering in universities and its causes.
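
    Under a homogeneous Poisson process the intervals between events are independent and exponentially distributed, so the test described in this abstract amounts to a goodness-of-fit test of the intervals against an exponential distribution. The sketch below is an illustration rather than the author's procedure: the function names are hypothetical, the rate is assumed to be estimated from the sample mean, and the p-value is approximated by Monte Carlo simulation instead of published critical values.

```python
import math
import random


def anderson_darling_exponential(intervals):
    """Anderson-Darling A^2 for positive intervals against a fitted exponential."""
    x = sorted(intervals)
    n = len(x)
    mean = sum(x) / n
    # For the fitted exponential: log F(x) = log(1 - exp(-x/mean)),
    # and log(1 - F(x)) = -x/mean.
    log_cdf = [math.log(-math.expm1(-v / mean)) for v in x]
    log_sf = [-v / mean for v in x]
    total = 0.0
    for i in range(n):
        total += (2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i])
    return -n - total / n


def monte_carlo_p_value(intervals, n_sim=1000, seed=0):
    """Share of simulated exponential samples with A^2 at least as large."""
    rng = random.Random(seed)
    observed = anderson_darling_exponential(intervals)
    n = len(intervals)
    exceed = 0
    for _ in range(n_sim):
        # The statistic is scale-invariant, so a unit rate suffices.
        sim = [rng.expovariate(1.0) for _ in range(n)]
        if anderson_darling_exponential(sim) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

    Many very short intervals mixed with a few very long ones, the signature of clustering in time, inflate the statistic and produce a small p-value.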

  14. Stroke localization and classification using microwave tomography with k-means clustering and support vector machine.

    PubMed

    Guo, Lei; Abbosh, Amin

    2018-05-01

    To give stroke patients any chance of survival, the stroke type should be classified so that medication can be given within a few hours of the onset of symptoms. In this paper, a microwave-based stroke localization and classification framework is proposed. It is based on microwave tomography, k-means clustering, and a support vector machine (SVM) method. The dielectric profile of the brain is first calculated using the Born iterative method, and the amplitude of the dielectric profile is then taken as the input to k-means clustering. The cluster is selected as the feature vector for constructing and testing the SVM. A database of MRI-derived realistic head phantoms at different signal-to-noise ratios is used in the classification procedure. The performance of the proposed framework is evaluated using the receiver operating characteristic (ROC) curve. The results based on a two-dimensional framework show that 88% classification accuracy, with a sensitivity of 91% and a specificity of 87%, can be achieved. Bioelectromagnetics. 39:312-324, 2018. © 2018 Wiley Periodicals, Inc.
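
    The k-means step of such a pipeline can be sketched on scalar amplitudes. This is a generic Lloyd-style iteration under stated assumptions, not the authors' implementation: the function name, the deterministic initialisation and the toy amplitude values are all hypothetical, and the tomography and SVM stages are omitted.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd-style k-means on scalar values; returns (centroids, labels)."""
    data = sorted(values)
    # Deterministic initialisation: spread the centroids across the sorted data.
    centroids = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [
            sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
    return centroids, labels


# Hypothetical amplitudes: three background pixels and three anomalous ones.
centroids, labels = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], k=2)
```

    The labels split the pixels into a background group and an anomalous group; how a cluster is then turned into an SVM feature vector is specific to the paper and is not reproduced here.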

  15. Clustering method for counting passengers getting in a bus with single camera

    NASA Astrophysics Data System (ADS)

    Yang, Tao; Zhang, Yanning; Shao, Dapei; Li, Ying

    2010-03-01

    Automatic counting of passengers is very important for both business and security applications. We present a single-camera-based vision system that is able to count passengers in a highly crowded situation at the entrance of a traffic bus. The unique characteristics of the proposed system are as follows. First, it uses a novel passenger counting framework based on feature-point tracking and online clustering, which performs much better than methods based on background modeling and foreground-blob tracking. Second, a simple and highly accurate clustering algorithm is developed that projects the high-dimensional feature point trajectories into a 2-D feature space by their appearance and disappearance times and counts the number of people through online clustering. Finally, all test video sequences in the experiment are captured from a real traffic bus in Shanghai, China. The results show that the system can process two 320×240 video sequences at a frame rate of 25 fps simultaneously, and can count passengers reliably in various difficult scenarios with complex interaction and occlusion among people. The method achieves accuracy rates of up to 96.5%.

  16. Riemannian multi-manifold modeling and clustering in brain networks

    NASA Astrophysics Data System (ADS)

    Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.

    2017-08-01

    This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brain-network time-series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points in the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.

  17. Criterion Referenced Inventory. Grade 7 Skill Clusters, Objectives, and Illustrations.

    ERIC Educational Resources Information Center

    Montgomery County Public Schools, Rockville, MD.

    Part of a series of competency-based test materials for grades six through ten, this test booklet for seventh graders contains multiple-choice questions designed to aid in the evaluation of the pupils' library skills. Accompanied by a separate booklet of illustrations which are to be used in conjunction with the questions, the test covers the…

  18. A Hybrid Approach for CpG Island Detection in the Human Genome.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Da; Chiang, Yi-Cheng; Chuang, Li-Yeh

    2016-01-01

    CpG islands have been demonstrated to influence local chromatin structures and simplify the regulation of gene activity. However, the accurate and rapid determination of CpG islands for whole DNA sequences remains experimentally and computationally challenging. A novel procedure is proposed to detect CpG islands by combining clustering technology with the sliding-window method (PSO-based). Clustering technology is used to detect the locations of all possible CpG islands and process the data, thus effectively obviating the need for the extensive and unnecessary processing of DNA fragments and improving the efficiency of the sliding-window-based particle swarm optimization (PSO) search. This proposed approach, named ClusterPSO, provides versatile and highly sensitive detection of CpG islands in the human genome. In addition, the detection efficiency of ClusterPSO is compared with eight CpG island detection methods in the human genome. In this comparison of detection efficiency for CpG islands in the human genome, covering sensitivity, specificity, accuracy, performance coefficient (PC), and correlation coefficient (CC), ClusterPSO showed superior detection ability among all of the tested methods. Moreover, the combination of clustering technology and the PSO method can successfully overcome their respective drawbacks while maintaining their advantages. Thus, clustering technology could be hybridized with the optimization algorithm method to optimize CpG island detection. The prediction accuracy of ClusterPSO was quite high, indicating that the combination of CpGcluster and PSO has several advantages over CpGcluster and PSO alone. In addition, ClusterPSO significantly reduced implementation time.
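
    To make the sliding-window idea concrete, here is a minimal sketch of window scoring only; it contains neither the clustering step nor the PSO search. It assumes the commonly cited window criteria (GC content of at least 50% and an observed/expected CpG ratio of at least 0.6 over 200 bp), which are not stated in the abstract; the function names and defaults are hypothetical.

```python
def cpg_window_stats(seq, window=200, step=1):
    """Yield (start, gc_content, obs_exp_cpg) for each window of `seq`."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # number of CpG dinucleotides in the window
        gc_content = (c + g) / window
        # Observed/expected CpG ratio: (#CpG * N) / (#C * #G)
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        yield start, gc_content, obs_exp


def candidate_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return start positions of windows meeting both (assumed) thresholds."""
    return [s for s, gc, oe in cpg_window_stats(seq, window, step)
            if gc >= min_gc and oe >= min_oe]


toy = "CG" * 100 + "AT" * 100  # CpG-rich half followed by an AT-only half
hits = candidate_windows(toy, window=200, step=100)
```

    A search procedure such as PSO would then tune window position and length over these scores rather than scanning every offset exhaustively.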

  19. Portfolio Decisions and Brain Reactions via the CEAD method.

    PubMed

    Majer, Piotr; Mohr, Peter N C; Heekeren, Hauke R; Härdle, Wolfgang K

    2016-09-01

    Decision making can be a complex process requiring the integration of several attributes of choice options. Understanding the neural processes underlying (uncertain) investment decisions is an important topic in neuroeconomics. We analyzed functional magnetic resonance imaging (fMRI) data from an investment decision study for stimulus-related effects. We propose a new technique for identifying activated brain regions: cluster, estimation, activation, and decision method. Our analysis is focused on clusters of voxels rather than voxel units. Thus, we achieve a higher signal-to-noise ratio within the unit tested and a smaller number of hypothesis tests compared with the often used General Linear Model (GLM). We propose to first conduct the brain parcellation by applying spatially constrained spectral clustering. The information within each cluster can then be extracted by the flexible dynamic semiparametric factor model (DSFM) dimension reduction technique and finally be tested for differences in activation between conditions. This sequence of Cluster, Estimation, Activation, and Decision admits a model-free analysis of the local fMRI signal. Applying a GLM on the DSFM-based time series resulted in a significant correlation between the risk of choice options and changes in fMRI signal in the anterior insula and dorsomedial prefrontal cortex. Additionally, individual differences in decision-related reactions within the DSFM time series predicted individual differences in risk attitudes as modeled with the framework of the mean-variance model.

  20. Application of Hermitian time-dependent coupled-cluster response Ansätze of second order to excitation energies and frequency-dependent dipole polarizabilities

    NASA Astrophysics Data System (ADS)

    Wälz, Gero; Kats, Daniel; Usvyat, Denis; Korona, Tatiana; Schütz, Martin

    2012-11-01

    Linear-response methods, based on the time-dependent variational coupled-cluster or the unitary coupled-cluster model, and truncated at the second order according to the Møller-Plesset partitioning, i.e., the TD-VCC[2] and TD-UCC[2] linear-response methods, are presented and compared. For both of these methods a Hermitian eigenvalue problem has to be solved to obtain excitation energies and state eigenvectors. The excitation energies thus are guaranteed always to be real valued, and the eigenvectors are mutually orthogonal, in contrast to response theories based on “traditional” coupled-cluster models. It turned out that the TD-UCC[2] working equations for excitation energies and polarizabilities are equivalent to those of the second-order algebraic diagrammatic construction scheme ADC(2). Numerical tests are carried out by calculating TD-VCC[2] and TD-UCC[2] excitation energies and frequency-dependent dipole polarizabilities for several test systems and by comparing them to the corresponding values obtained from other second- and higher-order methods. It turns out that the TD-VCC[2] polarizabilities in the frequency regions away from the poles are of a similar accuracy as for other second-order methods, as expected from the perturbative analysis of the TD-VCC[2] polarizability expression. On the other hand, the TD-VCC[2] excitation energies are systematically too low relative to other second-order methods (including TD-UCC[2]). On the basis of these results and an analysis presented in this work, we conjecture that the perturbative expansion of the Jacobian converges more slowly for the TD-VCC formalism than for TD-UCC or for response theories based on traditional coupled-cluster models.

  1. Atomically precise (catalytic) particles synthesized by a novel cluster deposition instrument

    DOE PAGES

    Yin, C.; Tyo, E.; Kuchta, K.; ...

    2014-05-06

    Here, we report a new high vacuum instrument which is dedicated to the preparation of well-defined clusters supported on model and technologically relevant supports for catalytic and materials investigations. The instrument is based on deposition of size selected metallic cluster ions that are produced by a high flux magnetron cluster source. Furthermore, we maximize the throughput of the apparatus by collecting and focusing ions utilizing a conical octupole ion guide and a linear ion guide. The size selection is achieved by a quadrupole mass filter. The new design of the sample holder provides for the preparation of multiple samples on supports of various sizes and shapes in one session. After cluster deposition onto the support of interest, samples will be taken out of the chamber for a variety of testing and characterization.

  2. Diametrical clustering for identifying anti-correlated gene clusters.

    PubMed

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, with each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
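
    The two alternating steps can be sketched as follows. This is a minimal illustration, assuming genes are assigned to the prototype with the largest squared dot product (so correlated and anti-correlated profiles land together) and that the dominant singular vector is obtained by power iteration; the deterministic initialization and all function names are assumptions made here, not details from the abstract.

```python
import math


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dominant_direction(rows, iters=100):
    """Power iteration on X^T X: dominant right singular vector of `rows`."""
    v = normalize(rows[0])
    for _ in range(iters):
        w = [0.0] * len(v)
        for r in rows:
            p = dot(r, v)
            for i, a in enumerate(r):
                w[i] += p * a  # accumulates X^T (X v)
        v = normalize(w)
    return v


def diametrical_clustering(rows, k, iters=20):
    rows = [normalize(r) for r in rows]
    # Assumed initialization: first row, then rows least aligned with it.
    protos = [rows[0]]
    while len(protos) < k:
        protos.append(min(rows, key=lambda r: max(dot(r, p) ** 2 for p in protos)))
    labels = []
    for _ in range(iters):
        # (i) re-partition: the squared dot product ignores the sign.
        labels = [max(range(k), key=lambda j: dot(r, protos[j]) ** 2) for r in rows]
        # (ii) update each prototype to its cluster's dominant singular vector.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                protos[j] = dominant_direction(members)
    return labels, protos


profiles = [[1, 0.1, 0, 0], [-1, 0.05, 0, 0], [0.9, -0.1, 0, 0],
            [0, 0, 1, 0.1], [0, 0, -1, 0.1]]
labels, _ = diametrical_clustering(profiles, k=2)
```

    In the toy data the first and second profiles are nearly opposite in sign yet receive the same label, which is the behaviour that distinguishes this scheme from correlation-based clustering.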

  3. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background: The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results: In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion: Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106

  4. Topic modeling for cluster analysis of large biological and medical datasets.

    PubMed

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
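
    Of the three methods named in this abstract, highest probable topic assignment is the simplest to illustrate. The sketch below assumes a topic model has already produced a per-sample topic distribution (the matrix shown is made-up toy data) and that each sample is placed in the cluster of its most probable topic; this reading of the method and the function name are assumptions.

```python
def highest_probable_topic(sample_topic):
    """Assign each sample to the index of its most probable topic.

    `sample_topic` is a list of per-sample topic distributions, assumed to
    come from an already fitted topic model (not shown here).
    """
    return [max(range(len(row)), key=row.__getitem__) for row in sample_topic]


# Toy per-sample topic distributions (hypothetical values).
theta = [[0.1, 0.7, 0.2],
         [0.6, 0.3, 0.1],
         [0.2, 0.2, 0.6]]
clusters = highest_probable_topic(theta)
```

    Ties are resolved in favour of the lowest topic index, which is an arbitrary choice of this sketch.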

  5. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    PubMed

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. The study included ninety-five eyes of 95 OAG patients who received medical treatment and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 dB baseline MD, 77.54% ± 12.98% baseline VFI, -0.72 ± 0.55 dB per year MD slope, -2.22% ± 1.89% per year VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a significantly greater number of antiglaucoma eye drops than cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  6. Design and preliminary recruitment results of the Cluster randomised triAl of PSA testing for Prostate cancer (CAP).

    PubMed

    Turner, E L; Metcalfe, C; Donovan, J L; Noble, S; Sterne, J A C; Lane, J A; Avery, K N; Down, L; Walsh, E; Davis, M; Ben-Shlomo, Y; Oliver, S E; Evans, S; Brindle, P; Williams, N J; Hughes, L J; Hill, E M; Davies, C; Ng, S Y; Neal, D E; Hamdy, F C; Martin, R M

    2014-06-10

    Screening for prostate cancer continues to generate controversy because of concerns about over-diagnosis and unnecessary treatment. We describe the rationale, design and recruitment of the Cluster randomised triAl of PSA testing for Prostate cancer (CAP) trial, a UK-wide cluster randomised controlled trial investigating the effectiveness and cost-effectiveness of prostate-specific antigen (PSA) testing. Seven hundred and eighty-five general practitioner (GP) practices in England and Wales were randomised to population-based PSA testing or standard care and then approached for consent to participate. In the intervention arm, men aged 50-69 years were invited to undergo PSA testing, and those diagnosed with localised prostate cancer were invited into a treatment trial. Control arm practices undertook standard UK management. All men were flagged with the Health and Social Care Information Centre for deaths and cancer registrations. The primary outcome is prostate cancer mortality at a median 10-year follow-up. Among randomised practices, 271 (68%) in the intervention arm (198,114 men) and 302 (78%) in the control arm (221,929 men) consented to participate, meeting pre-specified power requirements. There was little evidence of differences between trial arms in measured baseline characteristics of the consenting GP practices (or men within those practices). The CAP trial successfully met its recruitment targets and will make an important contribution to international understanding of PSA-based prostate cancer screening.

  7. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks

    PubMed Central

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-01-01

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found, but also provides an effective tool for the study and analysis of intrusion detection in large networks. PMID:27754380

  8. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks.

    PubMed

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-10-13

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found, but also provides an effective tool for the study and analysis of intrusion detection in large networks.

  9. Using reflection time-of-flight mass spectrometer techniques to investigate cluster dynamics and bonding

    NASA Astrophysics Data System (ADS)

    Wei, Shiqing; Castleman, A. W., Jr.

    1994-02-01

    Laser-based time-of-flight mass spectrometer systems affixed with reflectrons are valuable tools for investigating cluster dynamics and reactions, spectroscopy and structures. Utilizing the reflectron time-of-flight mass spectrometer techniques, both decay fractions and kinetic energy releases of metastable cluster ions can be measured with high precision. By applying related theoretical models, the desired thermochemical values of metastable species can be deduced, which are otherwise very difficult to obtain. Several examples are discussed with attention focused on ammonia as a test case for hydrogen bond systems, and xenon for weaker van der Waals clusters. A brief overview of applications to investigating solvation effects on reactions and structures, delayed electron transfer and ionization through intracluster Penning ionization is also given.

  10. [Does the Youth Psychopathic Traits Inventory (YPI) identify a clinically relevant subgroup among young offenders?].

    PubMed

    Mingers, Daniel; Köhler, Denis; Huchzermeier, Christian; Hinrichs, Günter

    2017-01-01

    Does the Youth Psychopathic Traits Inventory identify one or more high-risk subgroups among young offenders? Which recommendations for possible courses of action can be derived for individual clinical or forensic cases? Method: Model-based cluster analysis (Raftery, 1995) was conducted on a sample of young offenders (N = 445, age 14–22 years, M = 18.5, SD = 1.65). The resulting model was then tested for differences between clusters with relevant context variables of psychopathy. The variables included measures of intelligence, social competence, drug use, and antisocial behavior. Results: Three clusters were found (Low Trait, Impulsive/Irresponsible, Psychopathy) that differ highly significantly concerning YPI scores and the variables mentioned above. The YPI Scores Δ Low = 4.28 (Low Trait – Impulsive/Irresponsible) and Δ High = 6.86 (Impulsive/Irresponsible – Psychopathy) were determined to be thresholds between the clusters. The allocation of a person to be assessed within the calculated clusters allows for an orientation of consequent tests beyond the diagnosis of psychopathy. We conclude that the YPI is a valuable instrument for the assessment of young offenders, as it yields clinically and forensically relevant information concerning the cause and expected development of psychopathological behavior.

  11. Automatic pole-like object modeling via 3D part-based analysis of point cloud

    NASA Astrophysics Data System (ADS)

    He, Liu; Yang, Haoxiang; Huang, Yuchun

    2016-10-01

    Pole-like objects, including trees, lampposts and traffic signs, are an indispensable part of urban infrastructure. With the advance of vehicle-based laser scanning (VLS), massive point clouds of roadside urban areas are increasingly applied in 3D digital city modeling. Based on the property that different pole-like objects have various canopy parts and similar trunk parts, this paper proposes a 3D part-based shape analysis to robustly extract, identify and model pole-like objects. The proposed method includes 3D clustering and recognition of trunks, voxel growing, and part-based 3D modeling. After preprocessing, the trunk center is identified as the point that has a local density peak and the largest minimum inter-cluster distance. Starting from the trunk centers, the remaining points are iteratively assigned to the same center as their nearest point with higher density. To eliminate noisy points, the cluster border is refined by trimming boundary outliers. Then, candidate trunks are extracted based on the clustering results in three orthogonal planes by shape analysis. Voxel growing obtains the complete pole-like objects regardless of overlaying. Finally, the entire trunk, branch and crown parts are analyzed to obtain seven feature parameters. These parameters are utilized to model the three parts respectively and obtain a single part-assembled 3D model. The proposed method is tested using the VLS-based point cloud of Wuhan University, China. The point cloud includes many kinds of trees, lampposts and other pole-like posters under different occlusions and overlaying. Experimental results show that the proposed method can extract the exact attributes and model the roadside pole-like objects efficiently.
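
    The clustering step in this abstract resembles density-peak clustering, which can be sketched as below. This is a minimal illustration under stated assumptions: density is the number of neighbours within a cutoff `d_c`, centers are the densest point plus the points with the largest density-times-distance score, and ties in density are broken by index. The function name, the cutoff and the center-selection rule are hypothetical, and the border trimming and shape analysis described above are not included.

```python
from math import dist


def density_peaks(points, d_c, k):
    """Cluster `points` around k density peaks; returns (labels, centers)."""
    n = len(points)
    # Local density: neighbours closer than the cutoff d_c.
    rho = [sum(1 for j in range(n)
               if j != i and dist(points[i], points[j]) < d_c)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # densest first
    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that nearest denser point
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point has no denser neighbour: use the max distance.
            delta[i] = max(dist(points[i], points[j]) for j in range(n))
            continue
        j = min(order[:pos], key=lambda h: dist(points[i], points[h]))
        delta[i], parent[i] = dist(points[i], points[j]), j
    # Centers: the densest point, then the highest rho * delta scores.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    centers = [order[0]] + rest[:k - 1]
    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels, centers = density_peaks(pts, d_c=0.5, k=2)
```

    Processing points in order of decreasing density guarantees that a point's nearest denser neighbour is already labelled when the point itself is reached.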

  12. Likelihood-Based Clustering of Meta-Analytic SROC Curves

    ERIC Educational Resources Information Center

    Holling, Heinz; Bohning, Walailuck; Bohning, Dankmar

    2012-01-01

    Meta-analysis of diagnostic studies experience the common problem that different studies might not be comparable since they have been using a different cut-off value for the continuous or ordered categorical diagnostic test value defining different regions for which the diagnostic test is defined to be positive. Hence specificities and…

  13. Reconstruction of the two-dimensional gravitational potential of galaxy clusters from X-ray and Sunyaev-Zel'dovich measurements

    NASA Astrophysics Data System (ADS)

    Tchernin, C.; Bartelmann, M.; Huber, K.; Dekel, A.; Hurier, G.; Majer, C. L.; Meyer, S.; Zinger, E.; Eckert, D.; Meneghetti, M.; Merten, J.

    2018-06-01

    Context. The mass of galaxy clusters is not a direct observable, nonetheless it is commonly used to probe cosmological models. Based on the combination of all main cluster observables, that is, the X-ray emission, the thermal Sunyaev-Zel'dovich (SZ) signal, the velocity dispersion of the cluster galaxies, and gravitational lensing, the gravitational potential of galaxy clusters can be jointly reconstructed. Aims: We derive the two main ingredients required for this joint reconstruction: the potentials individually reconstructed from the observables and their covariance matrices, which act as a weight in the joint reconstruction. We show here the method to derive these quantities. The result of the joint reconstruction applied to a real cluster will be discussed in a forthcoming paper. Methods: We apply the Richardson-Lucy deprojection algorithm to data on a two-dimensional (2D) grid. We first test the 2D deprojection algorithm on a β-profile. Assuming hydrostatic equilibrium, we further reconstruct the gravitational potential of a simulated galaxy cluster based on synthetic SZ and X-ray data. We then reconstruct the projected gravitational potential of the massive and dynamically active cluster Abell 2142, based on the X-ray observations collected with XMM-Newton and the SZ observations from the Planck satellite. Finally, we compute the covariance matrix of the projected reconstructed potential of the cluster Abell 2142 based on the X-ray measurements collected with XMM-Newton. Results: The gravitational potentials of the simulated cluster recovered from synthetic X-ray and SZ data are consistent, even though the potential reconstructed from X-rays shows larger deviations from the true potential. Regarding Abell 2142, the projected gravitational cluster potentials recovered from SZ and X-ray data reproduce well the projected potential inferred from gravitational-lensing observations. 
    We also observe that the covariance matrix of the potential for Abell 2142 reconstructed from XMM-Newton data sensitively depends on the resolution of the deprojected grid and on the smoothing scale used in the deprojection. Conclusions: We show that the Richardson-Lucy deprojection method can be effectively applied on a grid and that the projected potential is well recovered from real and simulated data based on X-ray and SZ signal. The comparison between the reconstructed potentials from the different observables provides additional information on the validity of the assumptions as function of the projected radius.
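
    For readers unfamiliar with the Richardson-Lucy scheme named in this abstract, the sketch below shows the generic multiplicative update on a small 1-D problem. It is not the authors' 2D deprojection: the kernel matrix here is a hypothetical stand-in for the projection operator, and the function name, flat starting guess and iteration count are assumptions.

```python
def richardson_lucy(data, kernel, iters=50):
    """Generic Richardson-Lucy iteration for data ~ kernel @ u, with u >= 0.

    `kernel[i][j]` is the contribution of unknown j to measurement i.
    """
    m, n = len(kernel), len(kernel[0])
    col_sum = [sum(kernel[i][j] for i in range(m)) for j in range(n)]
    u = [1.0] * n  # flat, positive starting guess
    for _ in range(iters):
        # Forward model: what the current estimate would produce.
        model = [sum(kernel[i][j] * u[j] for j in range(n)) for i in range(m)]
        ratio = [data[i] / model[i] if model[i] > 0 else 0.0 for i in range(m)]
        # Multiplicative correction: back-project the data/model ratio.
        u = [u[j] * sum(kernel[i][j] * ratio[i] for i in range(m)) / col_sum[j]
             for j in range(n)]
    return u


mixing = [[0.8, 0.2],
          [0.2, 0.8]]
observed = [1.4, 2.6]  # equals mixing applied to the true values [1, 3]
estimate = richardson_lucy(observed, mixing, iters=200)
```

    Because the update is multiplicative, the estimate stays non-negative at every iteration, which is one reason the scheme is attractive for intensity-like quantities.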

  14. An updated survey of globular clusters in M 31. III. A spectroscopic metallicity scale for the Revised Bologna Catalog

    NASA Astrophysics Data System (ADS)

    Galleti, S.; Bellazzini, M.; Buzzoni, A.; Federici, L.; Fusi Pecci, F.

    2009-12-01

    Aims. We present a new homogeneous set of metallicity estimates based on Lick indices for the old globular clusters of the M 31 galaxy. The final aim is to add homogeneous spectroscopic metallicities to as many entries as possible of the Revised Bologna Catalog of M 31 clusters, by reporting Lick index measurements from any source (literature, new observations, etc.) on the same scale. Methods: New empirical relations of [Fe/H] as a function of [MgFe] and Mg2 indices are based on the well-studied galactic globular clusters, complemented with theoretical model predictions for -0.2≤ [Fe/H]≤ +0.5. Lick indices for M 31 clusters from various literature sources (225 clusters) and from new observations by our team (71 clusters) have been transformed into the Trager et al. system, yielding new metallicity estimates for 245 globular clusters of M 31. Results: Our values are in good agreement with recent estimates based on detailed spectral fitting and with those obtained from color magnitude diagrams of clusters imaged with the Hubble Space Telescope. The typical uncertainty on individual estimates is ≃±0.25 dex, as resulted from the comparison with metallicities derived from color magnitude diagrams of individual clusters. Conclusions: The metallicity distribution of M 31 globular cluster is briefly discussed and compared with that of the Milky Way. Simple parametric statistical tests suggest that the distribution is probably not unimodal. The strong correlation between metallicity and kinematics found in previous studies is confirmed. The most metal-rich GCs tend to be packed into the center of the system and to cluster tightly around the galactic rotation curve defined by the HI disk, while the velocity dispersion about the curve increases with decreasing metallicity. However, also the clusters with [Fe/H]<-1.0 display a clear rotation pattern, at odds with their Milky Way counterparts. 
    Based on observations made at La Palma, at the Spanish Observatorio del Roque de los Muchachos of the IAC, with the William Herschel Telescope of the Isaac Newton Group and with the Italian Telescopio Nazionale Galileo (TNG) operated by the Fundación Galileo Galilei of INAF. Also based on observations made with the G.B. Cassini Telescope at Loiano (Italy), operated by the Osservatorio Astronomico di Bologna (INAF). Appendices are only available in electronic form at http://www.aanda.org

  15. ClusterCAD: a computational platform for type I modular polyketide synthase design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eng, Clara H.; Backman, Tyler W H; Bailey, Constance B.

    Here, we present ClusterCAD, a web-based toolkit designed to leverage the collinear structure and deterministic logic of type I modular polyketide synthases (PKSs) for synthetic biology applications. The unique organization of these megasynthases, combined with the diversity of their catalytic domain building blocks, has fueled an interest in harnessing the biosynthetic potential of PKSs for the microbial production of both novel natural product analogs and industrially relevant small molecules. However, a limited theoretical understanding of the determinants of PKS fold and function poses a substantial barrier to the design of active variants, and identifying strategies to reliably construct functional PKS chimeras remains an active area of research. In this work, we formalize a paradigm for the design of PKS chimeras and introduce ClusterCAD as a computational platform to streamline and simplify the process of designing experiments to test strategies for engineering PKS variants. ClusterCAD provides chemical structures with stereochemistry for the intermediates generated by each PKS module, as well as sequence- and structure-based search tools that allow users to identify modules based either on amino acid sequence or on the chemical structure of the cognate polyketide intermediate. ClusterCAD can be accessed at https://clustercad.jbei.org and at http://clustercad.igb.uci.edu.

  16. Neural network-based multiple robot simultaneous localization and mapping.

    PubMed

    Saeedi, Sajad; Paull, Liam; Trentini, Michael; Li, Howard

    2011-12-01

    In this paper, a decentralized platform for simultaneous localization and mapping (SLAM) with multiple robots is developed. Each robot performs single robot view-based SLAM using an extended Kalman filter to fuse data from two encoders and a laser ranger. To extend this approach to multiple robot SLAM, a novel occupancy grid map fusion algorithm is proposed. Map fusion is achieved through a multistep process that includes image preprocessing, map learning (clustering) using neural networks, relative orientation extraction using norm histogram cross correlation and a Radon transform, relative translation extraction using matching norm vectors, and then verification of the results. The proposed map learning method is a process based on the self-organizing map. In the learning phase, the obstacles of the map are learned by clustering the occupied cells of the map into clusters. The learning is an unsupervised process which can be done on the fly without any need to have output training patterns. The clusters represent the spatial form of the map and make further analyses of the map easier and faster. Also, clusters can be interpreted as features extracted from the occupancy grid map so the map fusion problem becomes a task of matching features. Results of the experiments from tests performed on a real environment with multiple robots prove the effectiveness of the proposed solution.

  17. ClusterCAD: a computational platform for type I modular polyketide synthase design

    DOE PAGES

    Eng, Clara H.; Backman, Tyler W H; Bailey, Constance B.; ...

    2017-10-11

    Here, we present ClusterCAD, a web-based toolkit designed to leverage the collinear structure and deterministic logic of type I modular polyketide synthases (PKSs) for synthetic biology applications. The unique organization of these megasynthases, combined with the diversity of their catalytic domain building blocks, has fueled an interest in harnessing the biosynthetic potential of PKSs for the microbial production of both novel natural product analogs and industrially relevant small molecules. However, a limited theoretical understanding of the determinants of PKS fold and function poses a substantial barrier to the design of active variants, and identifying strategies to reliably construct functional PKS chimeras remains an active area of research. In this work, we formalize a paradigm for the design of PKS chimeras and introduce ClusterCAD as a computational platform to streamline and simplify the process of designing experiments to test strategies for engineering PKS variants. ClusterCAD provides chemical structures with stereochemistry for the intermediates generated by each PKS module, as well as sequence- and structure-based search tools that allow users to identify modules based either on amino acid sequence or on the chemical structure of the cognate polyketide intermediate. ClusterCAD can be accessed at https://clustercad.jbei.org and at http://clustercad.igb.uci.edu.

  18. Retrospective swine influenza serological surveillance in the four highest pig density provinces of Thailand before the introduction of the 2009 pandemic Influenza A virus subtype H1N1 using various antibody detection assays.

    PubMed

    Sreta, Donruethai; Jittimanee, Suphattra; Charoenvisal, Nataya; Amonsin, Alongkorn; Kitikoon, Pravina; Thanawongnuwech, Roongroje

    2013-01-01

    Genetic characterization of the hemagglutinin gene of the 6 selected Thai Swine influenza virus (SIV) isolates (4 H1 and 2 H3 isolates) used in the establishment of a hemagglutination inhibition (HI) assay was analyzed. Based on the phylogenetic analysis, Thai SIVs could be divided into 3 clusters of the H1 viruses (clusters I and II belonging to classical swine H1α, and cluster III belonging to classical swine H1γ), and 2 clusters of the H3 viruses both belonging to human-like 1970s. The serological results indicated that swH1N1-06 (H1 cluster I) is a suitable representative SIV for the HI test antigen to detect H1 SIV-specific antibodies in the Thai swine population, while both swH3N2-05 and swH3N2-07 should be used for Thai H3 SIV-specific antibody detection. The HI test results of swine sera collected from pigs in the 4 highest pig population provinces of Thailand indicated that the percentage of pigs seropositive to swH3N2-07 was highest compared to swH1N1-06, swH1N1-09, and swH3N2-05 (85.4%, 50.1%, 18.6%, and 15.8%, respectively). It should be noted that countries lacking SIV genetic information should be concerned with determining the most suitable HI test antigens to use when performing the tests due to the genetic variation and limited cross-reaction of SIVs. The results of the current study demonstrated that HI tests should be implemented with the suitable field strains as the representative test antigen to ascertain accurate SIV serostatus in Thailand and that test antigens should be genetically analyzed and compared with circulating strains regularly.

  19. Multilingual Data Selection for Low Resource Speech Recognition

    DTIC Science & Technology

    2016-09-12

    Figure 1: Identification of language clusters using scores from an LID system training languages used in the Base and OP1 evaluation periods of the Babel...the posterior scores over frames. For a set of languages that are used to train the language identification (LID) network, pairs of languages that...which are combined during test time to produce 10 dimensional language Figure 3: Identification of language clusters using scores from individually

  20. Fast gene ontology based clustering for microarray experiments.

    PubMed

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  1. Unsupervised spike sorting based on discriminative subspace learning.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2014-01-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.

  2. Robust statistical methods for hit selection in RNA interference high-throughput screening experiments.

    PubMed

    Zhang, Xiaohua Douglas; Yang, Xiting Cindy; Chung, Namjin; Gates, Adam; Stec, Erica; Kunapuli, Priya; Holder, Dan J; Ferrer, Marc; Espeseth, Amy S

    2006-04-01

    RNA interference (RNAi) high-throughput screening (HTS) experiments carried out using large (>5000 short interfering [si]RNA) libraries generate a huge amount of data. In order to use these data to identify the most effective siRNAs tested, it is critical to adopt and develop appropriate statistical methods. To address the questions in hit selection of RNAi HTS, we proposed a quartile-based method which is robust to outliers, true hits and nonsymmetrical data. We compared it with the more traditional tests, mean +/- k standard deviation (SD) and median +/- 3 median of absolute deviation (MAD). The results suggested that the quartile-based method selected more hits than mean +/- k SD under the same preset error rate. The number of hits selected by median +/- k MAD was close to that by the quartile-based method. Further analysis suggested that the quartile-based method had the greatest power in detecting true hits, especially weak or moderate true hits. Our investigation also suggested that platewise analysis (determining effective siRNAs on a plate-by-plate basis) can adjust for systematic errors in different plates, while an experimentwise analysis, in which effective siRNAs are identified in an analysis of the entire experiment, cannot. However, experimentwise analysis may detect a cluster of true positive hits placed together in one or several plates, while platewise analysis may not. To display hit selection results, we designed a specific figure called a plate-well series plot. We thus suggest the following strategy for hit selection in RNAi HTS experiments. First, choose the quartile-based method, or median +/- k MAD, for identifying effective siRNAs. Second, perform the chosen method experimentwise on transformed/normalized data, such as percentage inhibition, to check the possibility of hit clusters. 
If a cluster of selected hits is observed, repeat the analysis based on untransformed data to determine whether the cluster is due to an artifact in the data. If no clusters of hits are observed, select hits by performing platewise analysis on transformed data. Third, adopt the plate-well series plot to visualize both the data and the hit selection results, as well as to check for artifacts.
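
    The three hit-selection rules compared in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the MAD scaling factor 1.4826, and the quartile constant c = 1.7239 (which places the bounds at roughly ±3 SD for normally distributed data) are assumptions introduced here, and the abstract does not state how the paper derives its constants from the preset error rate.

```python
import statistics


def mean_sd_hits(values, k=3.0):
    """Indices of wells lying more than k sample SDs from the mean."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sd]


def median_mad_hits(values, k=3.0, scale=1.4826):
    """Indices of wells lying more than k scaled MADs from the median.

    scale=1.4826 makes the MAD comparable to an SD under normality
    (an assumption of this sketch, not taken from the abstract).
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [i for i, v in enumerate(values) if abs(v - med) > k * scale * mad]


def quartile_hits(values, c=1.7239):
    """Indices of wells outside [Q1 - c*IQR, Q3 + c*IQR].

    c is a hypothetical constant chosen for illustration only.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - c * iqr, q3 + c * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Twenty inactive wells followed by one moderate and one strong hit.
values = [-0.2, -0.1, 0.0, 0.1, 0.2] * 4 + [3.0, 8.0]
print(mean_sd_hits(values))     # the strong hit inflates the SD
print(median_mad_hits(values))
print(quartile_hits(values))
```

    On this toy plate the strong hit inflates the standard deviation enough that the mean-based rule misses the moderate hit, whereas the two robust rules flag both wells, consistent with the behaviour the abstract reports for weak or moderate true hits.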

  3. An empirical comparison of methods for analyzing correlated data from a discrete choice survey to elicit patient preference for colorectal cancer screening

    PubMed Central

    2012-01-01

    Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficient within attributes were used to assess the model robustness. Results In total 468 participants with each completing 10 choices were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. 
Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When only a small clustering effect was present in DCE data, results remained robust across statistical models. However, results varied when a larger clustering effect was present. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526

  4. Coordinate based random effect size meta-analysis of neuroimaging studies.

    PubMed

    Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J

    2017-06-01

    Low power in neuroimaging studies can make them difficult to interpret, and Coordinate based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by a random effects meta-analyses of reported effects performed cluster-wise using standard statistical methods and taking account of censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely. Copyright © 2017 Elsevier Inc. All rights reserved.
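
    The abstract states that the false cluster discovery rate (FCDR) is based on the false discovery rate but gives no details of the FCDR itself. For orientation only, the sketch below shows the standard Benjamini-Hochberg step-up procedure for controlling the false discovery rate; it is not the ClusterZ algorithm, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard Benjamini-Hochberg step-up rule: find the largest rank r
    (1-based, in ascending p-value order) with p_(r) <= q * r / m, then
    reject the r hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical cluster-wise p-values, for illustration only.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5], q=0.05))
```

    In this example the third-smallest p-value meets its threshold (0.035 ≤ 0.0375), so the two smaller p-values are rejected as well even though each misses its own rank-specific threshold; this is what makes the procedure "step-up".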

  5. Responses of single facial taste fibers in the channel catfish, Ictalurus punctatus, to amino acids.

    PubMed

    Kohbara, J; Michel, W; Caprio, J

    1992-10-01

    1. Amino acids and nucleotides stimulate taste receptors of teleosts. In this report, responses to these compounds of 105 facial taste fibers (79 fully characterized) that innervate maxillary barbel taste buds of the channel catfish (Ictalurus punctatus) were analyzed. 2. The fully characterized facial taste fibers that responded to amino acids (n = 68) were generally poorly responsive to nucleotides and related substances (NRS), whereas the fibers responsive to NRS (n = 11) were poorly responsive to amino acids. Spike discharge of the amino acid-responsive fibers to the most potent amino acid stimulus tested per fiber increased 44-fold from a mean spontaneous activity of 2.1 +/- 3.5 to 92.1 +/- 42.4 (SD) spikes/3 s. Spike activity of the NRS-responsive fibers to NRS increased 11.5-fold from a mean spontaneous activity of 3.4 +/- 5.9 to 39.1 +/- 27.4 spikes/3 s. There was no significant difference between the spontaneous rates, but stimulus evoked spike rates for the amino acid-responsive fibers were significantly greater (P < 0.05; Mann-Whitney test) than those for the NRS-responsive fibers. 3. Hierarchical cluster analysis based on the 3-s response time identified three major groups of neurons. The identified clusters comprised neurons that were highly responsive to either L-alanine (i.e., Ala cluster; n = 39), L-arginine (i.e., Arg cluster; n = 29), or NRS (NRS cluster; n = 11). Fibers comprising the Arg cluster were more narrowly tuned than those within the Ala cluster. This report further characterizes the responses to amino acids of the individual facial taste fibers comprising the Ala and Arg clusters. 4. Subclusters were evident within both of the amino acid-responsive clusters. The Arg cluster was divisible into two subclusters dependent on the response to 1 mM L-proline. 
Twelve neurons that were significantly (P < 0.05; Mann-Whitney test) more responsive to L-proline than the remaining 17 neurons within the Arg cluster formed the Arg/Pro subcluster; these latter 17 neurons comprised the Arg subcluster. However, there was no significant difference (Mann-Whitney test) in the response to L-arginine between fibers within either subcluster across four different response times analyzed. Fibers within the Ala cluster were generally poorly responsive to L-proline. Four alanine subclusters were suggested on the basis of their relative responses to L-alanine, D-alanine, L-arginine, and the NRS; however, of the 39 fibers comprising the alanine cluster, two alanine subclusters comprised only two fibers each, and the third subcluster consisted of four fibers.(ABSTRACT TRUNCATED AT 400 WORDS)

  6. Cluster Stability Estimation Based on a Minimal Spanning Trees Approach

    NASA Astrophysics Data System (ADS)

    Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora

    2009-08-01

    Among the areas of data and text mining employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; for example, the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples; that is, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotically normal distribution of the considered statistic. Based on this fact, the standard score of this edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left-asymmetry. Numerical experiments, presented in the paper, demonstrate the ability of the approach to detect the true number of clusters.
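
    The edge count at the heart of the Friedman and Rafsky statistic can be sketched as below: pool two samples, build a minimal spanning tree over the pooled points, and count the tree edges that join points from different samples. This is a simplified illustration for a single cluster under assumptions made here (Euclidean distance, a plain O(n²) Prim's algorithm, hypothetical function names); the standardisation to a z-score and the paper's sequential procedure are omitted.

```python
import math


def mst_edges(points):
    """Edges (u, v) of a minimal spanning tree, built with Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each point
    best_from = [-1] * n         # tree vertex providing that cheapest link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest links using the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_sample_edges(sample_a, sample_b):
    """Number of MST edges of the pooled data joining the two samples."""
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(1 for u, v in mst_edges(points) if labels[u] != labels[v])


separated = cross_sample_edges([(0, 0), (1, 0), (2, 0)], [(10, 0), (11, 0), (12, 0)])
mingled = cross_sample_edges([(0, 0), (2, 0), (4, 0)], [(1, 0), (3, 0), (5, 0)])
print(separated, mingled)
```

    Well-mingled samples yield many cross-sample edges, while separated samples yield few; under the homogeneity hypothesis described in the abstract, a small count for a cluster therefore signals instability.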

  7. Dynamical Modeling of NGC 6397: Simulated HST Imaging

    NASA Astrophysics Data System (ADS)

    Dull, J. D.; Cohn, H. N.; Lugger, P. M.; Slavin, S. D.; Murphy, B. W.

    1994-12-01

    The proximity of NGC 6397 (2.2 kpc) provides an ideal opportunity to test current dynamical models for globular clusters with the HST Wide-Field/Planetary Camera (WFPC2). We have used a Monte Carlo algorithm to generate ensembles of simulated Planetary Camera (PC) U-band images of NGC 6397 from evolving, multi-mass Fokker-Planck models. These images, which are based on the post-repair HST-PC point-spread function, are used to develop and test analysis methods for recovering structural information from actual HST imaging. We have considered a range of exposure times up to 2.4 × 10^4 s, based on our proposed HST Cycle 5 observations. Our Fokker-Planck models include energy input from dynamically-formed binaries. We have adopted a 20-group mass spectrum extending from 0.16 to 1.4 M☉. We use theoretical luminosity functions for red giants and main sequence stars. Horizontal branch stars, blue stragglers, white dwarfs, and cataclysmic variables are also included. Simulated images are generated for cluster models at both maximal core collapse and at a post-collapse bounce. We are carrying out stellar photometry on these images using "DAOPHOT-assisted aperture photometry" software that we have developed. We are testing several techniques for analyzing the resulting star counts, to determine the underlying cluster structure, including parametric model fits and nonparametric density estimation methods. Our simulated images also allow us to investigate the accuracy and completeness of methods for carrying out stellar photometry in HST Planetary Camera images of dense cluster cores.

  8. Butyrate production in phylogenetically diverse Firmicutes isolated from the chicken caecum

    PubMed Central

    Eeckhaut, Venessa; Van Immerseel, Filip; Croubels, Siska; De Baere, Siegrid; Haesebrouck, Freddy; Ducatelle, Richard; Louis, Petra; Vandamme, Peter

    2011-01-01

    Summary Sixteen butyrate‐producing bacteria were isolated from the caecal content of chickens and analysed phylogenetically. They did not represent a coherent phylogenetic group, but were allied to four different lineages in the Firmicutes phylum. Fourteen strains appeared to represent novel species, based on a level of ≤ 98.5% 16S rRNA gene sequence similarity towards their nearest validly named neighbours. The highest butyrate concentrations were produced by the strains belonging to clostridial clusters IV and XIVa, clusters which are predominant in the chicken caecal microbiota. In only one of the 16 strains tested, the butyrate kinase operon could be amplified, while the butyryl‐CoA : acetate CoA‐transferase gene was detected in eight strains belonging to clostridial clusters IV, XIVa and XIVb. None of the clostridial cluster XVI isolates carried this gene based on degenerate PCR analyses. However, another CoA‐transferase gene more similar to propionate CoA‐transferase was detected in the majority of the clostridial cluster XVI isolates. Since this gene is located directly downstream of the remaining butyrate pathway genes in several human cluster XVI bacteria, it may be involved in butyrate formation in these bacteria. The present study indicates that butyrate producers related to cluster XVI may play a more important role in the chicken gut than in the human gut. PMID:21375722

  9. Gravitational redshift of galaxies in clusters as predicted by general relativity.

    PubMed

    Wojtak, Radosław; Hansen, Steen H; Hjorth, Jens

    2011-09-28

    The theoretical framework of cosmology is mainly defined by gravity, of which general relativity is the current model. Recent tests of general relativity within the Lambda Cold Dark Matter (ΛCDM) model have found a concordance between predictions and the observations of the growth rate and clustering of the cosmic web. General relativity has not hitherto been tested on cosmological scales independently of the assumptions of the ΛCDM model. Here we report an observation of the gravitational redshift of light coming from galaxies in clusters at the 99 per cent confidence level, based on archival data. Our measurement agrees with the predictions of general relativity and its modification created to explain cosmic acceleration without the need for dark energy (the f(R) theory), but is inconsistent with alternative models designed to avoid the presence of dark matter. © 2011 Macmillan Publishers Limited. All rights reserved

  10. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    NASA Astrophysics Data System (ADS)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence makes it possible to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods. In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works that will analyze this type of system in order to characterize it. Results. Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions. It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.

  11. EClerize: A customized force-directed graph drawing algorithm for biological graphs with EC attributes.

    PubMed

    Danaci, Hasan Fehmi; Cetin-Atalay, Rengul; Atalay, Volkan

    2018-03-26

    Visualizing large-scale data produced by high-throughput experiments as a biological graph leads to better understanding and analysis. This study describes a customized force-directed layout algorithm, EClerize, for biological graphs that represent pathways in which the nodes are associated with Enzyme Commission (EC) attributes. The nodes with the same EC class numbers are treated as members of the same cluster. Positions of nodes are then determined based on both the biological similarity and the connection structure. EClerize minimizes the intra-cluster distance, that is, the distance between the nodes of the same EC cluster, and maximizes the inter-cluster distance, that is, the distance between two distinct EC clusters. EClerize is tested on a number of biological pathways, and its improvement over the original algorithm is presented. EClerize is available as a plug-in to Cytoscape (http://apps.cytoscape.org/apps/eclerize).

  12. Collaborative Simulation Grid: Multiscale Quantum-Mechanical/Classical Atomistic Simulations on Distributed PC Clusters in the US and Japan

    NASA Technical Reports Server (NTRS)

    Kikuchi, Hideaki; Kalia, Rajiv; Nakano, Aiichiro; Vashishta, Priya; Iyetomi, Hiroshi; Ogata, Shuji; Kouno, Takahisa; Shimojo, Fuyuki; Tsuruta, Kanji; Saini, Subhash

    2002-01-01

    A multidisciplinary, collaborative simulation has been performed on a Grid of geographically distributed PC clusters. The multiscale simulation approach seamlessly combines i) atomistic simulation based on the molecular dynamics (MD) method and ii) quantum mechanical (QM) calculation based on the density functional theory (DFT), so that accurate but less scalable computations are performed only where they are needed. The multiscale MD/QM simulation code has been Grid-enabled using i) a modular, additive hybridization scheme, ii) multiple QM clustering, and iii) computation/communication overlapping. The Gridified MD/QM simulation code has been used to study environmental effects of water molecules on fracture in silicon. A preliminary run of the code has achieved a parallel efficiency of 94% on 25 PCs distributed over 3 PC clusters in the US and Japan, and a larger test involving 154 processors on 5 distributed PC clusters is in progress.

  13. SCUD: fast structure clustering of decoys using reference state to remove overall rotation.

    PubMed

    Li, Hongzhi; Zhou, Yaoqi

    2005-08-01

    We developed a method for fast decoy clustering by using reference root-mean-squared distance (rRMSD) rather than commonly used pairwise RMSD (pRMSD) values. For 41 proteins with 2000 decoys each, the computing efficiency increases nine times without a significant change in the accuracy of near-native selections. Tests on additional protein decoys based on different reference conformations confirmed this result. Further analysis indicates that the pRMSD and rRMSD values are highly correlated (with an average correlation coefficient of 0.82) and the clusters obtained from pRMSD and rRMSD values are highly similar (the representative structures of the top five largest clusters from the two methods are 74% identical). SCUD (Structure ClUstering of Decoys) with an automatic cutoff value is available at http://theory.med.buffalo.edu. (c) 2005 Wiley Periodicals, Inc.

  14. Glaucomatous patterns in Frequency Doubling Technology (FDT) perimetry data identified by unsupervised machine learning classifiers.

    PubMed

    Bowd, Christopher; Weinreb, Robert N; Balasubramanian, Madhusudhanan; Lee, Intae; Jang, Giljin; Yousefi, Siamak; Zangwill, Linda M; Medeiros, Felipe A; Girkin, Christopher A; Liebmann, Jeffrey M; Goldbaum, Michael H

    2014-01-01

    The variational Bayesian independent component analysis-mixture model (VIM), an unsupervised machine-learning classifier, was used to automatically separate Matrix Frequency Doubling Technology (FDT) perimetry data into clusters of healthy and glaucomatous eyes, and to identify axes representing statistically independent patterns of defect in the glaucoma clusters. FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal FDT results from the UCSD-based Diagnostic Innovations in Glaucoma Study (DIGS) and African Descent and Glaucoma Evaluation Study (ADAGES). For all eyes, VIM input was 52 threshold test points from the 24-2 test pattern, plus age. FDT mean deviation was -1.00 dB (S.D. = 2.80 dB) and -5.57 dB (S.D. = 5.09 dB) in FDT-normal eyes and FDT-abnormal eyes, respectively (p<0.001). VIM identified meaningful clusters of FDT data and positioned a set of statistically independent axes through the mean of each cluster. The optimal VIM model separated the FDT fields into 3 clusters. Cluster N contained primarily normal fields (1109/1190, specificity 93.1%), and clusters G1 and G2 combined contained primarily abnormal fields (651/786, sensitivity 82.8%). For clusters G1 and G2, the optimal numbers of axes were 2 and 5, respectively. Patterns automatically generated along axes within the glaucoma clusters were similar to those known to be indicative of glaucoma. Fields located farther from the normal mean on each glaucoma axis showed increasing field defect severity. VIM successfully separated FDT fields from healthy and glaucoma eyes without a priori information about class membership, and identified familiar glaucomatous patterns of loss.

  15. Quality of reporting of pilot and feasibility cluster randomised trials: a systematic review

    PubMed Central

    Chan, Claire L; Leyrat, Clémence; Eldridge, Sandra M

    2017-01-01

    Objectives To systematically review the quality of reporting of pilot and feasibility cluster randomised trials (CRTs). In particular, to assess (1) the number of pilot CRTs conducted between 1 January 2011 and 31 December 2014, (2) whether objectives and methods are appropriate and (3) reporting quality. Methods We searched PubMed (2011–2014) for CRTs with ‘pilot’ or ‘feasibility’ in the title or abstract that were assessing some element of feasibility and showing evidence the study was in preparation for a main effectiveness/efficacy trial. Quality assessment criteria were based on the Consolidated Standards of Reporting Trials (CONSORT) extensions for pilot trials and CRTs. Results Eighteen pilot CRTs were identified. Forty-four per cent did not have feasibility as their primary objective, and many (50%) performed formal hypothesis testing for effectiveness/efficacy despite being underpowered. Most (83%) included ‘pilot’ or ‘feasibility’ in the title, and discussed implications for progression from the pilot to the future definitive trial (89%), but fewer reported reasons for the randomised pilot trial (39%), sample size rationale (44%) or progression criteria (17%). Most defined the cluster (100%), and number of clusters randomised (94%), but few reported how the cluster design affected sample size (17%), whether consent was sought from clusters (11%), or who enrolled clusters (17%). Conclusions That only 18 pilot CRTs were identified necessitates increased awareness of the importance of conducting and publishing pilot CRTs and improved reporting. Pilot CRTs should primarily be assessing feasibility, avoiding formal hypothesis testing for effectiveness/efficacy and reporting reasons for the pilot, sample size rationale and progression criteria, as well as enrolment of clusters, and how the cluster design affects design aspects. We recommend adherence to the CONSORT extensions for pilot trials and CRTs. PMID:29122791

  16. Performance map of a cluster detection test using extended power

    PubMed Central

    2013-01-01

    Background Conventional power studies possess limited ability to assess the performance of cluster detection tests. In particular, they cannot evaluate the accuracy of the cluster location, which is essential in such assessments. Furthermore, they usually estimate power for one or a few particular alternative hypotheses and thus cannot assess performance over an entire region. Takahashi and Tango developed the concept of extended power that indicates both the rate of null hypothesis rejection and the accuracy of the cluster location. We propose a systematic assessment method, using here extended power, to produce a map showing the performance of cluster detection tests over an entire region. Methods To explore the behavior of a cluster detection test on identical cluster types at any possible location, we successively applied four different spatial and epidemiological parameters. These parameters determined four cluster collections, each covering the entire study region. We simulated 1,000 datasets for each cluster and analyzed them with Kulldorff’s spatial scan statistic. From the area under the extended power curve, we constructed a map for each parameter set showing the performance of the test across the entire region. Results Consistent with previous studies, the performance of the spatial scan statistic increased with the baseline incidence of disease, the size of the at-risk population and the strength of the cluster (i.e., the relative risk). Performance was heterogeneous, however, even for very similar clusters (i.e., similar with respect to the aforementioned factors), suggesting the influence of other factors. Conclusions The area under the extended power curve is a single measure of performance and, although needing further exploration, it is suitable to conduct a systematic spatial evaluation of performance. The performance map we propose enables epidemiologists to assess cluster detection tests across an entire study region. PMID:24156765

  17. Estimating the intra-cluster correlation coefficient for evaluating an educational intervention program to improve rabies awareness and dog bite prevention among children in Sikkim, India: A pilot study.

    PubMed

    Auplish, Aashima; Clarke, Alison S; Van Zanten, Trent; Abel, Kate; Tham, Charmaine; Bhutia, Thinlay N; Wilks, Colin R; Stevenson, Mark A; Firestone, Simon M

    2017-05-01

    Educational initiatives targeting at-risk populations have long been recognized as a mainstay of ongoing rabies control efforts. Cluster-based studies are often utilized to assess levels of knowledge, attitudes and practices of a population in response to education campaigns. The design of cluster-based studies requires estimates of intra-cluster correlation coefficients obtained from previous studies. This study estimates the school-level intra-cluster correlation coefficient (ICC) for rabies knowledge change following an educational intervention program. A cross-sectional survey was conducted with 226 students from 7 schools in Sikkim, India, using cluster sampling. In order to assess knowledge uptake, rabies education sessions with pre- and post-session questionnaires were administered. Paired differences of proportions were estimated for questions answered correctly. A mixed effects logistic regression model was developed to estimate school-level and student-level ICCs and to test for associations between gender, age, school location and educational level. The school- and student-level ICCs for rabies knowledge and awareness were 0.04 (95% CI: 0.01, 0.19) and 0.05 (95% CI: 0.2, 0.09), respectively. These ICCs suggest that design effect multipliers of 5.45 schools and 1.05 students per school will be required when estimating sample sizes and designing future cluster randomized trials. There was a good baseline level of rabies knowledge (mean pre-session score 71%); however, key knowledge gaps were identified in understanding appropriate behavior around scared dogs, potential sources of rabies and how to correctly order post rabies exposure precaution steps. After adjusting for the effect of gender, age, school location and education level, school and individual post-session test scores improved by 19%, with similar performance amongst boys and girls attending schools in urban and rural regions. The proportion of participants that were able to correctly order post-exposure precautionary steps following educational intervention increased by 87%. The ICC estimates presented in this study will aid in designing cluster-based studies evaluating educational interventions as part of disease control programs. This study demonstrates the likely benefits of educational intervention incorporating bite prevention and rabies education. Copyright © 2017 Elsevier B.V. All rights reserved.
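
As a rough illustration of how an ICC feeds into sample-size planning, the sketch below uses the standard equal-cluster-size design effect, DEFF = 1 + (m - 1) × ICC. This is a general textbook formula, not necessarily the calculation behind the multipliers quoted in the abstract. The cluster size of 30 and the unadjusted sample size of 200 are hypothetical values chosen only for the example, and the function names are illustrative.

```python
import math


def design_effect(icc: float, cluster_size: float) -> float:
    """Standard design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def inflated_sample_size(n_unadjusted: int, icc: float, cluster_size: float) -> int:
    """Inflate a sample size computed under simple random sampling.

    The product is rounded to 9 decimals before taking the ceiling so that
    floating-point noise does not push an exact value up by one.
    """
    return math.ceil(round(n_unadjusted * design_effect(icc, cluster_size), 9))


if __name__ == "__main__":
    # Hypothetical inputs: ICC of 0.04 (the school-level estimate reported
    # above) with an assumed 30 students sampled per school.
    print(design_effect(icc=0.04, cluster_size=30))
    print(inflated_sample_size(n_unadjusted=200, icc=0.04, cluster_size=30))
```

With unequal cluster sizes, a variance-inflation term based on the coefficient of variation of cluster size is often added; that refinement is left out here to keep the sketch short.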

  18. Measuring the Scatter of the Mass–Richness Relation in Galaxy Clusters in Photometric Imaging Surveys by Means of Their Correlation Function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campa, Julia; Estrada, Juan; Flaugher, Brenna

    2017-02-03

    The knowledge of the scatter in the mass-observable relation is a key ingredient for a cosmological analysis based on galaxy clusters in a photometric survey. We demonstrate here how the linear bias measured in the correlation function for clusters can be used to determine the value of the scatter. The new method is tested in simulations of a 5,000 square degree optical survey up to z~1, similar to the ongoing Dark Energy Survey. The results indicate that the scatter can be measured with a precision of 5% using this technique.

  19. Scaled Rocket Testing in Hypersonic Flow

    NASA Technical Reports Server (NTRS)

    Dufrene, Aaron; MacLean, Matthew; Carr, Zakary; Parker, Ron; Holden, Michael; Mehta, Manish

    2015-01-01

    NASA's Space Launch System (SLS) uses four clustered liquid rocket engines along with two solid rocket boosters. The interaction between all six rocket exhaust plumes will produce a complex and severe thermal environment in the base of the vehicle. This work focuses on a recent 2% scale, hot-fire SLS base heating test. These base heating tests are short-duration tests executed with chamber pressures near the full-scale values with gaseous hydrogen/oxygen engines and RSRMV analogous solid propellant motors. The LENS II shock tunnel/Ludwieg tube tunnel was used at or near flight duplicated conditions up to Mach 5. Model development was strongly based on the Space Shuttle base heating tests with several improvements including doubling of the maximum chamber pressures and duplication of freestream conditions. Detailed base heating results are outside of the scope of the current work; rather, test methodology and techniques are presented, along with broader applicability toward scaled rocket testing in supersonic and hypersonic flow.

  20. A flexible data-driven comorbidity feature extraction framework.

    PubMed

    Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid

    2016-06-01

    Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP) which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset that includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to have significant gains between 10.7-22.1% in predictive accuracy for CHF severity of condition prediction and 4.65-5.75% in diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Hierarchical Adaptive Means (HAM) clustering for hardware-efficient, unsupervised and real-time spike sorting.

    PubMed

    Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G

    2014-09-30

    This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. It is described how the proposed method can adaptively track the incoming spike data without requiring any past history, iteration or training and autonomously determine the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded), achieving near-identical accuracy compared to k-means (using 10 iterations and provided with the number of spike classes). Also, its robustness in applying to different feature extraction methods has been demonstrated by achieving classification accuracies above 80% across multiple datasets. Last but crucially, its low complexity, which has been quantified through both memory and computation requirements, makes this method hugely attractive for future hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.
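
To make the idea of single-pass, history-free clustering concrete, here is a minimal sketch of a generic online centroid tracker. It is not the HAM algorithm: it omits the hierarchical cluster connectivity described above, and the class name `OnlineCentroids` and the fixed `radius` threshold are assumptions introduced for illustration. Each point is seen once, either opening a new cluster or nudging the nearest centroid with a running mean, so the number of clusters is not fixed in advance.

```python
import math


class OnlineCentroids:
    """Generic one-pass centroid tracker (illustrative, not HAM itself)."""

    def __init__(self, radius: float):
        self.radius = radius   # hypothetical distance threshold for opening a new cluster
        self.centroids = []    # current centroid coordinates, one list per cluster
        self.counts = []       # number of points absorbed by each cluster

    def assign(self, point):
        """Assign one incoming point and update state; returns the cluster index."""
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(point, c)
            if d < best_d:
                best, best_d = i, d
        # No cluster yet, or the nearest one is too far away: start a new cluster.
        if best is None or best_d > self.radius:
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Otherwise move the nearest centroid toward the point (running mean),
        # which needs only the centroid and its count, not the past points.
        self.counts[best] += 1
        n = self.counts[best]
        c = self.centroids[best]
        for j, x in enumerate(point):
            c[j] += (x - c[j]) / n
        return best


if __name__ == "__main__":
    tracker = OnlineCentroids(radius=1.0)
    stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (0.1, 0.1)]
    print([tracker.assign(p) for p in stream])
```

Memory use grows with the number of clusters rather than the number of spikes, which is the property that makes this family of methods attractive for hardware.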

  2. Synthesis, Characterization, and Reactivity of Functionalized Trinuclear Iron–Sulfur Clusters – A New Class of Bioinspired Hydrogenase Models

    PubMed Central

    Kaiser, Manuel; Knör, Günther

    2015-01-01

    The air- and moisture-stable iron–sulfur carbonyl clusters Fe3S2(CO)7(dppm) (1) and Fe3S2(CO)7(dppf) (2) carrying the bisphosphine ligands bis(diphenylphosphanyl)methane (dppm) and 1,1′-bis(diphenylphosphanyl)ferrocene (dppf) were prepared and fully characterized. Two alternative synthetic routes based on different thionation reactions of triiron dodecacarbonyl were tested. The molecular structures of the methylene-bridged compound 1 and the ferrocene-functionalized derivative 2 were determined by single-crystal X-ray diffraction. The catalytic reactivity of the trinuclear iron–sulfur cluster core for proton reduction in solution at low overpotential was demonstrated. These deeply colored bisphosphine-bridged sulfur-capped iron carbonyl systems are discussed as promising candidates for the development of new bioinspired model compounds of iron-based hydrogenases. PMID:26512211

  3. Clustering by soft-constraint affinity propagation: applications to gene-expression data.

    PubMed

    Leone, Michele; Sumedha; Weigt, Martin

    2007-10-15

    Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, e.g. in analyzing gene expression data. This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows the extraction of sparse gene expression signatures for each cluster.

  4. A matched filter approach for blind joint detection of galaxy clusters in X-ray and SZ surveys

    NASA Astrophysics Data System (ADS)

    Tarrío, P.; Melin, J.-B.; Arnaud, M.

    2018-06-01

    The combination of X-ray and Sunyaev-Zeldovich (SZ) observations can potentially improve the cluster detection efficiency, when compared to using only one of these probes, since both probe the same medium, the hot ionized gas of the intra-cluster medium. We present a method based on matched multifrequency filters (MMF) for detecting galaxy clusters from SZ and X-ray surveys. This method builds on a previously proposed joint X-ray-SZ extraction method and allows the blind detection of clusters, that is finding new clusters without knowing their position, size, or redshift, by searching on SZ and X-ray maps simultaneously. The proposed method is tested using data from the ROSAT all-sky survey and from the Planck survey. The evaluation is done by comparison with existing cluster catalogues in the area of the sky covered by the deep SPT survey. Thanks to the addition of the X-ray information, the joint detection method is able to achieve simultaneously better purity, better detection efficiency, and better position accuracy than its predecessor Planck MMF, which is based on SZ maps alone. For a purity of 85%, the X-ray-SZ method detects 141 confirmed clusters in the SPT region; to detect the same number of confirmed clusters with Planck MMF, we would need to decrease its purity to 70%. We provide a catalogue of 225 sources selected by the proposed method in the SPT footprint, with masses ranging between 0.7 and 14.5 × 10^14 M⊙ and redshifts between 0.01 and 1.2.

  5. A path-based measurement for human miRNA functional similarities using miRNA-disease associations

    NASA Astrophysics Data System (ADS)

    Ding, Pingjian; Luo, Jiawei; Xiao, Qiu; Chen, Xiangtao

    2016-09-01

    Compared with sequence and expression similarity, miRNA functional similarity is particularly important for biological research and for many applications such as miRNA clustering, miRNA function prediction, miRNA synergism identification and disease miRNA prioritization. However, existing methods have generally used predicted miRNA targets, which have high false positive and false negative rates, to calculate miRNA functional similarity. Meanwhile, it is difficult to achieve high reliability of miRNA functional similarity with miRNA-disease associations. Therefore, improved measurement of miRNA functional similarity is increasingly needed. In this study, we develop a novel path-based calculation method of miRNA functional similarity based on miRNA-disease associations, called MFSP. Compared with other methods, our method obtains higher average functional similarity for intra-family and intra-cluster selected groups and lower average functional similarity for inter-family and inter-cluster miRNA pairs. In addition, smaller p-values are achieved when applying the Wilcoxon rank-sum test and the Kruskal-Wallis test to different miRNA groups. The relationship between miRNA functional similarity and other information sources is exhibited. Furthermore, the miRNA functional network constructed with MFSP is a scale-free and small-world network. Moreover, the higher AUC for miRNA-disease prediction indicates the ability of MFSP to uncover miRNA functional similarity.

  6. Web-Based Evaluation System to Measure Learning Effectiveness in Kampo Medicine

    PubMed Central

    Usuku, Koichiro; Segawa, Makoto; Wang, Yue; Ogashiwa, Kahori; Fujita, Yusuke; Ogihara, Hiroyuki; Tazuma, Susumu

    2016-01-01

    Measuring the learning effectiveness of Kampo Medicine (KM) education is challenging. The aim of this study was to develop a web-based test to measure the learning effectiveness of KM education among medical students (MSs). We used an open-source Moodle platform to test 30 multiple-choice questions classified into 8-type fields (eight basic concepts of KM) including “qi-blood-fluid” and “five-element” theories, on 117 fourth-year MSs. The mean (±standard deviation [SD]) score on the web-based test was 30.2 ± 11.9 (/100). The correct answer rate ranged from 17% to 36%. A pattern-based portfolio enabled these rates to be individualized in terms of KM proficiency. MSs with scores higher (n = 19) or lower (n = 14) than mean ± 1SD were defined as high or low achievers, respectively. Cluster analysis using the correct answer rates for the 8-type field questions revealed clear divisions between high and low achievers. Interestingly, each high achiever had a different proficiency pattern. In contrast, three major clusters were evident among low achievers, all of whom responded with a low percentage of or no correct answers. In addition, a combination of three questions accurately classified high and low achievers. These findings suggest that our web-based test allows individual quantitative assessment of the learning effectiveness of KM education among MSs. PMID:27738440

  7. Web-Based Evaluation System to Measure Learning Effectiveness in Kampo Medicine.

    PubMed

    Iizuka, Norio; Usuku, Koichiro; Nakae, Hajime; Segawa, Makoto; Wang, Yue; Ogashiwa, Kahori; Fujita, Yusuke; Ogihara, Hiroyuki; Tazuma, Susumu; Hamamoto, Yoshihiko

    2016-01-01

    Measuring the learning effectiveness of Kampo Medicine (KM) education is challenging. The aim of this study was to develop a web-based test to measure the learning effectiveness of KM education among medical students (MSs). We used an open-source Moodle platform to test 30 multiple-choice questions classified into 8-type fields (eight basic concepts of KM) including "qi-blood-fluid" and "five-element" theories, on 117 fourth-year MSs. The mean (±standard deviation [SD]) score on the web-based test was 30.2 ± 11.9 (/100). The correct answer rate ranged from 17% to 36%. A pattern-based portfolio enabled these rates to be individualized in terms of KM proficiency. MSs with scores higher (n = 19) or lower (n = 14) than mean ± 1SD were defined as high or low achievers, respectively. Cluster analysis using the correct answer rates for the 8-type field questions revealed clear divisions between high and low achievers. Interestingly, each high achiever had a different proficiency pattern. In contrast, three major clusters were evident among low achievers, all of whom responded with a low percentage of or no correct answers. In addition, a combination of three questions accurately classified high and low achievers. These findings suggest that our web-based test allows individual quantitative assessment of the learning effectiveness of KM education among MSs.
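
The mean ± 1 SD rule used to define high and low achievers can be sketched as follows. The scores are made-up values, the function name `classify_achievers` is illustrative, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import statistics


def classify_achievers(scores):
    """Label each score 'high', 'low' or 'average' relative to mean ± 1 SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD; an assumption, see lead-in
    labels = []
    for s in scores:
        if s > mean + sd:
            labels.append("high")
        elif s < mean - sd:
            labels.append("low")
        else:
            labels.append("average")
    return labels


if __name__ == "__main__":
    # Hypothetical test scores out of 100.
    print(classify_achievers([10, 20, 30, 40, 50]))
```

The resulting high and low groups are what a subsequent cluster analysis on per-field correct-answer rates would then be compared against.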

  8. Spatial event cluster detection using an approximate normal distribution.

    PubMed

    Torabi, Mahmoud; Rosychuk, Rhonda J

    2008-12-12

    In geographic surveillance of disease, areas with large numbers of disease cases are to be identified so that investigations of the causes of high disease rates can be pursued. Areas with high rates are called disease clusters and statistical cluster detection tests are used to identify geographic areas with higher disease rates than expected by chance alone. Typically cluster detection tests are applied to incident or prevalent cases of disease, but surveillance of disease-related events, where an individual may have multiple events, may also be of interest. Previously, a compound Poisson approach that detects clusters of events by testing individual areas that may be combined with their neighbours has been proposed. However, the relevant probabilities from the compound Poisson distribution are obtained from a recursion relation that can be cumbersome if the number of events is large or analyses by strata are performed. We propose a simpler approach that uses an approximate normal distribution. This method is very easy to implement and is applicable to situations where the population sizes are large and the population distribution by important strata may differ by area. We demonstrate the approach on pediatric self-inflicted injury presentations to emergency departments and compare the results for probabilities based on the recursion and the normal approach. We also implement a Monte Carlo simulation to study the performance of the proposed approach. In a self-inflicted injury data example, the normal approach identifies twelve out of thirteen of the same clusters as the compound Poisson approach, noting that the compound Poisson method detects twelve significant clusters in total. Through simulation studies, the normal approach approximates the compound Poisson approach well for a variety of different population sizes and case and event thresholds. A drawback of the compound Poisson approach is that the relevant probabilities must be determined through a recursion relation and such calculations can be computationally intensive if the cluster size is relatively large or if analyses are conducted with strata variables. On the other hand, the normal approach is very flexible, easily implemented, and hence, more appealing for users. Moreover, the concepts may be more easily conveyed to non-statisticians interested in understanding the methodology associated with cluster detection test results.
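
The general idea of replacing a compound Poisson tail probability with a normal one can be sketched with moment matching. For a compound Poisson total S with case rate λ and per-case event count X, the mean is λ·E[X] and the variance is λ·E[X²]. The sketch below is a generic illustration under that fact, not the authors' exact procedure: the per-case event distribution, the rate, the continuity correction and the function names are all assumptions made for the example.

```python
import math


def compound_poisson_moments(lam: float, event_pmf: dict):
    """Mean and variance of a compound Poisson total.

    lam       -- expected number of cases in the area (Poisson rate)
    event_pmf -- {events per case: probability}, a hypothetical distribution
    """
    m1 = sum(k * p for k, p in event_pmf.items())      # E[X]
    m2 = sum(k * k * p for k, p in event_pmf.items())  # E[X^2]
    return lam * m1, lam * m2


def normal_upper_tail(s: float, mean: float, var: float, continuity: bool = True) -> float:
    """Approximate P(S >= s) with a normal distribution matched on mean and variance."""
    x = s - 0.5 if continuity else s  # continuity correction is an assumption here
    z = (x - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal survival function


if __name__ == "__main__":
    # Hypothetical area: 10 expected cases, each with 1 to 3 events.
    mean, var = compound_poisson_moments(10.0, {1: 0.7, 2: 0.2, 3: 0.1})
    print(mean, var)
    print(normal_upper_tail(25, mean, var))
```

An area would be flagged when the approximate tail probability for its observed event count falls below the chosen significance level, avoiding the recursion needed for exact compound Poisson probabilities.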

  9. Empirical entropic contributions in computational docking: evaluation in APS reductase complexes.

    PubMed

    Chang, Max W; Belew, Richard K; Carroll, Kate S; Olson, Arthur J; Goodsell, David S

    2008-08-01

    The results from reiterated docking experiments may be used to evaluate an empirical vibrational entropy of binding in ligand-protein complexes. We have tested several methods for evaluating the vibrational contribution to binding of 22 nucleotide analogues to the enzyme APS reductase. These include two cluster size methods that measure the probability of finding a particular conformation, a method that estimates the extent of the local energetic well by looking at the scatter of conformations within clustered results, and an RMSD-based method that uses the overall scatter and clustering of all conformations. We have also directly characterized the local energy landscape by randomly sampling around docked conformations. The simple cluster size method shows the best performance, improving the identification of correct conformations in multiple docking experiments. 2008 Wiley Periodicals, Inc.

  10. Exact hierarchical clustering in one dimension. [in universe

    NASA Technical Reports Server (NTRS)

    Williams, B. G.; Heavens, A. F.; Peacock, J. A.; Shandarin, S. F.

    1991-01-01

    The present adhesion model-based one-dimensional simulations of gravitational clustering have yielded bound-object catalogs applicable in tests of analytical approaches to cosmological structure formation. Attention is given to Press-Schechter (1974) type functions, as well as to their density peak-theory modifications and the two-point correlation function estimated from peak theory. The extent to which individual collapsed-object locations can be predicted by linear theory is significant only for objects of near-characteristic nonlinear mass.

  11. Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  12. DISENTANGLING THE ICL WITH THE CHEFs: ABELL 2744 AS A CASE STUDY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiménez-Teja, Y.; Dupke, R.

    Measurements of the intracluster light (ICL) are still prone to methodological ambiguities, and there are multiple techniques in the literature to address them, mostly based on the binding energy, the local density distribution, or the surface brightness. A common issue with these methods is the a priori assumption of a number of hypotheses on either the ICL morphology, its surface brightness level, or some properties of the brightest cluster galaxy (BCG). The discrepancy in the results is high, and numerical simulations just place a boundary on the ICL fraction in present-day galaxy clusters in the range 10%–50%. We developed a new algorithm based on the Chebyshev–Fourier functions to estimate the ICL fraction without relying on any a priori assumption about the physical or geometrical characteristics of the ICL. We are able to not only disentangle the ICL from the galactic luminosity but mark out the limits of the BCG from the ICL in a natural way. We test our technique with the recently released data of the cluster Abell 2744, observed by the Frontier Fields program. The complexity of this multiple merging cluster system and the formidable depth of these images make it a challenging test case to prove the efficiency of our algorithm. We found a final ICL fraction of 19.17 ± 2.87%, which is very consistent with numerical simulations.

  13. Cognitive Deficits in Executive Functions and Decision-Making Impairments Cluster Gambling Disorder Sub-types.

    PubMed

    Mallorquí-Bagué, Núria; Tolosa-Sola, Iris; Fernández-Aranda, Fernándo; Granero, Roser; Fagundo, Ana Beatriz; Lozano-Madrid, María; Mestre-Bach, Gemma; Gómez-Peña, Mónica; Aymamí, Neus; Borrás-González, Indira; Sánchez-González, Jessica; Baño, Marta; Del Pino-Gutiérrez, Amparo; Menchón, José M; Jiménez-Murcia, Susana

    2018-03-01

    To identify Gambling Disorder (GD) subtypes, in a population of men seeking treatment for GD, according to specific executive function domains (i.e., cognitive flexibility, inhibition and working memory as well as decision making) which are usually impaired in addictive behaviors. A total of 145 males ranging from 18 to 65 years diagnosed with GD were included in this study. All participants completed: (a) a set of questionnaires to assess psychopathological symptoms, personality and impulsivity traits, and (b) a battery of neuropsychological measures to test different executive functioning domains. Two clusters were identified based on the individual performance on the neuropsychological assessment. Cluster 1 [n = 106; labeled as Low Impaired Executive Function (LIEF)] was composed of patients with poor results in the neuropsychological assessment; cluster 2 patients [n = 46; labeled as High Impaired Executive Function (HIEF)] presented significantly higher deficits on the assessed domains and performed worse than those in the LIEF cluster. Regarding the characterization of these two clusters, patients in cluster 2 were significantly older, unemployed and registered higher mean age of GD onset than patients in cluster 1. Additionally, patients in cluster 2 also obtained higher psychopathological symptoms, impulsivity (in both positive and negative urgency as well as sensation seeking) and some specific personality traits (higher harm avoidance as well as lower self-directedness and cooperativeness) than patients in cluster 1. The results of this study describe two different GD subtypes based on different cognitive domains (i.e., executive function performance). These two GD subtypes display different impulsivity and personality traits as well as clinical symptoms. The results provide new insight into the etiology and characterization of GD and have the potential to help improve current treatments.

  14. Defective functional connectivity between posterior hypothalamus and regions of the diencephalic-mesencephalic junction in chronic cluster headache.

    PubMed

    Ferraro, Stefania; Nigri, Anna; Bruzzone, Maria Grazia; Brivio, Luca; Proietti Cecchini, Alberto; Verri, Mattia; Chiapparini, Luisa; Leone, Massimo

    2018-01-01

    Objective We tested the hypothesis of a defective functional connectivity between the posterior hypothalamus and diencephalic-mesencephalic regions in chronic cluster headache based on: a) clinical and neuro-endocrinological findings in cluster headache patients; b) neuroimaging findings during cluster headache attacks; c) neuroimaging findings in drug-refractory chronic cluster headache patients improved after successful deep brain stimulation. Methods Resting state functional magnetic resonance imaging, associated with a seed-based approach, was employed to investigate the functional connectivity of the posterior hypothalamus in chronic cluster headache patients (n = 17) compared to age and sex-matched healthy subjects (n = 16). Random-effect analyses were performed to study differences between patients and controls in ipsilateral and contralateral-to-the-pain posterior hypothalamus functional connectivity. Results Cluster headache patients showed an increased functional connectivity between the ipsilateral posterior hypothalamus and a number of diencephalic-mesencephalic structures, comprising ventral tegmental area, dorsal nuclei of raphe, and bilateral substantia nigra, sub-thalamic nucleus, and red nucleus (p < 0.005 FDR-corrected vs. control group). No difference between patients and controls was found comparing the contralateral hypothalami. Conclusions The observed deranged functional connectivity between the posterior ipsilateral hypothalamus and diencephalic-mesencephalic regions in chronic cluster headache patients mainly involves structures that are part of (i.e. ventral tegmental area, substantia nigra) or modulate (dorsal nuclei of raphe, sub-thalamic nucleus) the midbrain dopaminergic systems. The midbrain dopaminergic systems could play a role in cluster headache pathophysiology and in particular in the chronicization process. Future studies are needed to better clarify if this finding is specific to cluster headache or if it represents an unspecific response to chronic pain.

  15. Task demand influences relationships among sex, clustering strategy, and recall: 16-word versus 9-word list learning tests.

    PubMed

    Sunderaraman, Preeti; Blumen, Helena M; DeMatteo, David; Apa, Zoltan L; Cosentino, Stephanie

    2013-06-01

    We compared the relationships among sex, clustering strategy, and recall across different task demands using the 16-word California Verbal Learning Test-Second Edition (CVLT-II) and the 9-word Philadelphia (repeatable) Verbal Learning Test (PrVLT). Women generally score higher than men on verbal memory tasks, possibly because women tend to use semantic clustering. This sex difference has been established via word-list learning tests such as the CVLT-II. In a retrospective between-group study, we compared how 2 separate groups of cognitively healthy older adults performed on a longer and a shorter verbal learning test. The group completing the CVLT-II had 36 women and 26 men; the group completing the PrVLT had 27 women and 21 men. Overall, multiple regression analyses revealed that semantic clustering was significantly associated with total recall on both tests' lists (P<0.001). Sex differences in recall and semantic clustering diminished with the shorter PrVLT word list. Semantic clustering uniquely influenced recall on both the longer and shorter word lists. However, serial clustering and sex influenced recall depending on the length of the word list (ie, the task demand). These findings suggest a complex nonlinear relationship among verbal memory, clustering strategies, and task demand.

  16. Clustering Multivariate Time Series Using Hidden Markov Models

    PubMed Central

    Ghassempour, Shima; Girosi, Federico; Maeder, Anthony

    2014-01-01

    In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistic knowledge, and therefore are accessible to a wide range of researchers. PMID:24662996

  17. A cluster analytic study of the Wechsler Intelligence Test for Children-IV in children referred for psychoeducational assessment due to persistent academic difficulties.

    PubMed

    Hale, Corinne R; Casey, Joseph E; Ricciardi, Philip W R

    2014-02-01

    Wechsler Intelligence Test for Children-IV core subtest scores of 472 children were cluster analyzed to determine if reliable and valid subgroups would emerge. Three subgroups were identified. Clusters were reliable across different stages of the analysis as well as across algorithms and samples. With respect to external validity, the Globally Low cluster differed from the other two clusters on Wechsler Individual Achievement Test-II Word Reading, Numerical Operations, and Spelling subtests, whereas the latter two clusters did not differ from one another. The clusters derived have been identified in studies using previous WISC editions. Clusters characterized by poor performance on subtests historically associated with the VIQ (i.e., VCI + WMI) and PIQ (i.e., POI + PSI) did not emerge, nor did a cluster characterized by low scores on PRI subtests. Picture Concepts represented the highest subtest score in every cluster, failing to vary in a predictable manner with the other PRI subtests.

  18. An effectiveness study of an integrated, community-based package for maternal, newborn, child and HIV care in South Africa: study protocol for a randomized controlled trial

    PubMed Central

    2011-01-01

    Background Progress towards MDG4 in South Africa will depend largely on scaling up effective prevention against mother to child transmission (PMTCT) of HIV and also addressing neonatal mortality. This imperative drives increasing focus on the neonatal period and particularly on the development and testing of appropriate models of sustainable, community-based care in South Africa in order to reach the poor. A number of key implementation gaps affecting progress have been identified. Implementation gaps for HIV prevention in neonates; implementation gaps for neonatal care especially home postnatal care; and implementation gaps for maternal mental health support. We have developed and are evaluating and costing an integrated and scaleable home visit package delivered by community health workers targeting pregnant and postnatal women and their newborns to provide essential maternal/newborn care as well as interventions for Prevention of Mother to Child Transmission (PMTCT) of HIV. Methods The trial is a cluster randomized controlled trial that is being implemented in Umlazi which is a peri-urban settlement with a total population of 1 million close to Durban in KwaZulu Natal, South Africa. The trial consists of 30 randomized clusters (15 in each arm). A baseline survey established the homogeneity of clusters and neither stratification nor matching was performed. Sample size was based on increasing HIV-free survival from 74% to 84%, and calculated to be 120 pregnant women per cluster. Primary outcomes are higher levels of HIV free survival and levels of exclusive and appropriate infant feeding at 12 weeks postnatally. The intervention is home based with community health workers delivering two antenatal visits, a postnatal visit within 48 hours of birth, and a further four visits during the first two months of the infants life. We are undertaking programmatic and cost effectiveness analysis to cost the intervention. 
Discussion The question is not merely to develop an efficacious package but also to identify and test delivery strategies that enable scaling up, which requires effectiveness studies in a health systems context, adapting and testing Asian community-based studies in various African contexts. Trial registration ISRCTN: ISRCTN41046462 PMID:22044553

  19. Development and Testing of an Algorithm for Efficient Resource Positioning in Pre-hospital Emergency Care

    PubMed Central

    Saini, Devashish; Mazza, Giovanni; Shah, Najaf; Mirza, Muzna; Gori, Mandar M; Nandigam, Hari Krishna; Orthner, Helmuth F

    2006-01-01

    Response times for pre-hospital emergency care may be improved with the use of algorithms that analyze historical patterns in incident location and suggest optimal places for prepositioning of emergency response units. We will develop such an algorithm based on cluster analysis and test whether it leads to significant improvement in mileage when compared to actual historical data of dispatching based on fixed stations. PMID:17238702

  20. Development and testing of an algorithm for efficient resource positioning in pre-hospital emergency care.

    PubMed

    Saini, Devashish; Mazza, Giovanni; Shah, Najaf; Mirza, Muzna; Gori, Mandar M; Nandigam, Hari Krishna; Orthner, Helmuth F

    2006-01-01

    Response times for pre-hospital emergency care may be improved with the use of algorithms that analyze historical patterns in incident location and suggest optimal places for pre-positioning of emergency response units. We will develop such an algorithm based on cluster analysis and test whether it leads to significant improvement in mileage when compared to actual historical data of dispatching based on fixed stations.

  1. Robust watermarking scheme for binary images using a slice-based large-cluster algorithm with a Hamming Code

    NASA Astrophysics Data System (ADS)

    Chen, Wen-Yuan; Liu, Chen-Chung

    2006-01-01

    The problems with binary watermarking schemes are that they have only a small amount of embeddable space and are not robust enough. We develop a slice-based large-cluster algorithm (SBLCA) to construct a robust watermarking scheme for binary images. In SBLCA, a small-amount cluster selection (SACS) strategy is used to search for a feasible slice in a large-cluster flappable-pixel decision (LCFPD) method, which is used to search for the best location for concealing a secret bit from a selected slice. This method has four major advantages over the others: (a) SBLCA has a simple and effective decision function to select appropriate concealment locations, (b) SBLCA utilizes a blind watermarking scheme without the original image in the watermark extracting process, (c) SBLCA uses slice-based shuffling capability to transfer the regular image into a hash state without remembering the state before shuffling, and finally, (d) SBLCA has enough embeddable space that every 64 pixels could accommodate a secret bit of the binary image. Furthermore, empirical results on test images reveal that our approach is a robust watermarking scheme for binary images.

  2. 12. CONTROL PANELS, WEST SIDE (LEFT & RIGHT), MAIN FLOOR: ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    12. CONTROL PANELS, WEST SIDE (LEFT & RIGHT), MAIN FLOOR: CENTER OF CLUSTERS, TOP BOX: MEGAWATT METER CENTER OF CLUSTERS, LOWER THREE BOXES: AMPERE METERS LEFT SIDE OF CLUSTERS: VOLTAGE CHART RECORDER RIGHT SIDE OF CLUSTERS: RECLOSE RELAY CENTER UNDER CLUSTERS: TESTING SWITCHES BELOW TESTING SWITCHES: BREAKER SWITCHES - Bonneville Power Administration South Bank Substation, I-84, South of Bonneville Dam Powerhouse, Bonneville, Multnomah County, OR

  3. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    PubMed

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.

  4. A non-voxel-based broad-beam (NVBB) framework for IMRT treatment planning.

    PubMed

    Lu, Weiguo

    2010-12-07

    We present a novel framework that enables very large scale intensity-modulated radiation therapy (IMRT) planning with limited computational resources, with improvements in cost, plan quality and planning throughput. Current IMRT optimization uses a voxel-based beamlet superposition (VBS) framework that requires pre-calculation and storage of a large amount of beamlet data, resulting in large temporal and spatial complexity. We developed a non-voxel-based broad-beam (NVBB) framework for IMRT capable of direct treatment parameter optimization (DTPO). In this framework, both objective function and derivative are evaluated based on the continuous viewpoint, abandoning 'voxel' and 'beamlet' representations. Thus pre-calculation and storage of beamlets are no longer needed. The NVBB framework has linear complexities (O(N^3)) in both space and time. The low memory, full computation and data parallelization nature of the framework lends itself to efficient implementation on the graphics processing unit (GPU). We implemented the NVBB framework and incorporated it with the TomoTherapy treatment planning system (TPS). The new TPS runs on a single workstation with one GPU card (NVBB-GPU). Extensive verification/validation tests were performed in house and via third parties. Benchmarks on dose accuracy, plan quality and throughput were compared with the commercial TomoTherapy TPS that is based on the VBS framework and uses a computer cluster with 14 nodes (VBS-cluster). For all tests, the dose accuracy of these two TPSs is comparable (within 1%). Plan qualities were comparable with no clinically significant difference for most cases except that superior target uniformity was seen in the NVBB-GPU for some cases. However, the planning time using the NVBB-GPU was reduced manyfold compared with the VBS-cluster. In conclusion, we developed a novel NVBB framework for IMRT optimization. The continuous viewpoint and DTPO nature of the algorithm eliminate the need for beamlets and lead to better plan quality. The computation parallelization on a GPU instead of a computer cluster significantly reduces hardware and service costs. Compared with using the current VBS framework on a computer cluster, the planning time is significantly reduced using the NVBB framework on a single workstation with a GPU card.

  5. Spatial Analysis of HIV Positive Injection Drug Users in San Francisco, 1987 to 2005

    PubMed Central

    Martinez, Alexis N.; Mobley, Lee R.; Lorvick, Jennifer; Novak, Scott P.; Lopez, Andrea M.; Kral, Alex H.

    2014-01-01

    Spatial analyses of HIV/AIDS related outcomes are growing in popularity as a tool to understand geographic changes in the epidemic and inform the effectiveness of community-based prevention and treatment programs. The Urban Health Study was a serial, cross-sectional epidemiological study of injection drug users (IDUs) in San Francisco between 1987 and 2005 (N = 29,914). HIV testing was conducted for every participant. Participant residence was geocoded to the level of the United States Census tract for every observation in the dataset. Local indicator of spatial autocorrelation (LISA) tests were used to identify univariate and bivariate Census tract clusters of HIV positive IDUs in two time periods. We further compared three tract level characteristics (% poverty, % African Americans, and % unemployment) across areas of clustered and non-clustered tracts. We identified significant spatial clustering of high numbers of HIV positive IDUs in the early period (1987–1995) and late period (1996–2005). We found significant bivariate clusters of Census tracts where HIV positive IDUs and tract level poverty were above average compared to the surrounding areas. Our data suggest that poverty, rather than race, was an important neighborhood characteristic associated with the spatial distribution of HIV in SF and its spatial diffusion over time. PMID:24722543
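
    The LISA tests mentioned in this abstract are commonly based on the local Moran's I statistic. The following is a minimal Python sketch of that statistic only, assuming row-standardised binary adjacency weights; the function name, the four-tract example, and its counts are hypothetical and are not taken from the study. The permutation-based significance test that a full LISA analysis needs is not included.

```python
def local_morans_i(values, weights):
    """Return the local Moran's I statistic for each area.

    values  -- list of n observations (e.g. a count per census tract)
    weights -- n x n spatial weight matrix as nested lists, zero diagonal
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment of the deviations
    if m2 == 0:
        raise ValueError("all values are identical")
    result = []
    for i in range(n):
        row_total = sum(weights[i])
        # Row-standardise so each area's neighbour weights sum to 1
        # (an assumption of this sketch, not a detail from the abstract).
        w = [x / row_total if row_total else 0.0 for x in weights[i]]
        lag = sum(w[j] * z[j] for j in range(n))  # spatial lag of the deviations
        result.append(z[i] * lag / m2)
    return result


# Hypothetical example: four tracts in a row, each adjacent to its neighbours.
counts = [10, 12, 2, 1]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
local_i = local_morans_i(counts, adjacency)
```

    In this toy example the two end tracts get positive values because each resembles its only neighbour, while the two middle tracts sit between a high and a low neighbour and get negative values.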

  6. Use of molecular testing to identify a cluster of patients with polycythemia vera in eastern Pennsylvania.

    PubMed

    Seaman, Vincent; Jumaan, Aisha; Yanni, Emad; Lewis, Brian; Neyer, Jonathan; Roda, Paul; Xu, Mingjiang; Hoffman, Ronald

    2009-02-01

    The role of the environment in the origin of polycythemia vera has not been well documented. Recently, molecular diagnostic tools have been developed to facilitate the diagnosis of polycythemia vera. A cluster of patients with polycythemia vera was suspected in three counties in eastern Pennsylvania where there has long been concern about environmental hazards. Rigorous clinical criteria and JAK2 617V>F testing were used to confirm the diagnosis of polycythemia vera in patients in this area. Participants included cases of polycythemia vera from the 2001 to 2005 state cancer registry as well as self- and physician-referred cases. A diagnosis of polycythemia vera was confirmed in 53% of 62 participants using WHO criteria, which include JAK2 617V>F testing. A statistically significant cluster of cases (P < 0.001) was identified where the incidence of polycythemia vera was 4.3 times that of the rest of the study area. The area of the cluster contained numerous sources of hazardous material including waste-coal power plants and U.S. Environmental Protection Agency Superfund sites. The diagnosis of polycythemia vera based solely on clinical criteria is frequently erroneous, suggesting that our prior knowledge of the epidemiology of this disease might be inaccurate. The JAK2 617V>F mutational analysis provides diagnostic clarity and permitted the confirmation of a cluster of polycythemia vera cases not identified by traditional clinical and pathologic diagnostic criteria. The close proximity of this cluster to known areas of hazardous material exposure raises concern that such environmental factors might play a role in the origin of polycythemia vera.

  7. A comprehensive comparative test of seven widely used spectral synthesis models against multi-band photometry of young massive-star clusters

    NASA Astrophysics Data System (ADS)

    Wofford, A.; Charlot, S.; Bruzual, G.; Eldridge, J. J.; Calzetti, D.; Adamo, A.; Cignoni, M.; de Mink, S. E.; Gouliermis, D. A.; Grasha, K.; Grebel, E. K.; Lee, J. C.; Östlin, G.; Smith, L. J.; Ubeda, L.; Zackrisson, E.

    2016-04-01

    We test the predictions of spectral synthesis models based on seven different massive-star prescriptions against Legacy ExtraGalactic UV Survey (LEGUS) observations of eight young massive clusters in two local galaxies, NGC 1566 and NGC 5253, chosen because predictions of all seven models are available at the published galactic metallicities. The high angular resolution, extensive cluster inventory, and full near-ultraviolet to near-infrared photometric coverage make the LEGUS data set excellent for this study. We account for both stellar and nebular emission in the models and try two different prescriptions for attenuation by dust. From Bayesian fits of model libraries to the observations, we find remarkably low dispersion in the median E(B - V) (~0.03 mag), stellar masses (~10^4 M⊙), and ages (~1 Myr) derived for individual clusters using different models, although maximum discrepancies in these quantities can reach 0.09 mag and factors of 2.8 and 2.5, respectively. This is for ranges in median properties of 0.05-0.54 mag, 1.8-10 × 10^4 M⊙, and 1.6-40 Myr spanned by the clusters in our sample. In terms of best fit, the observations are slightly better reproduced by models with interacting binaries and least well reproduced by models with single rotating stars. Our study provides a first quantitative estimate of the accuracies and uncertainties of the most recent spectral synthesis models of young stellar populations, demonstrates the good progress of models in fitting high-quality observations, and highlights the needs for a larger cluster sample and more extensive tests of the model parameter space.

  8. Mind-body treatments for the pain-fatigue-sleep disturbance symptom cluster in persons with cancer.

    PubMed

    Kwekkeboom, Kristine L; Cherwin, Catherine H; Lee, Jun W; Wanta, Britt

    2010-01-01

    Co-occurring pain, fatigue, and sleep disturbance comprise a common symptom cluster in patients with cancer. Treatment approaches that target the cluster of symptoms rather than just a single symptom need to be identified and tested. To synthesize evidence regarding mind-body interventions that have shown efficacy in treating two or more symptoms in the pain-fatigue-sleep disturbance cancer symptom cluster. A literature search was conducted using CINAHL, Medline, and PsychInfo databases through March 2009. Studies were categorized based on the type of mind-body intervention (relaxation, imagery/hypnosis, cognitive-behavioral therapy/coping skills training [CBT/CST], meditation, music, and virtual reality), and a preliminary review was conducted with respect to efficacy for pain, fatigue, and sleep disturbance. Mind-body interventions were selected for review if there was evidence of efficacy for at least two of the three symptoms. Forty-three studies addressing five types of mind-body interventions met criteria and are summarized in this review. Imagery/hypnosis and CBT/CST interventions have produced improvement in all the three cancer-related symptoms individually: pain, fatigue, and sleep disturbance. Relaxation has resulted in improvements in pain and sleep disturbance. Meditation interventions have demonstrated beneficial effects on fatigue and sleep disturbance. Music interventions have demonstrated efficacy for pain and fatigue. No trials were found that tested the mind-body interventions specifically for the pain-fatigue-sleep disturbance symptom cluster. Efficacy studies are needed to test the impact of relaxation, imagery/hypnosis, CBT/CST, meditation, and music interventions in persons with cancer experiencing concurrent pain, fatigue, and sleep disturbance. These mind-body interventions could help patients manage all the symptoms in the cluster with a single treatment strategy. Copyright 2010 U.S. Cancer Pain Relief Committee. 
Published by Elsevier Inc. All rights reserved.

  9. [Seed vigor evaluation based on adversity resistance index of wheat seed germination under stress conditions].

    PubMed

    Chen, Lei Tai; Sun, Ai Qing; Yang, Min; Chen, Lu Lu; Ma, Xue Li; Li, Mei Ling; Yin, Yan Ping

    2016-09-01

    A total of 16 wheat cultivars were selected to detect seed vigor of different genotypes using standard germination test, seed germination test under stress conditions and field emergence test. The adversity resistance indices of seed vigor indices and field emergence percentage under different germination conditions were used as the indices to evaluate adversity resistance. Principal component analysis and cluster analysis were used for the comprehensive evaluation of seed vigor. Results showed that drought stress, artificial aging and cold soaking treatments affected seed vigor to some extent. The adversity resistance indices of the artificial aging and cold soaking tests were significantly positively correlated with the field emergence percentage, while the adversity resistance index of drought stress test had no significant correlation with the field emergence percentage. 16 wheat cultivars were classified as three groups based on the principal component analysis and cluster analysis. Yunong 949, Yumai 49-198, Luyuan 502, Zhengyumai 9987, Shimai 21, Shannong 23, and Shixin 828 belonged to high vigor seeds. Xunong 5, Yunong 982, Tangmai 8, Jimai 20, Jimai 22, Jinan 17, and Shannong 20 belonged to medium vigor seeds. The other two cultivars, Chang 4738 and Lunxuan 061, belonged to low vigor seeds.

  10. Saliency detection algorithm based on LSC-RC

    NASA Astrophysics Data System (ADS)

    Wu, Wei; Tian, Weiye; Wang, Ding; Luo, Xin; Wu, Yingfei; Zhang, Yu

    2018-02-01

    Image saliency identifies the most important region of an image, the region that attracts human visual attention and response. Preferentially allocating computing resources to the salient region during image analysis and synthesis is of great significance for improving region detection. As a preprocessing step for other tasks in the image processing field, saliency detection is widely applied in image retrieval and image segmentation. Among these applications, saliency detection based on super-pixel segmentation with linear spectral clustering (LSC) has achieved good results. The saliency detection algorithm proposed in this paper improves on the regional contrast method by replacing the latter's region formation step with super-pixel blocks produced by linear spectral clustering. Combined with the latest deep learning methods, the accuracy of salient region detection is greatly improved. Finally, the superiority and feasibility of the super-pixel segmentation detection algorithm based on linear spectral clustering are demonstrated by a comparative test.

  11. Discriminative potential of some PCR-based and biochemical methods at Scedosporium strains.

    PubMed

    Kraková, Lucia; Pangallo, Domenico; Piecková, Elena; Majorošová, Mária

    2016-02-01

    Three innovative PCR-based methods (fluorescent-ITS, fluorescent-CBH and ITS-PCR DGGE) were tested using a reference set of nine strains of Scedosporium from the CBS fungal collection. Cellulolytic, lipolytic and proteolytic potential and the ability to dissolve CaCO3 of the strains were evaluated in vitro by means of agar assays. The f-ITS profiles distinguished most of the main species, although they placed "Pseudallescheria" ellipsoidea and the Scedosporium boydii strains CBS 117432 and CBS 120157 in the same cluster. All strains successfully produced DNA polymorphisms by f-CBH amplification, which divided them into three different groups. The DGGE approach separated the strains studied into five further clusters, which in some cases did not match the species. Strains tested were monomorphic in possessing strong proteolytic and lipolytic activities. The comparison of the three PCR-based genotyping approaches, together with biodegradation ability screening, displayed an intraspecies variability in S. boydii, interfering with unambiguous species delimitation. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  12. A cluster randomized control field trial of the ABRACADABRA web-based reading technology: replication and extension of basic findings

    PubMed Central

    Piquette, Noella A.; Savage, Robert S.; Abrami, Philip C.

    2014-01-01

    The present paper reports a cluster randomized control trial evaluation of teaching using ABRACADABRA (ABRA), an evidence-based and web-based literacy intervention (http://abralite.concordia.ca) with 107 kindergarten and 96 grade 1 children in 24 classes (12 intervention, 12 control classes) from all 12 elementary schools in one school district in Canada. Children in the intervention condition received 10–12 h of whole class instruction using ABRA between pre- and post-test. Hierarchical linear modeling of post-test results showed significant gains in letter-sound knowledge for intervention classrooms over control classrooms. In addition, medium effect sizes were evident for three of five outcome measures favoring the intervention: letter-sound knowledge (d = +0.66), phonological blending (d = +0.52), and word reading (d = +0.52), over effect sizes for regular teaching. It is concluded that regular teaching with ABRA technology adds significantly to literacy in the early elementary years. PMID:25538663

  13. A comparison of confidence interval methods for the intraclass correlation coefficient in community-based cluster randomization trials with a binary outcome.

    PubMed

    Braschel, Melissa C; Svec, Ivana; Darlington, Gerarda A; Donner, Allan

    2016-04-01

    Many investigators rely on previously published point estimates of the intraclass correlation coefficient rather than on their associated confidence intervals to determine the required size of a newly planned cluster randomized trial. Although confidence interval methods for the intraclass correlation coefficient that can be applied to community-based trials have been developed for a continuous outcome variable, fewer methods exist for a binary outcome variable. The aim of this study is to evaluate confidence interval methods for the intraclass correlation coefficient applied to binary outcomes in community intervention trials enrolling a small number of large clusters. Existing methods for confidence interval construction are examined and compared to a new ad hoc approach based on dividing clusters into a large number of smaller sub-clusters and subsequently applying existing methods to the resulting data. Monte Carlo simulation is used to assess the width and coverage of confidence intervals for the intraclass correlation coefficient based on Smith's large sample approximation of the standard error of the one-way analysis of variance estimator, an inverted modified Wald test for the Fleiss-Cuzick estimator, and intervals constructed using a bootstrap-t applied to a variance-stabilizing transformation of the intraclass correlation coefficient estimate. In addition, a new approach is applied in which clusters are randomly divided into a large number of smaller sub-clusters with the same methods applied to these data (with the exception of the bootstrap-t interval, which assumes large cluster sizes). These methods are also applied to a cluster randomized trial on adolescent tobacco use for illustration. When applied to a binary outcome variable in a small number of large clusters, existing confidence interval methods for the intraclass correlation coefficient provide poor coverage. 
However, confidence intervals constructed using the new approach combined with Smith's method provide nominal or close to nominal coverage when the intraclass correlation coefficient is small (<0.05), as is the case in most community intervention trials. This study concludes that when a binary outcome variable is measured in a small number of large clusters, confidence intervals for the intraclass correlation coefficient may be constructed by dividing existing clusters into sub-clusters (e.g. groups of 5) and using Smith's method. The resulting confidence intervals provide nominal or close to nominal coverage across a wide range of parameters when the intraclass correlation coefficient is small (<0.05). Application of this method should provide investigators with a better understanding of the uncertainty associated with a point estimator of the intraclass correlation coefficient used for determining the sample size needed for a newly designed community-based trial. © The Author(s) 2015.
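
    The sub-cluster idea described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard one-way analysis of variance estimator of the intraclass correlation coefficient applied to 0/1 outcomes; the function names, the simulated data, and the sub-cluster size of 5 are hypothetical choices made here. Smith's standard error and the confidence interval construction are not reproduced, so the sketch only shows the point estimate before and after random splitting.

```python
import random


def anova_icc(clusters):
    """One-way ANOVA estimator of the intraclass correlation coefficient.

    clusters -- list of lists of 0/1 outcomes, one inner list per cluster.
    Raises ZeroDivisionError if every outcome in the data is identical.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    total = sum(sizes)
    grand_mean = sum(sum(c) for c in clusters) / total
    means = [sum(c) / len(c) for c in clusters]
    # Between-cluster and within-cluster sums of squares.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)
    msb = ssb / (k - 1)
    msw = ssw / (total - k)
    # Adjusted mean cluster size for unequal cluster sizes.
    n0 = (total - sum(n * n for n in sizes) / total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


def split_into_subclusters(clusters, size, rng):
    """Randomly divide each cluster into sub-clusters of at most `size` members."""
    subclusters = []
    for c in clusters:
        shuffled = c[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), size):
            subclusters.append(shuffled[start:start + size])
    return subclusters


# Hypothetical data: 4 large clusters of 50 binary outcomes each.
rng = random.Random(0)
data = [[1 if rng.random() < p else 0 for _ in range(50)]
        for p in (0.20, 0.25, 0.30, 0.35)]
icc_whole = anova_icc(data)
subs = split_into_subclusters(data, 5, rng)
icc_sub = anova_icc(subs)
```

    Here the 4 clusters of 50 become 40 sub-clusters of 5, which is the kind of "many small clusters" layout to which the abstract applies the existing interval methods.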

  14. Identification and DUS Testing of Rice Varieties through Microsatellite Markers.

    PubMed

    Pourabed, Ehsan; Jazayeri Noushabadi, Mohammad Reza; Jamali, Seyed Hossein; Moheb Alipour, Naser; Zareyan, Abbas; Sadeghi, Leila

    2015-01-01

    It is very important that the identification and registration of new rice varieties be free from environmental effects and rely on molecular markers, which are more reliable. The objectives of this study were, first, the identification and distinction of 40 rice varieties consisting of local varieties of Iran, improved varieties, and IRRI varieties using PIC and discriminating power; second, cluster analysis based on the Dice similarity coefficient and the UPGMA algorithm; and, third, determining the ability of microsatellite markers to separate varieties utilizing the best combination of markers. For this research, 12 microsatellite markers were used. In total, 83 polymorphic alleles (6.91 alleles per locus) were found. In addition, PIC values ranged from 0.52 to 0.9. The results of cluster analysis showed the complete discrimination of varieties from each other except for IR58025A and IR58025B. Moreover, cluster analysis could distinguish most of the improved varieties from local varieties. Based on the analysis of the best combination of markers, five primer pairs together gave the same results as all markers for discriminating among all varieties. Considering the results of this research, we can propose that microsatellite markers can be used as a complementary tool to morphological characteristics in DUS tests.
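
    Two of the quantities named in this abstract, the Dice similarity coefficient and PIC, are simple to compute. The Python sketch below is illustrative only: it assumes alleles are scored as presence/absence (1/0) profiles and uses a commonly cited form of PIC; the abstract does not state which PIC formula the authors used, and the function names and example values are hypothetical. The UPGMA clustering step is not shown.

```python
def dice_similarity(a, b):
    """Dice coefficient between two binary allele-presence profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Returning 0.0 for two empty profiles is a convention chosen for this sketch.
    return 2 * shared / total if total else 0.0


def pic(freqs):
    """Polymorphism information content for one locus.

    freqs -- allele frequencies at the locus, summing to 1.
    Assumed form: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.
    """
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return 1 - homozygosity - cross


# Hypothetical profiles for two varieties scored at four alleles.
variety_a = [1, 1, 0, 1]
variety_b = [1, 0, 0, 1]
similarity = dice_similarity(variety_a, variety_b)   # 2 * 2 / (3 + 2)

# Hypothetical locus with two equally frequent alleles.
pic_two_alleles = pic([0.5, 0.5])
```

    A matrix of 1 - Dice values between all pairs of varieties is the usual input to a distance-based method such as UPGMA.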

  15. Computer-aided detection of clustered microcalcifications in multiscale bilateral filtering regularized reconstructed digital breast tomosynthesis volume

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samala, Ravi K., E-mail: rsamala@umich.edu; Chan, Heang-Ping; Lu, Yao

    Purpose: Develop a computer-aided detection (CADe) system for clustered microcalcifications in digital breast tomosynthesis (DBT) volume enhanced with multiscale bilateral filtering (MSBF) regularization. Methods: With Institutional Review Board approval and written informed consent, two-view DBT of 154 breasts, of which 116 had biopsy-proven microcalcification (MC) clusters and 38 were free of MCs, was imaged with a General Electric GEN2 prototype DBT system. The DBT volumes were reconstructed with MSBF-regularized simultaneous algebraic reconstruction technique (SART) that was designed to enhance MCs and reduce background noise while preserving the quality of other tissue structures. The contrast-to-noise ratio (CNR) of MCs was further improved with enhancement-modulated calcification response (EMCR) preprocessing, which combined multiscale Hessian response to enhance MCs by shape and bandpass filtering to remove the low-frequency structured background. MC candidates were then located in the EMCR volume using iterative thresholding and segmented by adaptive region growing. Two sets of potential MC objects, cluster centroid objects and MC seed objects, were generated and the CNR of each object was calculated. The number of candidates in each set was controlled based on the breast volume. Dynamic clustering around the centroid objects grouped the MC candidates to form clusters. Adaptive criteria were designed to reduce false positive (FP) clusters based on the size, CNR values and the number of MCs in the cluster, cluster shape, and cluster based maximum intensity projection. Free-response receiver operating characteristic (FROC) and jackknife alternative FROC (JAFROC) analyses were used to assess the performance and compare with that of a previous study. Results: Unpaired two-tailed t-test showed a significant increase (p < 0.0001) in the ratio of CNRs for MCs with and without MSBF regularization compared to similar ratios for FPs. For view-based detection, a sensitivity of 85% was achieved at an FP rate of 2.16 per DBT volume. For case-based detection, a sensitivity of 85% was achieved at an FP rate of 0.85 per DBT volume. JAFROC analysis showed a significant improvement in the performance of the current CADe system compared to that of our previous system (p = 0.003). Conclusions: MSBF regularized SART reconstruction enhances MCs. The enhancement in the signals, in combination with properly designed adaptive threshold criteria, effective MC feature analysis, and false positive reduction techniques, leads to a significant improvement in the detection of clustered MCs in DBT.

  16. A quasiparticle-based multi-reference coupled-cluster method.

    PubMed

    Rolik, Zoltán; Kállay, Mihály

    2014-10-07

    The purpose of this paper is to introduce a quasiparticle-based multi-reference coupled-cluster (MRCC) approach. The quasiparticles are introduced via a unitary transformation which allows us to represent a complete active space reference function and other elements of an orthonormal multi-reference (MR) basis in a determinant-like form. The quasiparticle creation and annihilation operators satisfy the fermion anti-commutation relations. On the basis of these quasiparticles, a generalization of the normal-ordered operator products for the MR case can be introduced as an alternative to the approach of Mukherjee and Kutzelnigg [Recent Prog. Many-Body Theor. 4, 127 (1995); Mukherjee and Kutzelnigg, J. Chem. Phys. 107, 432 (1997)]. Based on the new normal ordering any quasiparticle-based theory can be formulated using the well-known diagram techniques. Beyond the general quasiparticle framework we also present a possible realization of the unitary transformation. The suggested transformation has an exponential form where the parameters, holding exclusively active indices, are defined in a form similar to the wave operator of the unitary coupled-cluster approach. The definition of our quasiparticle-based MRCC approach strictly follows the form of the single-reference coupled-cluster method and retains several of its beneficial properties. Test results for small systems are presented using a pilot implementation of the new approach and compared to those obtained by other MR methods.

  17. Intelligent screening of electrofusion-polyethylene joints based on a thermal NDT method

    NASA Astrophysics Data System (ADS)

    Doaei, Marjan; Tavallali, M. Sadegh

    2018-05-01

    The combinations of infrared thermal images and artificial intelligence methods have opened new avenues for pushing the boundaries of available testing methods. Hence, in the current study, a novel thermal non-destructive testing method for polyethylene electrofusion joints was combined with k-means clustering algorithms as an intelligent screening tool. The experiments focused on ovality of pipes in the coupler, as well as misalignment of pipes-couplers in 25 mm diameter joints. The temperature responses of each joint to an internal heat pulse were recorded by an IR thermal camera, and further processed to identify the faulty joints. The results represented clustering accuracy of 92%, as well as more than 90% abnormality detection capabilities.
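
    The record above uses k-means clustering as a screening step but does not describe the features or the implementation. As a rough illustration only, the sketch below runs plain Lloyd-style k-means on small feature tuples; the feature values and the caller-supplied starting centres are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` are caller-supplied starting centres."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centre
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical two-feature responses for four joints.
    joints = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(joints, [(0.0, 0.0), (10.0, 10.0)]))
```

    With two clusters, one natural reading is "sound" versus "faulty" joints, although the abstract does not state how many clusters were used.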

  18. Development of small scale cluster computer for numerical analysis

    NASA Astrophysics Data System (ADS)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two personal computers were successfully networked together to form a small-scale cluster. Each computer has a multicore processor with four cores, giving the cluster eight processor cores in total. The cluster runs an Ubuntu 14.04 Linux environment with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test, which used a simple MPI "Hello" program written in C, was done to make sure that the computers are able to pass the required information without any problem. In addition, a performance test was done to show that the cluster's calculation performance is much better than that of a single-CPU computer. In this performance test, the same code was run four times, using a single node, 2 processors, 4 processors, and 8 processors. The results show that the time required to solve the problem decreases as processors are added: the calculation time is halved when the number of processors is doubled. To conclude, we successfully developed a small-scale cluster computer from common hardware that offers higher computing power than a single-CPU machine, which can benefit research that requires high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
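
    The scaling claim in the record above (run time halves when the processor count doubles) corresponds to ideal linear speedup. The sketch below shows how speedup and parallel efficiency would be computed from wall-clock timings; the timings are hypothetical, since the abstract reports no numbers.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, efficiency) for one parallel run."""
    speedup = t_serial / t_parallel
    efficiency = speedup / n_procs  # 1.0 means ideal linear scaling
    return speedup, efficiency


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by processor count.
    timings = {1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0}
    for n, t in timings.items():
        s, e = speedup_and_efficiency(timings[1], t, n)
        print(f"{n} processors: speedup {s:.2f}, efficiency {e:.2f}")
```

    Measured efficiency on real hardware usually falls below 1.0 as communication overhead grows, so exact halving is the best case rather than the expected one.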

  19. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP (shared-memory parallel) programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  20. Spatial clustering of childhood leukaemia in Switzerland: A nationwide study.

    PubMed

    Konstantinoudis, Garyfallos; Kreis, Christian; Ammann, Roland A; Niggli, Felix; Kuehni, Claudia E; Spycher, Ben D

    2017-10-01

    The aetiology of childhood leukaemia remains largely unknown. Several hypotheses involve environmental exposures that could imply spatial clustering of cases. The evidence from previous clustering studies is inconclusive. Most of them used areal data and thus had limited spatial resolution. We investigated whether childhood leukaemia tends to cluster in space using exact geocodes of place of residence at the time of both birth and diagnosis. We included 1,871 leukaemia cases diagnosed between 1985 and 2015 at age 0-15 years from the Swiss Childhood Cancer Registry. For each case, we randomly sampled 10 age and sex matched controls from national censuses closest in time. We used the difference of k-functions, Cuzick-Edwards' test and Tango's index for point data to assess spatial clustering and Kulldorff's circular scan to detect clusters. We separately investigated acute lymphoid leukaemia (ALL), acute myeloid leukaemia (AML), different age groups at diagnosis (0-4, 5-15 years) and adjusted for multiple testing. After adjusting for multiple testing, we found no evidence of spatial clustering of childhood leukaemia either around time of birth (p = 0.52) or diagnosis (p = 0.51). Individual tests indicated spatial clustering for leukaemia diagnosed at age 5-15 years (k-functions: p = 0.05; Cuzick-Edwards': p = 0.04) and a cluster of ALL cases diagnosed at age 0-4 years in a small rural area (p = 0.05). This study provides little evidence of spatial clustering of childhood leukaemia in Switzerland and highlights the importance of accounting for multiple testing in clustering studies. © 2017 UICC.
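
    Of the tests named in the record above, the Cuzick-Edwards statistic is the simplest to illustrate: for every case, count how many of its k nearest neighbours are also cases. The sketch below pairs that statistic with a Monte Carlo label-permutation p-value, which is a substitution made here for illustration and not necessarily how the study assessed significance; the coordinates and labels are hypothetical.

```python
import math
import random


def cuzick_edwards_tk(coords, is_case, k=1):
    """T_k: for every case, count how many of its k nearest neighbours are cases."""
    total = 0
    for i, (p, case) in enumerate(zip(coords, is_case)):
        if not case:
            continue
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        total += sum(1 for j in others[:k] if is_case[j])
    return total


def permutation_p_value(coords, is_case, k=1, n_perm=999, seed=0):
    """One-sided p-value from random relabelling of cases and controls."""
    rng = random.Random(seed)
    observed = cuzick_edwards_tk(coords, is_case, k)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if cuzick_edwards_tk(coords, labels, k) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical residence coordinates; the first two points are cases.
    xy = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 9), (9, 10)]
    status = [True, True, False, False, False, False]
    print(cuzick_edwards_tk(xy, status), permutation_p_value(xy, status))
```

    Running several such tests across subgroups is what makes the multiple-testing adjustment mentioned in the abstract necessary.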

  1. Fuzzy Set Methods for Object Recognition in Space Applications

    NASA Technical Reports Server (NTRS)

    Keller, James M. (Editor)

    1992-01-01

    Progress on the following four tasks is described: (1) fuzzy set based decision methodologies; (2) membership calculation; (3) clustering methods (including derivation of pose estimation parameters), and (4) acquisition of images and testing of algorithms.

  2. Measuring the scatter in the cluster optical richness-mass relation with machine learning

    NASA Astrophysics Data System (ADS)

    Boada, Steven Alvaro

    The distribution of massive clusters of galaxies depends strongly on the total cosmic mass density, the mass variance, and the dark energy equation of state. As such, measures of galaxy clusters can provide constraints on these parameters and even test models of gravity, but only if observations of clusters can lead to accurate estimates of their total masses. Here, we carry out a study to investigate the ability of a blind spectroscopic survey to recover accurate galaxy cluster masses through their line-of-sight velocity dispersions (LOSVD) using probability-based and machine learning methods. We focus on the Hobby Eberly Telescope Dark Energy Experiment (HETDEX), which will employ new Visible Integral-Field Replicable Unit Spectrographs (VIRUS), over 420 deg² on the sky with a 1/4.5 fill factor. VIRUS covers the blue/optical portion of the spectrum (3500-5500 Å), allowing surveys to measure redshifts for a large sample of galaxies out to z < 0.5 based on their absorption or emission (e.g., [O II], Mg II, Ne V) features. We use a detailed mock galaxy catalog from a semi-analytic model to simulate surveys observed with VIRUS, including: (1) Survey, a blind, HETDEX-like survey with an incomplete but uniform spectroscopic selection function; and (2) Targeted, a survey which targets clusters directly, obtaining spectra of all galaxies in a VIRUS-sized field. For both surveys, we include realistic uncertainties from galaxy magnitude and line-flux limits. We benchmark both surveys against spectroscopic observations with "perfect" knowledge of galaxy line-of-sight velocities. With Survey observations, we can recover cluster masses to ~0.1 dex, which can be further improved to < 0.1 dex with Targeted observations. This level of cluster mass recovery provides important measurements of the intrinsic scatter in the optical richness-cluster mass relation, and enables constraints on the key cosmological parameter, sigma_8, to < 20%.
As a demonstration of the methods developed previously, we present a pilot survey with integral field spectroscopy of ten galaxy clusters optically selected from the Sloan Digital Sky Survey's DR8 at z = 0.2-0.3. Eight of the clusters are rich (lambda > 60) systems with total inferred masses (1.58-17.37) × 10^14 M⊙ (M200c), and two are poor (lambda < 15) systems with inferred total masses ~0.5 × 10^14 M⊙ (M200c). We use the Mitchell Spectrograph (formerly the VIRUS-P spectrograph, a prototype of the HETDEX VIRUS instrument), located on the McDonald Observatory 2.7m telescope, to measure spectroscopic redshifts and line-of-sight velocities of the galaxies in and around each cluster, determine cluster membership and derive LOSVDs. We test both a LOSVD-cluster mass scaling relation and a machine learning based approach to infer total cluster mass. After comparing the cluster mass estimates to the literature, we use these independent cluster mass measurements to estimate the absolute cluster mass scale, and intrinsic scatter in the optical richness-mass relationship. We measure the intrinsic scatter in richness at fixed cluster mass to be sigmaM/lambda = 0.27 +/- 0.07 dex, in excellent agreement with previous estimates of sigmaM/lambda ~ 0.2-0.3 dex. We discuss the importance of the data used to train the machine learning methods and suggest various strategies to improve the accuracy of the bias (offset) and scatter in the optical richness-cluster mass relation. This demonstrates the power of blind spectroscopic surveys such as HETDEX to provide robust cluster mass estimates which can aid in the determination of cosmological parameters and help to calibrate the observable-mass relation for future photometric large-area sky surveys.

  3. Reformulation of Traditional Chamomile Oil: Quality Controls and Fingerprint Presentation Based on Cluster Analysis of Attenuated Total Reflectance–Infrared Spectral Data

    PubMed Central

    Sakhteman, Amirhossein; Faridi, Pouya; Daneshamouz, Saeid; Akbarizadeh, Amin Reza; Borhani-Haghighi, Afshin; Mohagheghzadeh, Abdolali

    2017-01-01

    Herbal oils have been widely used in Iran as medicinal compounds for thousands of years. Chamomile oil is widely used and serves as an example of a traditional oil. We remade chamomile oils and tried to modify them using current knowledge and facilities. Six types of oil (traditional and modified) were prepared. Microbial limit tests and physicochemical tests were performed on them. Also, principal component analysis, hierarchical cluster analysis, and partial least squares discriminant analysis were done on the attenuated total reflectance–infrared spectral data in order to gain insight from the classification pattern of the samples. The results show that modified versions of the chamomile oils (modified Clevenger-type apparatus method and microwave method) can be used, with the same content as the traditional ones, less microbial contamination, and better physicochemical properties. PMID:28585466

  4. Reformulation of Traditional Chamomile Oil: Quality Controls and Fingerprint Presentation Based on Cluster Analysis of Attenuated Total Reflectance-Infrared Spectral Data.

    PubMed

    Zargaran, Arman; Sakhteman, Amirhossein; Faridi, Pouya; Daneshamouz, Saeid; Akbarizadeh, Amin Reza; Borhani-Haghighi, Afshin; Mohagheghzadeh, Abdolali

    2017-10-01

    Herbal oils have been widely used in Iran as medicinal compounds for thousands of years. Chamomile oil is widely used and serves as an example of a traditional oil. We remade chamomile oils and tried to modify them using current knowledge and facilities. Six types of oil (traditional and modified) were prepared. Microbial limit tests and physicochemical tests were performed on them. Also, principal component analysis, hierarchical cluster analysis, and partial least squares discriminant analysis were done on the attenuated total reflectance-infrared spectral data in order to gain insight from the classification pattern of the samples. The results show that modified versions of the chamomile oils (modified Clevenger-type apparatus method and microwave method) can be used, with the same content as the traditional ones, less microbial contamination, and better physicochemical properties.

  5. Multi-Nozzle Base Flow Model in the 10- by 10-Foot Supersonic Wind Tunnel

    NASA Image and Video Library

    1964-02-21

    Researchers check the setup of a multi-nozzle base flow model in the 10- by 10-Foot Supersonic Wind Tunnel at the National Aeronautics and Space Administration (NASA) Lewis Research Center. NASA researchers were struggling to understand the complex flow phenomena resulting from the use of multiple rocket engines. Robert Wasko and Theodore Cover of the Advanced Development and Evaluation Division’s analysis and operations sections conducted a set of tests in the 10- by 10 tunnel to further understand the flow issues. The Lewis researchers studied four and five-nozzle configurations in the 10- by 10 at simulated altitudes from 60,000 to 200,000 feet. The nozzles were gimbaled during some of the test runs to simulate steering. The flow field for the four-nozzle clusters was surveyed in the center and the lateral areas between the nozzles, whereas the five-nozzle cluster was surveyed in the lateral area only.

  6. SOTXTSTREAM: Density-based self-organizing clustering of text streams.

    PubMed

    Bryant, Avory C; Cios, Krzysztof J

    2017-01-01

    A streaming data clustering algorithm is presented building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.

  7. Night-time neuronal activation of Cluster N in a day- and night-migrating songbird.

    PubMed

    Zapka, Manuela; Heyers, Dominik; Liedvogel, Miriam; Jarvis, Erich D; Mouritsen, Henrik

    2010-08-01

    Magnetic compass orientation in a night-migratory songbird requires that Cluster N, a cluster of forebrain regions, is functional. Cluster N, which receives input from the eyes via the thalamofugal pathway, shows high neuronal activity in night-migrants performing magnetic compass-guided behaviour at night, whereas no activation is observed during the day, and covering up the birds' eyes strongly reduces neuronal activation. These findings suggest that Cluster N processes light-dependent magnetic compass information in night-migrating songbirds. The aim of this study was to test if Cluster N is active during daytime migration. We used behavioural molecular mapping based on ZENK activation to investigate if Cluster N is active in the meadow pipit (Anthus pratensis), a day- and night-migratory species. We found that Cluster N of meadow pipits shows high neuronal activity under dim-light at night, but not under full room-light conditions during the day. These data suggest that, in day- and night-migratory meadow pipits, the light-dependent magnetic compass, which requires an active Cluster N, may only be used during night-time, whereas another magnetosensory mechanism and/or other reference system(s), like the sun or polarized light, may be used as primary orientation cues during the day.

  8. Determining SAFOD area microearthquake locations solely with the Pilot Hole seismic array data

    NASA Astrophysics Data System (ADS)

    Oye, Volker; Chavarria, J. Andres; Malin, Peter E.

    2004-05-01

    In August 2002, an array of 32 three-component geophones was installed in the San Andreas Fault Observatory at Depth (SAFOD) Pilot Hole (PH) at Parkfield, CA. As an independent test of surface-observation-based microearthquake locations, we have located such events using only data recorded on the PH array. We then compared these locations with locations from a combined set of PH and Parkfield High Resolution Seismic Network (HRSN) observations. We determined the uncertainties in the locations as they relate to errors in the travel time picks and the velocity model by the bootstrap method. Based on the PH and combined locations, we find that the "C2" cluster to the northeast of the PH has the smallest location uncertainties. Events in this cluster also have the most similar waveforms and largest magnitudes. This confirms earlier suggestions that the C2 cluster is a promising target for the SAFOD Main Hole.
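
    The record above estimates location uncertainties with the bootstrap method but gives no procedural detail. The sketch below shows only the generic idea, assuming a simple statistic in place of a full location inversion: resample the observations with replacement, recompute the estimate each time, and take the spread of the estimates. The residual values and function name are hypothetical.

```python
import random
import statistics


def bootstrap_std(samples, estimator, n_boot=1000, seed=0):
    """Standard deviation of `estimator` over bootstrap resamples of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = [
        estimator([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical travel-time residuals in seconds.
    residuals = [0.012, -0.008, 0.015, 0.003, -0.011, 0.007, -0.002, 0.009]
    print(bootstrap_std(residuals, statistics.mean))
```

    In a location problem the estimator would be the full hypocentre inversion, so each resample yields a perturbed location and the scatter of those locations gives the uncertainty.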

  9. Identification of Clusters that Condition Resistance to Anthracnose in the Common Bean Differential Cultivars AB136 and MDRK.

    PubMed

    Campa, Ana; Trabanco, Noemí; Ferreira, Juan José

    2017-12-01

    The correct identification of the anthracnose resistance systems present in the common bean cultivars AB136 and MDRK is important because both are included in the set of 12 differential cultivars proposed for use in classifying the races of the anthracnose causal agent, Colletotrichum lindemuthianum. In this work, the responses against seven C. lindemuthianum races were analyzed in a recombinant inbred line population derived from the cross AB136 × MDRK. A genetic linkage map of 100 molecular markers distributed across the 11 bean chromosomes was developed in this population to locate the gene or genes conferring resistance against each race, based on linkage analyses and χ² tests of independence. The identified anthracnose resistance genes were organized in clusters. Two clusters were found in AB136: one located on linkage group Pv07, which corresponds to the anthracnose resistance cluster Co-5, and the other located at the end of linkage group Pv11, which corresponds to the Co-2 cluster. The presence of resistance genes at the Co-5 cluster in AB136 was validated through an allelism test conducted in the F2 population TU × AB136. The presence of resistance genes at the Co-2 cluster in AB136 was validated through genetic dissection using the F2:3 population ABM3 × MDRK, in which it was directly mapped to a genomic position between 46.01 and 47.77 Mb of chromosome Pv11. In MDRK, two independent clusters were identified: one located on linkage group Pv01, corresponding to the Co-1 cluster, and the second located on linkage group Pv04, corresponding to the Co-3 cluster. This report enhances the understanding of the race-specific Phaseolus vulgaris-C. lindemuthianum interactions and will be useful in breeding programs.
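
    The χ² test of independence mentioned in the record above checks whether two classifications, for example marker genotype and resistance phenotype, segregate independently. The sketch below computes the Pearson statistic and its degrees of freedom for a contingency table; the counts are hypothetical and not from the study.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


if __name__ == "__main__":
    # Hypothetical counts: rows = marker allele (parent A / parent B),
    # columns = response to one race (resistant / susceptible).
    print(chi_square_independence([[42, 8], [9, 41]]))
```

    For a 2 × 2 table (one degree of freedom), a statistic above 3.841 rejects independence at the 5% level, which would point to linkage between the marker and the resistance locus.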

  10. Spin-orbit splitted excited states using explicitly-correlated equation-of-motion coupled-cluster singles and doubles eigenvectors

    NASA Astrophysics Data System (ADS)

    Bokhan, Denis; Trubnikov, Dmitrii N.; Perera, Ajith; Bartlett, Rodney J.

    2018-04-01

    An explicitly correlated method for calculating excited states with spin-orbit couplings has been formulated and implemented. The developed approach utilizes left and right eigenvectors of the equation-of-motion coupled-cluster model, which is based on the linearly approximated explicitly correlated coupled-cluster singles and doubles [CCSD(F12)] method. The spin-orbit interactions are introduced by using the spin-orbit mean field (SOMF) approximation of the Breit-Pauli Hamiltonian. Numerical tests for several atoms and molecules show good agreement between the explicitly correlated results and the corresponding values calculated in the complete basis set (CBS) limit; highly accurate excitation energies can be obtained already at the triple-ζ level.

  11. An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions.

    PubMed

    van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A

    2009-06-01

    Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work points out the need for extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point-to-point standard analysis procedure, based on a combination of clustering and data visualization, for the analysis of microarray data.

  12. Tropospheric Ozonesonde Profiles at Long-term U.S. Monitoring Sites: 1. A Climatology Based on Self-Organizing Maps

    NASA Technical Reports Server (NTRS)

    Stauffer, Ryan M.; Thompson, Anne M.; Young, George S.

    2016-01-01

    Sonde-based climatologies of tropospheric ozone (O3) are vital for developing satellite retrieval algorithms and evaluating chemical transport model output. Typical O3 climatologies average measurements by latitude or region, and season. A recent analysis using self-organizing maps (SOM) to cluster ozonesondes from two tropical sites found that clusters of O3 mixing ratio profiles are an excellent way to capture O3 variability and link meteorological influences to O3 profiles. Clusters correspond to distinct meteorological conditions, e.g., convection, subsidence, cloud cover, and transported pollution. Here the SOM technique is extended to four long-term U.S. sites (Boulder, CO; Huntsville, AL; Trinidad Head, CA; and Wallops Island, VA) with 4530 total profiles. Sensitivity tests on k-means algorithm and SOM justify use of 3×3 SOM (nine clusters). At each site, SOM clusters together O3 profiles with similar tropopause height, 500 hPa height/temperature, and amount of tropospheric and total column O3. Cluster means are compared to monthly O3 climatologies. For all four sites, near-tropopause O3 is double (over +100 parts per billion by volume; ppbv) the monthly climatological O3 mixing ratio in three clusters that contain 13-16% of profiles, mostly in winter and spring. Large midtropospheric deviations from monthly means (-6 ppbv, +7-10 ppbv O3 at 6 km) are found in two of the most populated clusters (combined 36-39% of profiles). These two clusters contain distinctly polluted (summer) and clean O3 (fall-winter, high tropopause) profiles, respectively. As for tropical profiles previously analyzed with SOM, O3 averages are often poor representations of U.S. O3 profile statistics.

  13. Tropospheric ozonesonde profiles at long-term U.S. monitoring sites: 1. A climatology based on self-organizing maps

    PubMed Central

    Stauffer, Ryan M.; Thompson, Anne M.; Young, George S.

    2018-01-01

    Sonde-based climatologies of tropospheric ozone (O3) are vital for developing satellite retrieval algorithms and evaluating chemical transport model output. Typical O3 climatologies average measurements by latitude or region, and season. Recent analysis using self-organizing maps (SOM) to cluster ozonesondes from two tropical sites found clusters of O3 mixing ratio profiles are an excellent way to capture O3 variability and link meteorological influences to O3 profiles. Clusters correspond to distinct meteorological conditions, e.g. convection, subsidence, cloud cover, and transported pollution. Here, the SOM technique is extended to four long-term U.S. sites (Boulder, CO; Huntsville, AL; Trinidad Head, CA; Wallops Island, VA) with 4530 total profiles. Sensitivity tests on k-means algorithm and SOM justify use of 3×3 SOM (nine clusters). At each site, SOM clusters together O3 profiles with similar tropopause height, 500 hPa height/temperature, and amount of tropospheric and total column O3. Cluster means are compared to monthly O3 climatologies. For all four sites, near-tropopause O3 is double (over +100 parts per billion by volume; ppbv) the monthly climatological O3 mixing ratio in three clusters that contain 13 – 16% of profiles, mostly in winter and spring. Large mid-tropospheric deviations from monthly means (−6 ppbv, +7 – 10 ppbv O3 at 6 km) are found in two of the most populated clusters (combined 36 – 39% of profiles). These two clusters contain distinctly polluted (summer) and clean O3 (fall-winter, high tropopause) profiles, respectively. As for tropical profiles previously analyzed with SOM, O3 averages are often poor representations of U.S. O3 profile statistics. PMID:29619288

  14. Tropospheric ozonesonde profiles at long-term U.S. monitoring sites: 1. A climatology based on self-organizing maps.

    PubMed

    Stauffer, Ryan M; Thompson, Anne M; Young, George S

    2016-02-16

    Sonde-based climatologies of tropospheric ozone (O3) are vital for developing satellite retrieval algorithms and evaluating chemical transport model output. Typical O3 climatologies average measurements by latitude or region, and season. Recent analysis using self-organizing maps (SOM) to cluster ozonesondes from two tropical sites found clusters of O3 mixing ratio profiles are an excellent way to capture O3 variability and link meteorological influences to O3 profiles. Clusters correspond to distinct meteorological conditions, e.g. convection, subsidence, cloud cover, and transported pollution. Here, the SOM technique is extended to four long-term U.S. sites (Boulder, CO; Huntsville, AL; Trinidad Head, CA; Wallops Island, VA) with 4530 total profiles. Sensitivity tests on k-means algorithm and SOM justify use of 3×3 SOM (nine clusters). At each site, SOM clusters together O3 profiles with similar tropopause height, 500 hPa height/temperature, and amount of tropospheric and total column O3. Cluster means are compared to monthly O3 climatologies. For all four sites, near-tropopause O3 is double (over +100 parts per billion by volume; ppbv) the monthly climatological O3 mixing ratio in three clusters that contain 13-16% of profiles, mostly in winter and spring. Large mid-tropospheric deviations from monthly means (-6 ppbv, +7-10 ppbv O3 at 6 km) are found in two of the most populated clusters (combined 36-39% of profiles). These two clusters contain distinctly polluted (summer) and clean O3 (fall-winter, high tropopause) profiles, respectively. As for tropical profiles previously analyzed with SOM, O3 averages are often poor representations of U.S. O3 profile statistics.
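
    The ozonesonde records above cluster profiles with a 3×3 self-organizing map. As a minimal sketch of the general technique only, the code below trains a small SOM with a Gaussian neighbourhood and linearly decaying learning rate; those choices, the parameter values, and the toy two-value "profiles" are assumptions and do not reproduce the authors' configuration.

```python
import math
import random


def train_som(data, rows=3, cols=3, n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each node's weights from a randomly chosen sample.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01  # shrinking neighbourhood radius
        x = rng.choice(data)
        # Best-matching unit: the node whose weights are closest to the sample.
        bmu = min(weights, key=lambda node: math.dist(x, weights[node]))
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])  # pull node toward the sample
    return weights


def assign(data, weights):
    """Map each sample to the grid node (cluster) with the nearest weights."""
    return [min(weights, key=lambda node: math.dist(x, weights[node])) for x in data]


if __name__ == "__main__":
    # Hypothetical two-level "profiles" (values are illustrative only).
    profiles = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (9.5, 9.8)]
    som = train_som(profiles)
    print(assign(profiles, som))
```

    Each of the nine nodes plays the role of one cluster, and averaging the member profiles of a node gives a cluster mean of the kind the abstract compares with monthly climatologies.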

  15. Exploring syndrome differentiation using non-negative matrix factorization and cluster analysis in patients with atopic dermatitis.

    PubMed

    Yun, Younghee; Jung, Wonmo; Kim, Hyunho; Jang, Bo-Hyoung; Kim, Min-Hee; Noh, Jiseong; Ko, Seong-Gyu; Choi, Inhwa

    2017-08-01

    Syndrome differentiation (SD) results in a diagnostic conclusion based on a cluster of concurrent symptoms and signs, including pulse form and tongue color. In Korea, there is a strong interest in the standardization of Traditional Medicine (TM). In order to standardize TM treatment, standardization of SD should be given priority. The aim of this study was to explore the SD, or symptom clusters, of patients with atopic dermatitis (AD) using non-negative factorization methods and k-means clustering analysis. We screened 80 patients and enrolled 73 eligible patients. One TM dermatologist evaluated the symptoms/signs using an existing clinical dataset from patients with AD. This dataset was designed to collect 15 dermatologic and 18 systemic symptoms/signs associated with AD. Non-negative matrix factorization was used to decompose the original data into a matrix with three features and a weight matrix. The point of intersection of the three coordinates from each patient was placed in three-dimensional space. With five clusters, the silhouette score reached 0.484, and this was the best silhouette score obtained from two to nine clusters. Patients were clustered according to the varying severity of concurrent symptoms/signs. Through the distribution of the null hypothesis generated by 10,000 permutation tests, we found significant cluster-specific symptoms/signs from the confidence intervals in the upper and lower 2.5% of the distribution. Patients in each cluster showed differences in symptoms/signs and severity. In a clinical situation, SD and treatment are based on the practitioners' observations and clinical experience. SD, identified through informatics, can contribute to development of standardized, objective, and consistent SD for each disease. Copyright © 2017. Published by Elsevier Ltd.
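
    The record above selects five clusters because that choice gave the best silhouette score among two to nine clusters. The sketch below computes the mean silhouette coefficient for a given labelling using Euclidean distance (an assumption, since the abstract does not name the metric); the points and labels are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; needs at least two distinct labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical three-feature patient coordinates and cluster labels.
    pts = [(0.0, 0.0, 0.1), (0.1, 0.0, 0.0), (1.0, 1.0, 0.9), (0.9, 1.0, 1.0)]
    print(silhouette_score(pts, [0, 0, 1, 1]))
```

    Repeating the calculation for each candidate number of clusters and keeping the highest score mirrors the selection step described in the abstract.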

  16. The Atacama Cosmology Telescope: The Two-season ACTPol Sunyaev–Zel’dovich Effect Selected Cluster Catalog

    NASA Astrophysics Data System (ADS)

    Hilton, Matt; Hasselfield, Matthew; Sifón, Cristóbal; Battaglia, Nicholas; Aiola, Simone; Bharadwaj, V.; Bond, J. Richard; Choi, Steve K.; Crichton, Devin; Datta, Rahul; Devlin, Mark J.; Dunkley, Joanna; Dünner, Rolando; Gallardo, Patricio A.; Gralla, Megan; Hincks, Adam D.; Ho, Shuay-Pwu P.; Hubmayr, Johannes; Huffenberger, Kevin M.; Hughes, John P.; Koopman, Brian J.; Kosowsky, Arthur; Louis, Thibaut; Madhavacheril, Mathew S.; Marriage, Tobias A.; Maurin, Loïc; McMahon, Jeff; Miyatake, Hironao; Moodley, Kavilan; Næss, Sigurd; Nati, Federico; Newburgh, Laura; Niemack, Michael D.; Oguri, Masamune; Page, Lyman A.; Partridge, Bruce; Schmitt, Benjamin L.; Sievers, Jon; Spergel, David N.; Staggs, Suzanne T.; Trac, Hy; van Engelen, Alexander; Vavagiakis, Eve M.; Wollack, Edward J.

    2018-03-01

    We present a catalog of 182 galaxy clusters detected through the Sunyaev–Zel’dovich (SZ) effect by the Atacama Cosmology Telescope in a contiguous 987.5 deg$^2$ field. The clusters were detected as SZ decrements by applying a matched filter to 148 GHz maps that combine the original ACT equatorial survey with data from the first two observing seasons using the ACTPol receiver. Optical/IR confirmation and redshift measurements come from a combination of large public surveys and our own follow-up observations. Where necessary, we measured photometric redshifts for clusters using a pipeline that achieves accuracy Δz/(1 + z) = 0.015 when tested on Sloan Digital Sky Survey data. Under the assumption that clusters can be described by the so-called universal pressure profile (UPP) and its associated mass scaling law, the full signal-to-noise ratio > 4 sample spans the mass range $1.6 < M_{500c}^{\rm UPP}/10^{14}\,M_\odot < 9.1$, with median $M_{500c}^{\rm UPP} = 3.1 \times 10^{14}\,M_\odot$. The sample covers the redshift range 0.1 < z < 1.4 (median z = 0.49), and 28 clusters are new discoveries (median z = 0.80). We compare our catalog with other overlapping cluster samples selected using the SZ, optical, and X-ray wavelengths. We find that the ratio of the UPP-based SZ mass to richness-based weak-lensing mass is $\langle M_{500c}^{\rm UPP}\rangle / \langle M_{500c}^{\lambda{\rm WL}}\rangle = 0.68 \pm 0.11$. After applying this calibration, the mass distribution for clusters with $M_{500c} > 4 \times 10^{14}\,M_\odot$ is consistent with the number of such clusters found in the South Pole Telescope SZ survey.

  17. Distribution of sea anemones (Cnidaria, Actiniaria) in Korea analyzed by environmental clustering

    USGS Publications Warehouse

    Cha, H.-R.; Buddemeier, R.W.; Fautin, D.G.; Sandhei, P.

    2004-01-01

    Using environmental data and the geospatial clustering tools LOICZView and DISCO, we empirically tested the postulated existence and boundaries of four biogeographic regions in the southern part of the Korean peninsula. Environmental variables used included wind speed, sea surface temperature (SST), salinity, tidal amplitude, and the chlorophyll spectral signal. Our analysis confirmed the existence of four biogeographic regions, but the details of the borders between them differ from those previously postulated. Specimen-level distribution records of intertidal sea anemones were mapped; their distribution relative to the environmental data supported the importance of the environmental parameters we selected in defining suitable habitats. From the geographic coincidence between anemone distribution and the clusters based on environmental variables, we infer that geospatial clustering has the power to delimit ranges for marine organisms within relatively small geographical areas.

  18. Development of Entry-Level Competence Tests: A Strategy for Evaluation of Vocational Education Training Systems

    ERIC Educational Resources Information Center

    Schutte, Marc; Spottl, Georg

    2011-01-01

    Developing countries such as Malaysia and Oman have recently established occupational standards based on core work processes (functional clusters of work objects, activities and performance requirements), to which competencies (performance determinants) can be linked. While the development of work-process-based occupational standards is supposed…

  19. Preferences mapping of household biodigester in Bandung

    NASA Astrophysics Data System (ADS)

    Humaira, S.; Rianawati, E.; Sagala, S.; Sasongko, M. A.

    2018-05-01

    The Bandung city government implemented household biodigester grants in 2015 and 2016. Unfortunately, some household biodigesters that still function well are not in use. This study is therefore an effort to improve the acceptance and usage rate of household biodigesters in Bandung. Its purpose is to identify citizens’ preferences regarding household biodigesters. To this end, we conducted a survey through an online questionnaire based on the eight dimensions of quality defined by Garvin (1987), which served as the basis for constructing factors that might be favoured by current and potential users of household biodigesters. Based on the results of a cluster analysis, three clusters with different preferences were interpreted and profiled through Welch’s ANOVA and the Games-Howell test. The study reveals that the cluster with the largest number of members treats reliability and features as the keys to determining current and potential users’ preferences. It suggests that developers of household biodigesters target cluster 1 and prioritize reliability and features in developing the next household biodigester product to achieve a higher level of public acceptance.

  20. Accounting for Non-Gaussian Sources of Spatial Correlation in Parametric Functional Magnetic Resonance Imaging Paradigms I: Revisiting Cluster-Based Inferences.

    PubMed

    Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Sathian, K

    2018-02-01

    In a recent study, Eklund et al. employed resting-state functional magnetic resonance imaging data as a surrogate for null functional magnetic resonance imaging (fMRI) datasets and posited that cluster-wise family-wise error (FWE) rate-corrected inferences made by using parametric statistical methods in fMRI studies over the past two decades may have been invalid, particularly for cluster defining thresholds less stringent than p < 0.001; this was principally because the spatial autocorrelation functions (sACF) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggested otherwise. Here, we show that accounting for non-Gaussian signal components such as those arising from resting-state neural activity as well as physiological responses and motion artifacts in the null fMRI datasets yields first- and second-level general linear model analysis residuals with nearly uniform and Gaussian sACF. Further comparison with nonparametric permutation tests indicates that cluster-based FWE corrected inferences made with Gaussian spatial noise approximations are valid.

  1. Detection of neonatal unit clusters of Candida parapsilosis fungaemia by microsatellite genotyping: Results from laboratory-based sentinel surveillance, South Africa, 2009-2010.

    PubMed

    Magobo, Rindidzani E; Naicker, Serisha D; Wadula, Jeannette; Nchabeleng, Maphoshane; Coovadia, Yacoob; Hoosen, Anwar; Lockhart, Shawn R; Govender, Nelesh P

    2017-05-01

    Neonatal candidaemia is a common, deadly and costly hospital-associated disease. To determine the genetic diversity of Candida parapsilosis causing fungaemia in South African neonatal intensive care units (NICUs). From February 2009 through to August 2010, cases of candidaemia were reported through laboratory-based surveillance. C. parapsilosis isolates from neonatal cases were submitted for identification by internal transcribed spacer (ITS) region sequencing, antifungal susceptibility testing and microsatellite genotyping. Cluster analysis was performed using Unweighted Pair Group Method with Arithmetic Mean (UPGMA). Of 1671 cases with a viable Candida isolate, 393 (24%) occurred among neonates. Isolates from 143 neonatal cases were confirmed as C. parapsilosis sensu stricto. Many isolates were resistant to fluconazole (77/143; 54%) and voriconazole (20/143; 14%). Of 79 closely-related genotypes, 18 were represented by ≥2 isolates; 61 genotypes had a single isolate each. Seven clusters, comprised of 82 isolates, were identified at five hospitals in three provinces. Isolates belonging to certain clusters were significantly more likely to be fluconazole resistant: all cluster 7 isolates and the majority of cluster 4 (78%), 5 (89%) and 6 (67%) isolates (P<.001). Candida parapsilosis-associated candidaemia in public-sector NICUs was caused by closely related genotypes and there was molecular evidence of undetected outbreaks as well as intra-hospital transmission. © 2017 Blackwell Verlag GmbH.
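    The UPGMA step named in this abstract can be illustrated with a short sketch. It is not the study's pipeline: the four isolate names and the distance matrix are invented for illustration, and the function name `upgma` is an assumption. The code only shows the core rule, which repeatedly merges the two closest clusters and recomputes distances as size-weighted averages.

```python
def upgma(dist, names):
    """Agglomerate with UPGMA; return a list of (cluster_a, cluster_b, distance) merges."""
    # Active clusters: id -> tuple of member names.
    clusters = {i: (name,) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by the pair of cluster ids.
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of active clusters
        i, j = sorted(pair)
        merges.append((clusters[i], clusters[j], d[pair]))
        ni, nj = len(clusters[i]), len(clusters[j])
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances (arithmetic mean over members).
        for k in clusters:
            if k in (i, j):
                continue
            dik = d.pop(frozenset((i, k)))
            djk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        del d[pair]
        clusters[next_id] = clusters[i] + clusters[j]
        del clusters[i], clusters[j]
        next_id += 1
    return merges

# Hypothetical genotype distances between four isolates (symmetric matrix).
names = ["iso1", "iso2", "iso3", "iso4"]
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
merges = upgma(dist, names)
```

    Cutting the resulting merge list at a chosen distance threshold would yield flat clusters; the threshold used in the study is not stated in the abstract.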

  2. Comparison of cytology, HPV DNA testing and HPV 16/18 genotyping alone or combined targeting to the more balanced methodology for cervical cancer screening.

    PubMed

    Chatzistamatiou, Kimon; Moysiadis, Theodoros; Moschaki, Viktoria; Panteleris, Nikolaos; Agorastos, Theodoros

    2016-07-01

    The objective of the present study was to identify the most effective cervical cancer screening algorithm incorporating different combinations of cytology, HPV testing and genotyping. Women 25-55 years old recruited for the "HERMES" (HEllenic Real life Multicentric cErvical Screening) study were screened in terms of cytology and high-risk (hr) HPV testing with HPV 16/18 genotyping. Women positive for cytology and/or hrHPV were referred for colposcopy, biopsy and treatment. Ten screening algorithms based on different combinations of cytology, HPV testing and HPV 16/18 genotyping were investigated in terms of diagnostic accuracy. Three clusters of algorithms were formed according to the balance between effectiveness and harm caused by screening. The cluster showing the best balance included two algorithms based on co-testing and two based on HPV primary screening with HPV 16/18 genotyping. Among these, hrHPV testing with HPV 16/18 genotyping and reflex cytology (atypical squamous cells of undetermined significance - ASCUS threshold) presented the optimal combination of sensitivity (82.9%) and specificity relative to cytology alone (0.99) with 1.26 false positive rate relative to cytology alone. HPV testing with HPV 16/18 genotyping, referring HPV 16/18 positive women directly to colposcopy, and hrHPV (non 16/18) positive women to reflex cytology (ASCUS threshold), as a triage method to colposcopy, reflects the best equilibrium between screening effectiveness and harm. Algorithms, based on cytology as initial screening method, on co-testing or HPV primary without genotyping, and on HPV primary with genotyping but without cytology triage, are not supported according to the present analysis. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Space Launch System Base Heating Test: Experimental Operations & Results

    NASA Technical Reports Server (NTRS)

    Dufrene, Aaron; Mehta, Manish; MacLean, Matthew; Seaford, Mark; Holden, Michael

    2016-01-01

    NASA's Space Launch System (SLS) uses four clustered liquid rocket engines along with two solid rocket boosters. The interaction between all six rocket exhaust plumes will produce a complex and severe thermal environment in the base of the vehicle. This work focuses on a recent 2% scale, hot-fire SLS base heating test. These base heating tests are short-duration tests executed with chamber pressures near the full-scale values with gaseous hydrogen/oxygen engines and RSRMV analogous solid propellant motors. The LENS II shock tunnel/Ludwieg tube tunnel was used at or near flight duplicated conditions up to Mach 5. Model development was based on the Space Shuttle base heating tests with several improvements including doubling of the maximum chamber pressures and duplication of freestream conditions. Test methodology and conditions are presented, and base heating results from 76 runs are reported in non-dimensional form. Regions of high heating are identified and comparisons of various configuration and conditions are highlighted. Base pressure and radiometer results are also reported.

  4. The ALICE Software Release Validation cluster

    NASA Astrophysics Data System (ADS)

    Berzano, D.; Krzewicki, M.

    2015-12-01

    One of the most important steps of the software lifecycle is Quality Assurance: this process comprises both automatic tests and manual reviews, and all of them must pass successfully before the software is approved for production. Some tests, such as source code static analysis, are executed on a single dedicated service: in High Energy Physics, a full simulation and reconstruction chain on a distributed computing environment, backed with a sample “golden” dataset, is also necessary for the quality sign off. The ALICE experiment uses dedicated and virtualized computing infrastructures for the Release Validation in order not to taint the production environment (i.e. CVMFS and the Grid) with non-validated software and validation jobs: the ALICE Release Validation cluster is a disposable virtual cluster appliance based on CernVM and the Virtual Analysis Facility, capable of deploying on demand, and with a single command, a dedicated virtual HTCondor cluster with an automatically scalable number of virtual workers on any cloud supporting the standard EC2 interface. Input and output data are externally stored on EOS, and a dedicated CVMFS service is used to provide the software to be validated. We will show how the Release Validation Cluster deployment and disposal are completely transparent for the Release Manager, who simply triggers the validation from the ALICE build system's web interface. CernVM 3, based entirely on CVMFS, makes it possible to boot any point-in-time snapshot of the operating system: we will show how this allows us to certify each ALICE software release for an exact CernVM snapshot, addressing the problem of Long Term Data Preservation by ensuring a consistent environment for software execution and data reprocessing in the future.

  5. Testing use of payers to facilitate evidence-based practice adoption: protocol for a cluster-randomized trial

    PubMed Central

    2013-01-01

    Background: More effective methods are needed to implement evidence-based findings into practice. The Advancing Recovery Framework offers a multi-level approach to evidence-based practice implementation by aligning purchasing and regulatory policies at the payer level with organizational change strategies at the organizational level. Methods: The Advancing Recovery Buprenorphine Implementation Study is a cluster-randomized controlled trial designed to increase use of the evidence-based practice buprenorphine medication to treat opiate addiction. Ohio Alcohol, Drug Addiction, and Mental Health Services Boards (ADAMHS), who are payers, and their addiction treatment organizations were recruited for a trial to assess the effects of payer and treatment organization changes (using the Advancing Recovery Framework) versus treatment organization changes alone on the use of buprenorphine. A matched-pair randomization, based on county characteristics, was applied, resulting in seven county ADAMHS boards and twenty-five treatment organizations in each arm. Opioid dependent patients are nested within cluster (treatment organization), and treatment organization clusters are nested within ADAMHS county board. The primary outcome is the percentage of individuals with an opioid dependence diagnosis who use buprenorphine during the 24-month intervention period and the 12-month sustainability period. The trial is currently in the baseline data collection stage. Discussion: Although addiction treatment providers are under increasing pressure to implement evidence-based practices that have been proven to improve patient outcomes, adoption of these practices lags compared to other areas of healthcare. Reasons frequently cited for the slow adoption of EBPs in addiction treatment include regulatory issues, staff or client resistance, and lack of resources. Yet the way addiction treatment is funded (the payer’s role) has not received much attention in research on EBP adoption. This research is unique because it investigates the role of payers in evidence-based practice implementation using a randomized controlled design instead of case examples. The testing of the Advancing Recovery Framework is designed to broaden the understanding of the impact payers have on evidence-based practice (EBP) adoption. Trial registration: http://NCT01702142 (ClinicalTrials.gov registry, USA) PMID:23663749
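    The matched-pair randomization described in this protocol can be sketched as follows. The sketch is illustrative only: the board names, the single matching covariate (county population) and the function name `matched_pair_randomize` are assumptions, since the abstract says only that matching was "based on county characteristics". Units are ranked on the covariate, adjacent units form a pair, and a coin flip decides which member of each pair goes to which arm.

```python
import random

def matched_pair_randomize(units, score, seed=None):
    """Pair units with adjacent scores, then randomly assign one unit per pair to each arm."""
    if len(units) % 2:
        raise ValueError("a matched-pair design needs an even number of units")
    rng = random.Random(seed)           # seeded so the allocation can be reproduced
    ranked = sorted(units, key=score)   # neighbours in this order are the matched pairs
    arms = {"payer_plus_organization": [], "organization_only": []}
    for a, b in zip(ranked[0::2], ranked[1::2]):
        first, second = (a, b) if rng.random() < 0.5 else (b, a)
        arms["payer_plus_organization"].append(first)
        arms["organization_only"].append(second)
    return arms

# Hypothetical boards with a made-up matching covariate (county population, thousands).
populations = [45, 52, 88, 91, 120, 133, 210, 225, 300, 340, 410, 450, 800, 950]
boards = {f"board_{i:02d}": pop for i, pop in enumerate(populations, start=1)}
arms = matched_pair_randomize(list(boards), score=boards.get, seed=2013)
```

    With fourteen boards this yields seven per arm, matching the allocation reported above; a real trial would typically match on several characteristics at once rather than one.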

  6. Classification of patients based on their evaluation of hospital outcomes: cluster analysis following a national survey in Norway

    PubMed Central

    2013-01-01

    Background: A general trend towards positive patient-reported evaluations of hospitals could be taken as a sign that most patients form a homogeneous, reasonably pleased group, and consequently that there is little need for quality improvement. The objective of this study was to explore this assumption by identifying and statistically validating clusters of patients based on their evaluation of outcomes related to overall satisfaction, malpractice and benefit of treatment. Methods: Data were collected using a national patient-experience survey of 61 hospitals in the 4 health regions in Norway during spring 2011. Postal questionnaires were mailed to 23,420 patients after their discharge from hospital. Cluster analysis was performed to identify response clusters of patients, based on their responses to single items about overall patient satisfaction, benefit of treatment and perception of malpractice. Results: Cluster analysis identified six response groups, including one cluster with systematically poorer evaluation across outcomes (18.5% of patients) and one small outlier group (5.3%) with very poor scores across all outcomes. One-way ANOVA with post-hoc tests showed that most differences between the six response groups on the three outcome items were significant. The response groups were significantly associated with nine patient-experience indicators (p < 0.001), and all groups were significantly different from each of the other groups on a majority of the patient-experience indicators. Clusters were significantly associated with age, education, self-perceived health, gender, and the extent to which patients wrote open comments in the questionnaire. Conclusions: The study identified five response clusters with distinct patient-reported outcome scores, in addition to a heterogeneous outlier group with very poor scores across all outcomes. The outlier group and the cluster with systematically poorer evaluation across outcomes comprised almost one-quarter of all patients, clearly demonstrating the need to tailor quality initiatives and improve patient-perceived quality in hospitals. More research on patient clustering in patient evaluation is needed, as well as standardization of methodology to increase comparability across studies. PMID:23433450

  7. Peripheral neuropathic pain: a mechanism-related organizing principle based on sensory profiles

    PubMed Central

    Baron, Ralf; Maier, Christoph; Attal, Nadine; Binder, Andreas; Bouhassira, Didier; Cruccu, Giorgio; Finnerup, Nanna B.; Haanpää, Maija; Hansson, Per; Hüllemann, Philipp; Jensen, Troels S.; Freynhagen, Rainer; Kennedy, Jeffrey D.; Magerl, Walter; Mainka, Tina; Reimer, Maren; Rice, Andrew S.C.; Segerdahl, Märta; Serra, Jordi; Sindrup, Sören; Sommer, Claudia; Tölle, Thomas; Vollert, Jan; Treede, Rolf-Detlef

    2016-01-01

    Abstract Patients with neuropathic pain are heterogeneous in etiology, pathophysiology, and clinical appearance. They exhibit a variety of pain-related sensory symptoms and signs (sensory profile). Different sensory profiles might indicate different classes of neurobiological mechanisms, and hence subgroups with different sensory profiles might respond differently to treatment. The aim of the investigation was to identify subgroups in a large sample of patients with neuropathic pain using hypothesis-free statistical methods on the database of 3 large multinational research networks (German Research Network on Neuropathic Pain (DFNS), IMI-Europain, and Neuropain). Standardized quantitative sensory testing was used in 902 (test cohort) and 233 (validation cohort) patients with peripheral neuropathic pain of different etiologies. For subgrouping, we performed a cluster analysis using 13 quantitative sensory testing parameters. Three distinct subgroups with characteristic sensory profiles were identified and replicated. Cluster 1 (sensory loss, 42%) showed a loss of small and large fiber function in combination with paradoxical heat sensations. Cluster 2 (thermal hyperalgesia, 33%) was characterized by preserved sensory functions in combination with heat and cold hyperalgesia and mild dynamic mechanical allodynia. Cluster 3 (mechanical hyperalgesia, 24%) was characterized by a loss of small fiber function in combination with pinprick hyperalgesia and dynamic mechanical allodynia. All clusters occurred across etiologies but frequencies differed. We present a new approach of subgrouping patients with peripheral neuropathic pain of different etiologies according to intrinsic sensory profiles. These 3 profiles may be related to pathophysiological mechanisms and may be useful in clinical trial design to enrich the study population for treatment responders. PMID:27893485

  8. Peripheral neuropathic pain: a mechanism-related organizing principle based on sensory profiles.

    PubMed

    Baron, Ralf; Maier, Christoph; Attal, Nadine; Binder, Andreas; Bouhassira, Didier; Cruccu, Giorgio; Finnerup, Nanna B; Haanpää, Maija; Hansson, Per; Hüllemann, Philipp; Jensen, Troels S; Freynhagen, Rainer; Kennedy, Jeffrey D; Magerl, Walter; Mainka, Tina; Reimer, Maren; Rice, Andrew S C; Segerdahl, Märta; Serra, Jordi; Sindrup, Sören; Sommer, Claudia; Tölle, Thomas; Vollert, Jan; Treede, Rolf-Detlef

    2017-02-01

    Patients with neuropathic pain are heterogeneous in etiology, pathophysiology, and clinical appearance. They exhibit a variety of pain-related sensory symptoms and signs (sensory profile). Different sensory profiles might indicate different classes of neurobiological mechanisms, and hence subgroups with different sensory profiles might respond differently to treatment. The aim of the investigation was to identify subgroups in a large sample of patients with neuropathic pain using hypothesis-free statistical methods on the database of 3 large multinational research networks (German Research Network on Neuropathic Pain (DFNS), IMI-Europain, and Neuropain). Standardized quantitative sensory testing was used in 902 (test cohort) and 233 (validation cohort) patients with peripheral neuropathic pain of different etiologies. For subgrouping, we performed a cluster analysis using 13 quantitative sensory testing parameters. Three distinct subgroups with characteristic sensory profiles were identified and replicated. Cluster 1 (sensory loss, 42%) showed a loss of small and large fiber function in combination with paradoxical heat sensations. Cluster 2 (thermal hyperalgesia, 33%) was characterized by preserved sensory functions in combination with heat and cold hyperalgesia and mild dynamic mechanical allodynia. Cluster 3 (mechanical hyperalgesia, 24%) was characterized by a loss of small fiber function in combination with pinprick hyperalgesia and dynamic mechanical allodynia. All clusters occurred across etiologies but frequencies differed. We present a new approach of subgrouping patients with peripheral neuropathic pain of different etiologies according to intrinsic sensory profiles. These 3 profiles may be related to pathophysiological mechanisms and may be useful in clinical trial design to enrich the study population for treatment responders.

  9. A statistically compiled test battery for feasible evaluation of knee function after rupture of the Anterior Cruciate Ligament - derived from long-term follow-up data.

    PubMed

    Schelin, Lina; Tengman, Eva; Ryden, Patrik; Häger, Charlotte

    2017-01-01

    Clinical test batteries for evaluation of knee function after injury to the Anterior Cruciate Ligament (ACL) should be valid and feasible, while reliably capturing the outcome of rehabilitation. There is currently a lack of consensus as to which of the many available assessment tools for knee function should be included. The present aim was to use a statistical approach to investigate the contribution of frequently used tests to avoid redundancy, and filter them down to a proposed comprehensive and yet feasible test battery for long-term evaluation after ACL injury. In total 48 outcome variables related to knee function, all potentially relevant for a long-term follow-up, were included from a cross-sectional study where 70 ACL-injured (17-28 years post injury) individuals were compared to 33 controls. Cluster analysis and logistic regression were used to group variables and identify an optimal test battery, from which a summarized estimator of knee function representing various functional aspects was derived. As expected, several variables were strongly correlated, and the variables also fell into logical clusters with higher within-correlation (max ρ = 0.61) than between clusters (max ρ = 0.19). An extracted test battery with just four variables assessing one-leg balance, isokinetic knee extension strength and hop performance (one-leg hop, side hop) was mathematically combined into an estimator of knee function, which acceptably classified ACL-injured individuals and controls. This estimator, derived from objective measures, correlated significantly with self-reported function, e.g. Lysholm score (ρ = 0.66; p<0.001). The proposed test battery, based on a solid statistical approach, includes assessments which are all clinically feasible, while also covering complementary aspects of knee function. Similar test batteries could be determined for earlier phases of ACL rehabilitation or to enable longitudinal monitoring. Such developments, established on a well-grounded consensus of measurements, would facilitate comparisons of studies and enable evidence-based rehabilitation.

  10. Evaluation of primary immunization coverage of infants under universal immunization programme in an urban area of Bangalore city using cluster sampling and lot quality assurance sampling techniques.

    PubMed

    K, Punith; K, Lalitha; G, Suman; Bs, Pradeep; Kumar K, Jayanth

    2008-07-01

    Research question: Is the LQAS technique better than the cluster sampling technique in terms of resources to evaluate immunization coverage in an urban area? Objective: To assess and compare lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study design: Population-based cross-sectional study. Setting: Areas under Mathikere Urban Health Center. Study subjects: Children aged 12 months to 23 months. Sample size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical analysis: Percentages and proportions, chi-square test. Results: (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, they were 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by the lot quality assurance sampling technique. Considering the time and resources required, lot quality assurance sampling was found to be the better technique for evaluating primary immunization coverage in an urban area.
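    The chi-square comparison mentioned in this abstract can be worked through as a sketch. The cell counts below are not given in the abstract; they are back-calculated from the reported percentages and sample sizes (220 and 76) and should be treated as an assumption, as should the choice of a single 2 x 3 table, since the abstract does not say how the authors arranged their test. Two expected counts fall below 5, so the chi-square approximation is rough here.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Columns: completely immunized, partially immunized, unimmunized.
# Counts reconstructed from the reported percentages (assumption).
table = [
    [185, 31, 4],  # cluster sampling, n = 220
    [70, 5, 1],    # lot quality assurance sampling, n = 76
]
stat, dof = chi_square_statistic(table)

# For exactly 2 degrees of freedom the chi-square upper-tail probability
# has the closed form exp(-x / 2); this shortcut does not hold for other dof.
p_value = math.exp(-stat / 2) if dof == 2 else None
```

    Under these assumptions the statistic stays below the 5% critical value for two degrees of freedom (about 5.99), which agrees with the abstract's statement that the two coverage estimates were not statistically different.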

  11. Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

    PubMed

    Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

    2012-10-01

    The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  12. Kernel spectral clustering with memory effect

    NASA Astrophysics Data System (ADS)

    Langone, Rocco; Alzate, Carlos; Suykens, Johan A. K.

    2013-05-01

    Evolving graphs describe many natural phenomena changing over time, such as social relationships, trade markets, metabolic networks etc. In this framework, performing community detection and analyzing the cluster evolution represents a critical task. Here we propose a new model for this purpose, where the smoothness of the clustering results over time can be considered as a valid prior knowledge. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness. The latter allows the model to cluster the current data well and to be consistent with the recent history. We also propose new model selection criteria in order to carefully choose the hyper-parameters of our model, which is a crucial issue to achieve good performances. We successfully test the model on four toy problems and on a real world network. We also compare our model with Evolutionary Spectral Clustering, which is a state-of-the-art algorithm for community detection of evolving networks, illustrating that the kernel spectral clustering with memory effect can achieve better or equal performances.

  13. Sub-tesla-field magnetization of vibrated magnetic nanoreagents for screening tumor markers

    NASA Astrophysics Data System (ADS)

    Chieh, Jen-Jie; Huang, Kai-Wen; Shi, Jin-Cheng

    2015-02-01

    Magnetic nanoreagents (MNRs), consisting of liquid solutions and magnetic nanoparticles (MNPs) coated with bioprobes, have been widely used in biomedical disciplines. For in vitro tests of serum biomarkers, numerous MNR-based magnetic immunoassay methods or schemes have been developed; however, their applications are limited. In this study, a vibrating sample magnetometer (VSM) was used for screening tumor biomarkers based on the same MNRs as those used in other immunoassay methods. The examination mechanism is that examined tumor biomarkers are typically conjugated to the bioprobes coated on MNPs to form magnetic clusters. Consequently, the sub-Tesla-field magnetization (Msub-T) of MNRs, including magnetic clusters, exceeds that of MNRs containing only separate MNPs. For human serum samples, proteins other than the targeted biomarkers induce the formation of magnetic clusters with increased Msub-T because of weak nonspecific binding. In this study, this interference problem was suppressed by the vibration condition in the VSM and analysis. Based on a referenced Msub-T,0 value defined by the average Msub-T value of a normal person's serum samples, including general proteins and few tumor biomarkers, the difference ΔMsub-T between the measured Msub-T and the reference Msub-T,0 determined the expression of only target tumor biomarkers in the tested serum samples. By using common MNRs with an alpha-fetoprotein-antibody coating, this study demonstrated that a current VSM can perform clinical screening of hepatocellular carcinoma.

  14. Locally Weighted Ensemble Clustering.

    PubMed

    Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang

    2018-05-01

    Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite the significant success, one limitation to most of the existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, yet these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially, in the case when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
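
    The entropic criterion described above can be illustrated with a small sketch. It measures how strongly the members of one cluster are split up by each base clustering in the ensemble and sums those entropies. The function names, the toy ensemble, and the exponential mapping with the parameter `theta` are assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter

def cluster_entropy(members, labels):
    """Entropy (in bits) of how one cluster's members spread over a base clustering."""
    counts = Counter(labels[i] for i in members)
    total = len(members)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def cluster_uncertainty(members, ensemble):
    """Sum of the cluster's entropies with respect to every base clustering."""
    return sum(cluster_entropy(members, labels) for labels in ensemble)

def reliability(members, ensemble, theta=0.5):
    """Hypothetical mapping from uncertainty to a weight in (0, 1]; theta is assumed."""
    return math.exp(-cluster_uncertainty(members, ensemble) / (theta * len(ensemble)))

# Toy ensemble: three base clusterings of six objects (label values are arbitrary).
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
stable = [0, 1]    # these two objects are never separated
unstable = [2, 3]  # separated by the first two clusterings, together in the third

print(cluster_uncertainty(stable, ensemble), reliability(stable, ensemble))
print(cluster_uncertainty(unstable, ensemble), reliability(unstable, ensemble))
```

    A cluster whose members always stay together gets zero uncertainty and the maximum weight, while a cluster that other base clusterings split receives a smaller weight.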

  15. Clustering on Magnesium Surfaces - Formation and Diffusion Energies.

    PubMed

    Chu, Haijian; Huang, Hanchen; Wang, Jian

    2017-07-12

    The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and formation of faulted structure, which in turn affect the mechanical deformation of Mg. This paper reports first-principles density functional theory (DFT) based quantum mechanics calculation results of atomic clustering on the low energy surfaces {0001} and [Formula: see text]. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers a face-centered-cubic stacking, to serve as a nucleus of stacking fault. On a [Formula: see text] surface, clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on surface [Formula: see text] is highly anisotropic, while it is isotropic on surface (0001). Three-dimensional Ehrlich-Schwoebel barriers converge when the step height is three atomic layers or thicker. Adatom diffusion along steps is via a hopping mechanism, and that down steps is via an exchange mechanism.

  16. Efficient clustering aggregation based on data fragments.

    PubMed

    Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing

    2012-06-01

    Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy.
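
    The fragment definition above can be sketched in a few lines. The sketch computes the maximal fragments, that is, the largest groups of points that share the same label in every input clustering, so that no clustering splits them. The function name and the toy labelings are illustrative assumptions and not the authors' implementation.

```python
from collections import defaultdict

def data_fragments(clusterings):
    """Group point indices that carry the same label in every clustering."""
    groups = defaultdict(list)
    # zip(*clusterings) yields, per point, the tuple of its labels across clusterings.
    for idx, signature in enumerate(zip(*clusterings)):
        groups[signature].append(idx)
    return list(groups.values())

# Two toy clusterings of six points.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
fragments = data_fragments(clusterings)
print(fragments)
```

    An aggregation algorithm can then operate on the four fragments instead of the six points, which is where the efficiency gain comes from when many points share a label signature.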

  17. Color evolution from z = 0 to z = 1

    NASA Technical Reports Server (NTRS)

    Rakos, Karl D.; Schombert, James M.

    1995-01-01

    Rest frame Stroemgren photometry (3500 A, 4100 A, 4750 A, and 5500 A) is presented for 509 galaxies in 17 rich clusters between z = 0 and z = 1 as a test of color evolution. Our observations confirm a strong, rest frame, Butcher-Oemler effect where the fraction of blue galaxies increases from 20% at z = 0.4 to 80% at z = 0.9. We also find that a majority of these blue cluster galaxies are composed of normal disk or post-starburst systems based on color criteria. When comparing our colors to the morphological results from Hubble Space Telescope HST imaging, we propose that the blue cluster galaxies are a population of late-type, low surface brightness objects which fade and are then destroyed by the cluster tidal field. After isolating the red objects from Butcher-Oemler objects, we have compared the mean color of these old, non-star-forming objects with spectral energy distribution models in the literature as a test for passive galaxy evolution in ellipticals. We find good agreement with single-burst models which predict a mean epoch of galaxy formation at z = 5. Tracing the red envelope for ellipticals places the earliest epoch of galaxy formation at z = 10.

  18. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    USGS Publications Warehouse

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  19. AN ASTEROSEISMIC MEMBERSHIP STUDY OF THE RED GIANTS IN THREE OPEN CLUSTERS OBSERVED BY KEPLER: NGC 6791, NGC 6819, AND NGC 6811

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stello, Dennis; Huber, Daniel; Bedding, Timothy R.

    Studying star clusters offers significant advances in stellar astrophysics due to the combined power of having many stars with essentially the same distance, age, and initial composition. This makes clusters excellent test benches for verification of stellar evolution theory. To fully exploit this potential, it is vital that the star sample is uncontaminated by stars that are not members of the cluster. Techniques for determining cluster membership therefore play a key role in the investigation of clusters. We present results on three clusters in the Kepler field of view based on a newly established technique that uses asteroseismology to identify fore- or background stars in the field, which demonstrates advantages over classical methods such as kinematic and photometry measurements. Four previously identified seismic non-members in NGC 6819 are confirmed in this study, and three additional non-members are found: two in NGC 6819 and one in NGC 6791. We further highlight which stars are, or might be, affected by blending, which needs to be taken into account when analyzing these Kepler data.

  20. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
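
    The Benjamini-Hochberg step mentioned above can be sketched as follows. The function returns the largest p-value that the step-up rule still rejects at false discovery rate `q`. The function name, the value of `q`, and the example p-values are assumptions chosen for illustration and are not taken from the article.

```python
def bh_threshold(p_values, q=0.05):
    """Return the largest p-value rejected by the BH step-up rule, or None."""
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest ordered p-value with p_(k) <= k * q / m.
        if p <= rank * q / m:
            threshold = p
    return threshold

# Hypothetical p-values for eight candidate clusters.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
cutoff = bh_threshold(p_values)
rejected = [p for p in p_values if cutoff is not None and p <= cutoff]
print(cutoff, rejected)
```

    In this toy input only the two smallest p-values fall under their rank-dependent bounds, so two candidate clusters would be declared significant.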

  1. Spatio-Temporal Epidemiology of Viral Hepatitis in China (2003-2015): Implications for Prevention and Control Policies.

    PubMed

    Zhu, Bin; Liu, Jinlin; Fu, Yang; Zhang, Bo; Mao, Ying

    2018-04-02

    Viral hepatitis, as one of the most serious notifiable infectious diseases in China, takes heavy tolls from the infected and causes a severe economic burden to society, yet few studies have systematically explored the spatio-temporal epidemiology of viral hepatitis in China. This study aims to explore, visualize and compare the epidemiologic trends and spatial changing patterns of different types of viral hepatitis (A, B, C, E and unspecified, based on the classification of CDC) at the provincial level in China. The growth rates of incidence are used and converted to box plots to visualize the epidemiologic trends, with the linear trend being tested by chi-square linear by linear association test. Two complementary spatial cluster methods are used to explore the overall agglomeration level and identify spatial clusters: spatial autocorrelation analysis (measured by global and local Moran's I) and space-time scan analysis. Based on the spatial autocorrelation analysis, the hotspots of hepatitis A remain relatively stable and gradually shrunk, with Yunnan and Sichuan successively moving out the high-high (HH) cluster area. The HH clustering feature of hepatitis B in China gradually disappeared with time. However, the HH cluster area of hepatitis C has gradually moved towards the west, while for hepatitis E, the provincial units around the Yangtze River Delta region have been revealing HH cluster features since 2005. The space-time scan analysis also indicates the distinct spatial changing patterns of different types of viral hepatitis in China. It is easy to conclude that there is no one-size-fits-all plan for the prevention and control of viral hepatitis in all the provincial units. An effective response requires a package of coordinated actions, which should vary across localities regarding the spatial-temporal epidemic dynamics of each type of virus and the specific conditions of each provincial unit.
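
    As a minimal sketch of the spatial autocorrelation measure named above, the global Moran's I can be computed as I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all spatial weights. The four-unit chain and the incidence values below are toy assumptions, not data from the study, and the local Moran's I is not covered.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on units linked by a weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Four hypothetical provinces arranged in a chain; 1 marks a shared border.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], chain)       # neighbours have similar values
alternating = morans_i([1, 4, 1, 4], chain)  # neighbours have opposite values
print(smooth, alternating)
```

    Positive values indicate that similar incidence levels cluster among neighbours (the pattern behind high-high areas), while negative values indicate a checkerboard-like arrangement.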

  2. Determination System Of Food Vouchers For the Poor Based On Fuzzy C-Means Method

    NASA Astrophysics Data System (ADS)

    Anamisa, D. R.; Yusuf, M.; Syakur, M. A.

    2018-01-01

    Food vouchers are a government program to tackle poverty in rural communities. The program aims to help poor households obtain enough food and nutrients from carbohydrates. Several factors influence eligibility for a food voucher: job, monthly income, taxes, electricity bill, size of house, number of family members, education certificate, and weekly rice consumption. Distribution of the vouchers often runs into problems: vouchers are misdirected, and the selection of recipients is still subjective. Some solutions for this decision-making problem have not yet been implemented. This research aims to calculate the change of each partition matrix and each cluster using the Fuzzy C-Means method, and it hopes to contribute by providing better results with Fuzzy C-Means than with other methods for this case study. Decision making in this research is done with the Fuzzy C-Means method, a clustering method that yields an organized, scattered cluster structure with regular patterns on two-dimensional datasets. The method is also used to calculate the change of each partition matrix. Within each cluster, data elements are sorted by their proximity to the cluster centroid to obtain a ranking. Various trials were conducted to group and rank the proposed recipients of food vouchers based on the quota of each village. The system developed with the Fuzzy C-Means method is able to determine the recipients of food vouchers with satisfactory results. Fulfillment of food voucher recipients is 80% to 90%, tested on data from 115 Family Cards from 6 villages. The quality of the results is affected by the number of iterations, set to 20, and the number of clusters, set to 3.
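
    The alternating updates of Fuzzy C-Means can be sketched as below, using the 20 iterations and 3 clusters mentioned in the abstract. The fuzzifier `m = 2`, the two-dimensional toy points, and the initial centroids are assumptions for illustration; the study's actual features and initialization are not given in the abstract.

```python
import math

def fcm(points, centers, m=2.0, iterations=20):
    """Run a fixed number of Fuzzy C-Means membership and centroid updates."""
    memberships = []
    for _ in range(iterations):
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # A point lying exactly on a centroid belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                # u_k is proportional to d_k^(-2 / (m - 1)).
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            memberships.append([v / total for v in row])
        # Centroids are means of the points weighted by membership^m.
        new_centers = []
        for k in range(len(centers)):
            w = [row[k] ** m for row in memberships]
            w_total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / w_total
                for d in range(len(points[0]))
            ))
        centers = new_centers
    # Memberships are those computed in the last iteration, before its centroid update.
    return memberships, centers

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (9.0, 1.0), (8.8, 1.2)]
initial = [(0.0, 0.0), (5.0, 4.0), (10.0, 0.0)]  # assumed starting centroids
memberships, centers = fcm(points, initial)
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

    Ranking the members of each cluster by their distance to the final centroid, as the abstract describes, would be a sort over `math.dist(point, centers[label])` within each label.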

  3. White Matter Tract Integrity in Alzheimer's Disease vs. Late Onset Bipolar Disorder and Its Correlation with Systemic Inflammation and Oxidative Stress Biomarkers.

    PubMed

    Besga, Ariadna; Chyzhyk, Darya; Gonzalez-Ortega, Itxaso; Echeveste, Jon; Graña-Lecuona, Marina; Graña, Manuel; Gonzalez-Pinto, Ana

    2017-01-01

    Background: Late Onset Bipolar Disorder (LOBD) is the development of Bipolar Disorder (BD) at an age above 50 years old. It is often difficult to differentiate from other aging dementias, such as Alzheimer's Disease (AD), because they share cognitive and behavioral impairment symptoms. Objectives: We look for WM tract voxel clusters showing significant differences when comparing AD vs. LOBD, and their correlations with systemic blood plasma biomarkers (inflammatory, neurotrophic factors, and oxidative stress). Materials: A sample of healthy controls (HC) (n = 19), AD patients (n = 35), and LOBD patients (n = 24) was recruited at the Alava University Hospital. Blood plasma samples were obtained at recruitment time and analyzed to extract the inflammatory, oxidative stress, and neurotrophic factors. Several modalities of MRI were acquired for each subject. Methods: Fractional anisotropy (FA) coefficients are obtained from diffusion weighted imaging (DWI). Tract based spatial statistics (TBSS) finds FA skeleton clusters of WM tract voxels showing significant differences for all possible contrasts between HC, AD, and LOBD. An ANOVA F-test over all contrasts is carried out. Results of the F-test are used to mask TBSS detected clusters for the AD > LOBD and LOBD > AD contrasts to select the image clusters used for correlation analysis. Finally, Pearson's correlation coefficients between FA values at cluster sites and systemic blood plasma biomarker values are computed. Results: The TBSS contrasts masked by the ANOVA F-test identified strongly significant clusters in the forceps minor, inferior longitudinal fasciculus, inferior fronto-occipital fasciculus, and cingulum gyrus. The correlation analysis of these tract clusters found strong negative correlation of AD with the nerve growth factor (NGF) and brain derived neurotrophic factor (BDNF) blood biomarkers. Negative correlation of AD and positive correlation of LOBD with inflammation biomarker IL6 was also found. Conclusion: The tract atlas localizations of the TBSS voxel clusters are consistent with greater behavioral impairment and mood disorders in LOBD than in AD. Correlation analysis confirms that neurotrophic factors (i.e., NGF, BDNF) play a great role in AD while they are absent in LOBD pathophysiology. Also, correlation results of IL1 and IL6 suggest stronger inflammatory effects in LOBD than in AD.

  4. High-accuracy identification of incident HIV-1 infections using a sequence clustering based diversity measure.

    PubMed

    Xia, Xia-Yu; Ge, Meng; Hsi, Jenny H; He, Xiang; Ruan, Yu-Hua; Wang, Zhi-Xin; Shao, Yi-Ming; Pan, Xian-Ming

    2014-01-01

    Accurate estimates of HIV-1 incidence are essential for monitoring epidemic trends and evaluating intervention efforts. However, the long asymptomatic stage of HIV-1 infection makes it difficult to effectively distinguish incident infections from chronic ones. Current incidence assays based on serology or viral sequence diversity are both still lacking in accuracy. In the present work, a sequence clustering based diversity (SCBD) assay was devised by utilizing the fact that viral sequences derived from each transmitted/founder (T/F) strain tend to cluster together at early stage, and that only the intra-cluster diversity is correlated with the time since HIV-1 infection. The dot-matrix pairwise alignment was used to eliminate the disproportional impact of insertion/deletions (indels) and recombination events, and so was the proportion of clusterable sequences (Pc) as an index to identify late chronic infections with declined viral genetic diversity. Tested on a dataset containing 398 incident and 163 chronic infection cases collected from the Los Alamos HIV database (last modified 2/8/2012), our SCBD method achieved 99.5% sensitivity and 98.8% specificity, with an overall accuracy of 99.3%. Further analysis and evaluation also suggested its performance was not affected by host factors such as the viral subtypes and transmission routes. The SCBD method demonstrated the potential of sequencing based techniques to become useful for identifying incident infections. Its use may be most advantageous for settings with low to moderate incidence relative to available resources. The online service is available at http://www.bioinfo.tsinghua.edu.cn:8080/SCBD/index.jsp.
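
    The reported rates can be checked against the stated case counts with a short calculation. The abstract gives 398 incident and 163 chronic cases but not the numbers of correct calls; the values 396 and 161 below are assumptions inferred from the rounded percentages.

```python
incident, chronic = 398, 163    # case counts stated in the abstract
true_pos, true_neg = 396, 161   # assumed correct calls, inferred from the rounded rates

sensitivity = true_pos / incident
specificity = true_neg / chronic
accuracy = (true_pos + true_neg) / (incident + chronic)

print(round(sensitivity * 100, 1), round(specificity * 100, 1), round(accuracy * 100, 1))
```

    Under these assumed counts, two misclassified cases in each group reproduce all three reported figures.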

  5. History, geography and host use shape genomewide patterns of genetic variation in the redheaded pine sawfly (Neodiprion lecontei).

    PubMed

    Bagley, Robin K; Sousa, Vitor C; Niemiller, Matthew L; Linnen, Catherine R

    2017-02-01

    Divergent host use has long been suspected to drive population differentiation and speciation in plant-feeding insects. Evaluating the contribution of divergent host use to genetic differentiation can be difficult, however, as dispersal limitation and population structure may also influence patterns of genetic variation. In this study, we use double-digest restriction-associated DNA (ddRAD) sequencing to test the hypothesis that divergent host use contributes to genetic differentiation among populations of the redheaded pine sawfly (Neodiprion lecontei), a widespread pest that uses multiple Pinus hosts throughout its range in eastern North America. Because this species has a broad range and specializes on host plants known to have migrated extensively during the Pleistocene, we first assess overall genetic structure using model-based and model-free clustering methods and identify three geographically distinct genetic clusters. Next, using a composite-likelihood approach based on the site frequency spectrum and a novel strategy for maximizing the utility of linked RAD markers, we infer the population topology and date divergence to the Pleistocene. Based on existing knowledge of Pinus refugia, estimated demographic parameters and patterns of diversity among sawfly populations, we propose a Pleistocene divergence scenario for N. lecontei. Finally, using Mantel and partial Mantel tests, we identify a significant relationship between genetic distance and geography in all clusters, and between genetic distance and host use in two of three clusters. Overall, our results indicate that Pleistocene isolation, dispersal limitation and ecological divergence all contribute to genomewide differentiation in this species and support the hypothesis that host use is a common driver of population divergence in host-specialized insects. © 2016 John Wiley & Sons Ltd.
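
    A simple (not partial) Mantel test of the kind mentioned above can be sketched as a permutation test on the correlation between two distance matrices. The one-sided p-value convention, the function names, the number of permutations, and the toy site positions are assumptions; the genetic distances are made proportional to the geographic ones purely for illustration.

```python
import math
import random

def upper(matrix, order):
    """Upper-triangle entries of a matrix after relabelling rows and columns by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mantel(a, b, permutations=999, seed=0):
    """Return the matrix correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    fixed = upper(a, identity)
    observed = pearson(fixed, upper(b, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of b together
        if pearson(fixed, upper(b, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)

# Toy example: five sampling sites on a line, genetic distance proportional to geography.
positions = [0, 1, 2, 4, 7]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[0.1 * d for d in row] for row in geo]
r, p_value = mantel(geo, gen)
print(r, p_value)
```

    A partial Mantel test would additionally remove the effect of a third matrix (for example geography when testing host use) before computing the correlation; that step is not shown here.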

  6. Molecular taxonomy of phytopathogenic fungi: a case study in Peronospora.

    PubMed

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M Teresa; Martín, María P

    2009-07-29

    Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, for this purpose clustering approaches to delineate molecular operational taxonomic units have been applied using arbitrary choices regarding the distance threshold values, and the clustering algorithms. Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. 
The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.

  7. Molecular Taxonomy of Phytopathogenic Fungi: A Case Study in Peronospora

    PubMed Central

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M. Teresa; Martín, María P.

    2009-01-01

    Background Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, for this purpose clustering approaches to delineate molecular operational taxonomic units have been applied using arbitrary choices regarding the distance threshold values, and the clustering algorithms. Methodology Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. Conclusions A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. 
The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence. PMID:19641601

  8. Tests for informative cluster size using a novel balanced bootstrap scheme.

    PubMed

    Nevalainen, Jaakko; Oja, Hannu; Datta, Somnath

    2017-07-20

    Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) presented a number of marginal distributions that could be tested. Depending on the nature and degree of informativeness of the cluster size, these marginal distributions may differ, as do the choices of the appropriate test. In particular, they applied their new test to a periodontal data set where the plausibility of the informativeness was mentioned, but no formal test for the same was conducted. We propose bootstrap tests for testing the presence of ICS. A balanced bootstrap method is developed to successfully estimate the null distribution by merging the re-sampled observations with closely matching counterparts. Relying on the assumption of exchangeability within clusters, the proposed procedure performs well in simulations even with a small number of clusters, at different distributions and against different alternative hypotheses, thus making it an omnibus test. We also explain how to extend the ICS test to a regression setting, thereby enhancing its practical utility. The methodologies are illustrated using the periodontal data set mentioned earlier. Copyright © 2017 John Wiley & Sons, Ltd.

  9. Criterion Referenced Assessment Bank. Grade 6 Skill Clusters, Objectives, and Illustrations.

    ERIC Educational Resources Information Center

    Montgomery County Public Schools, Rockville, MD.

    Part of a series of competency-based test materials for grades six through ten, this set of nine test booklets for sixth graders contains multiple-choice questions designed to aid in the evaluation of the pupils' library skills. Accompanied by a separate, tenth booklet of illustrations which are to be used in conjunction with the questions, the…

  10. Mid-infrared Integrated-light Photometry Of LMC Star Clusters

    NASA Astrophysics Data System (ADS)

    Pessev, Peter; Goudfrooij, P.; Puzia, T.; Chandar, R.

    2008-03-01

    Massive star clusters (Galactic Globular Clusters and Populous Clusters in the Magellanic Clouds) are the best available approximation of Simple Stellar Populations (SSPs). Since the stellar populations in these nearby objects are studied in details, they provide fundamental age/metallicity templates for interpretation of the galaxy properties, testing and calibration of the SSP Models. Magellanic Cloud clusters are particularly important since they populate a region of the age/metallicity parameter space that is not easily accessible in our Galaxy. We present the first Mid-IR integrated-light measurements for six LMC clusters based on our Spitzer IRAC imaging program. Since we are targeting a specific group of intermediate-age clusters, our imaging goes deeper compared to SAGE-LMC survey data. We present a literature compilation of clusters' properties along with multi-wavelength integrated light photometry database spanning from the optical (Johnson U band) to the Mid-IR (IRAC Channel 4). This data provides an important empirical baseline for the interpretation of galaxy colors in the Mid-IR (especially high-z objects whose integrated-light is dominated by TP-AGB stars emission). It is also a valuable tool to check the SSP model predictions in the intermediate-age regime and provides calibration data for the next generation of SSP models.

  11. Eb&D: A new clustering approach for signed social networks based on both edge-betweenness centrality and density of subgraphs

    NASA Astrophysics Data System (ADS)

    Qi, Xingqin; Song, Huimin; Wu, Jianliang; Fuller, Edgar; Luo, Rong; Zhang, Cun-Quan

    2017-09-01

    Clustering algorithms for unsigned social networks which have only positive edges have been studied intensively. However, when a network has like/dislike, love/hate, respect/disrespect, or trust/distrust relationships, unsigned social networks with only positive edges are inadequate. Thus we model such networks as signed networks, which can have both negative and positive edges. Detecting the cluster structures of signed networks is much harder than for unsigned networks, because it requires not only that positive edges within clusters be as many as possible, but also that negative edges between clusters be as many as possible. Currently, there are few clustering algorithms for signed networks, and most of them require the number of final clusters as an input, although it is actually hard to predict beforehand. In this paper, we propose a novel clustering algorithm called Eb&D for signed networks, where both the betweenness of edges and the density of subgraphs are used to detect cluster structures. A hierarchically nested system is constructed to illustrate the inclusion relationships of clusters. To show the validity and efficiency of Eb&D, we test it on several classical social networks and also hundreds of synthetic data sets, and obtain better results in all cases compared with other methods. The biggest advantage of Eb&D compared with other methods is that the number of clusters does not need to be known in advance.

  12. E-learning or educational leaflet: does it make a difference in oral health promotion? A clustered randomized trial.

    PubMed

    Al Bardaweel, Susan; Dashash, Mayssoon

    2018-05-10

    The early recognition of technology together with great ability to use computers and smart systems have prompted researchers to investigate the possibilities of utilizing technology for improving health care in children. The aim of this study was to compare traditional educational leaflets and E-applications in improving oral health knowledge, oral hygiene and gingival health in schoolchildren of Damascus city, Syria. A clustered randomized controlled trial at two public primary schools was performed. About 220 schoolchildren aged 10-11 years were included in this study and grouped into two clusters. Children in the Leaflet cluster received oral health education through leaflets, while children in the E-learning cluster received oral health education through an E-learning program. A questionnaire was designed to register information related to oral health knowledge and to record Plaque and Gingival indices. Questionnaire administration and clinical assessment were undertaken at baseline, at 6 weeks, and at 12 weeks of oral health education. Data were analysed using one way repeated measures ANOVA, post hoc Bonferroni test and independent samples t-test. The Leaflet cluster (107 participants) had statistically significantly better oral health knowledge than the E-learning cluster (104 participants) at 6 weeks (P < 0.05) and at 12 weeks (P < 0.05) (Leaflet cluster: 100 participants, E-learning cluster: 100 participants). The mean knowledge gain compared to baseline was higher in the Leaflet cluster than in the E-learning cluster. A significant reduction in the PI means at 6 weeks and 12 weeks was observed in both clusters (P < 0.05) when compared to baseline. Children in the Leaflet cluster had significantly less plaque than those in the E-learning cluster at 6 weeks (P < 0.05) and at 12 weeks (P < 0.05). Similarly, a significant reduction in the GI means at 6 weeks and 12 weeks was observed in both clusters when compared to baseline (P < 0.05). Children in the Leaflet cluster had statistically significantly better gingival health than those in the E-learning cluster at 6 weeks (P < 0.05) and 12 weeks (P < 0.05). Traditional educational leaflets are an effective tool in the improvement of both oral health knowledge as well as clinical indices of oral hygiene and care among Syrian children. Leaflets can be used in school-based oral health education for a positive outcome. Australian New Zealand Clinical Trials Registry (ACTRN12618000395235), Date registered: 16/03/2018, retrospectively registered.

  13. Proof test methodology for composites

    NASA Technical Reports Server (NTRS)

    Wu, Edward M.; Bell, David K.

    1992-01-01

    The special requirements for proof test of composites are identified based on the underlying failure process of composites. Two proof test methods are developed to eliminate the inevitable weak fiber sites without also causing flaw clustering which weakens the post-proof-test composite. Significant reliability enhancement by these proof test methods has been experimentally demonstrated for composite strength and composite life in tension. This basic proof test methodology is relevant to the certification and acceptance of critical composite structures. It can also be applied to the manufacturing process development to achieve zero-reject for very large composite structures.

  14. A singular value decomposition approach for improved taxonomic classification of biological sequences

    PubMed Central

    2011-01-01

    Background Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. Results We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. Conclusions By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy. PMID:22369633

  15. Retrieval with Clustering in a Case-Based Reasoning System for Radiotherapy Treatment Planning

    NASA Astrophysics Data System (ADS)

    Khussainova, Gulmira; Petrovic, Sanja; Jagannathan, Rupa

    2015-05-01

    Radiotherapy treatment planning aims to deliver a sufficient radiation dose to cancerous tumour cells while sparing healthy organs in the tumour surrounding area. This is a trial and error process highly dependent on the medical staff's experience and knowledge. Case-Based Reasoning (CBR) is an artificial intelligence tool that uses past experiences to solve new problems. A CBR system has been developed to facilitate radiotherapy treatment planning for brain cancer. Given a new patient case the existing CBR system retrieves a similar case from an archive of successfully treated patient cases with the suggested treatment plan. The next step requires adaptation of the retrieved treatment plan to meet the specific demands of the new case. The CBR system was tested by medical physicists for the new patient cases. It was discovered that some of the retrieved cases were not suitable and could not be adapted for the new cases. This motivated us to revise the retrieval mechanism of the existing CBR system by adding a clustering stage that clusters cases based on their tumour positions. A number of well-known clustering methods were investigated and employed in the retrieval mechanism. Results using real world brain cancer patient cases have shown that the success rate of the new CBR retrieval is higher than that of the original system.

  16. Stellar clusters in the Gaia era

    NASA Astrophysics Data System (ADS)

    Bragaglia, Angela

    2018-04-01

    Stellar clusters are important for astrophysics in many ways, for instance as optimal tracers of the Galactic populations to which they belong or as one of the best test benches for stellar evolutionary models. Gaia DR1, with TGAS, is just skimming the wealth of exquisite information we are expecting from the more advanced catalogues, but already offers good opportunities and indicates the vast potential. Gaia results can be efficiently complemented by ground-based data, in particular by large spectroscopic and photometric surveys. Examples of some scientific results of the Gaia-ESO survey are presented, as a teaser for what will be possible once advanced Gaia releases and ground-based data are combined.

  17. A critical analysis of high-redshift, massive, galaxy clusters. Part I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoyle, Ben; Jimenez, Raul; Verde, Licia

    2012-02-01

    We critically investigate current statistical tests applied to high redshift clusters of galaxies in order to test the standard cosmological model and describe their range of validity. We carefully compare a sample of high-redshift, massive, galaxy clusters with realistic Poisson sample simulations of the theoretical mass function, which include the effect of Eddington bias. We compare the observations and simulations using the following statistical tests: the distributions of ensemble and individual existence probabilities (in the > M, > z sense), the redshift distributions, and the 2d Kolmogorov-Smirnov test. Using seemingly rare clusters from Hoyle et al. (2011), and Jee et al. (2011) and assuming the same survey geometry as in Jee et al. (2011, which is less conservative than Hoyle et al. 2011), we find that the ( > M, > z) existence probabilities of all clusters are fully consistent with ΛCDM. However assuming the same survey geometry, we use the 2d K-S test probability to show that the observed clusters are not consistent with being the least probable clusters from simulations at > 95% confidence, and are also not consistent with being a random selection of clusters, which may be caused by the non-trivial selection function and survey geometry. Tension can be removed if we examine only an X-ray selected subsample, with simulations performed assuming a modified survey geometry.

  18. Dielectric-spectroscopy approach to ferrofluid nanoparticle clustering induced by an external electric field.

    PubMed

    Rajnak, Michal; Kurimsky, Juraj; Dolnik, Bystrik; Kopcansky, Peter; Tomasovicova, Natalia; Taculescu-Moaca, Elena Alina; Timko, Milan

    2014-09-01

    An experimental study of magnetic colloidal particles cluster formation induced by an external electric field in a ferrofluid based on transformer oil is presented. Using frequency domain isothermal dielectric spectroscopy, we study the influence of a test cell electrode separation distance on a low-frequency relaxation process. We consider the relaxation process to be associated with an electric double layer polarization taking place on the particle surface. It has been found that the relaxation maximum considerably shifts towards lower frequencies when conducting the measurements in the test cells with greater electrode separation distances. As the electric field intensity was always kept at a constant value, we propose that the particle cluster formation induced by the external ac electric field accounts for that phenomenon. The increase in the relaxation time is in accordance with the Schwarz theory of electric double layer polarization. In addition, we analyze the influence of a static electric field generated by dc bias voltage on a similar shift in the relaxation maximum position. The variation of the dc electric field for the hysteresis measurements purpose provides understanding of the development of the particle clusters and their decay. Following our results, we emphasize the utility of dielectric spectroscopy as a simple, complementary method for detection and study of clusters of colloidal particles induced by external electric field.

  19. Relatedness and nesting dispersion within breeding populations of greater white-fronted geese

    USGS Publications Warehouse

    Fowler, A.C.; Eadie, J.M.; Ely, Craig R.

    2004-01-01

    We studied patterns of relatedness and nesting dispersion in female Pacific Greater White-fronted Geese (Anser albifrons frontalis) in Alaska. Female Greater White-fronted Geese are thought to be strongly philopatric and are often observed nesting in close association with other females. Analysis of the distribution of nests on the Yukon-Kuskokwim Delta in 1998 indicated that nests were significantly clumped. We tested the hypothesis that females in the same nest cluster would be closely related using estimates of genetic relatedness based on six microsatellite DNA loci. There was no difference in the mean relatedness of females in the same cluster compared to females found in different clusters. However, relatedness among females was negatively correlated with distance between their nests, and geese nesting within 50 m of one another tended to be more closely related than those nesting farther apart. Randomization tests revealed that pairs of related individuals (R > 0.45) were more likely to occur in the same cluster when analyzed at the scale of the entire study site. However, the pattern did not hold when restricted to pairs found within 500 m of each other. Our results indicate that nest clusters are not composed primarily of closely related females, but Greater White-fronted Geese appear to be sufficiently philopatric to promote nonrandom patterns of relatedness at a local scale.
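    The randomization test mentioned in this abstract can be illustrated with a minimal sketch. It assumes the test statistic is the number of related pairs that share a nest cluster and that cluster labels are shuffled among individuals; the abstract does not give these details, so the function names, the toy labels and the pair list below are all hypothetical.

```python
import random


def same_cluster_related_pairs(labels, related_pairs):
    """Count related pairs whose two members carry the same cluster label."""
    return sum(1 for a, b in related_pairs if labels[a] == labels[b])


def randomization_test(labels, related_pairs, n_perm=2000, seed=0):
    """One-sided permutation p-value: are related pairs co-clustered more
    often than expected when cluster labels are shuffled among individuals?"""
    rng = random.Random(seed)
    observed = same_cluster_related_pairs(labels, related_pairs)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if same_cluster_related_pairs(shuffled, related_pairs) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 12 females in 3 nest clusters of 4 (not from the study).
labels = [0] * 4 + [1] * 4 + [2] * 4
# Hypothetical pairs with relatedness above the cutoff (indices into labels).
related_pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9), (10, 11)]

observed, p_value = randomization_test(labels, related_pairs)
print(observed, p_value)
```

    Restricting `related_pairs` to pairs nesting within a given distance of each other would give the analogue of the 500 m analysis, under the same assumptions.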

  20. A taxonomy of epithelial human cancer and their metastases

    PubMed Central

    2009-01-01

    Background Microarray technology has allowed to molecularly characterize many different cancer sites. This technology has the potential to individualize therapy and to discover new drug targets. However, due to technological differences and issues in standardized sample collection no study has evaluated the molecular profile of epithelial human cancer in a large number of samples and tissues. Additionally, it has not yet been extensively investigated whether metastases resemble their tissue of origin or tissue of destination. Methods We studied the expression profiles of a series of 1566 primary and 178 metastases by unsupervised hierarchical clustering. The clustering profile was subsequently investigated and correlated with clinico-pathological data. Statistical enrichment of clinico-pathological annotations of groups of samples was investigated using Fisher exact test. Gene set enrichment analysis (GSEA) and DAVID functional enrichment analysis were used to investigate the molecular pathways. Kaplan-Meier survival analysis and log-rank tests were used to investigate prognostic significance of gene signatures. Results Large clusters corresponding to breast, gastrointestinal, ovarian and kidney primary tissues emerged from the data. Chromophobe renal cell carcinoma clustered together with follicular differentiated thyroid carcinoma, which supports recent morphological descriptions of thyroid follicular carcinoma-like tumors in the kidney and suggests that they represent a subtype of chromophobe carcinoma. We also found an expression signature identifying primary tumors of squamous cell histology in multiple tissues. Next, a subset of ovarian tumors enriched with endometrioid histology clustered together with endometrium tumors, confirming that they share their etiopathogenesis, which strongly differs from serous ovarian tumors. In addition, the clustering of colon and breast tumors correlated with clinico-pathological characteristics. 
Moreover, a signature was developed based on our unsupervised clustering of breast tumors and this was predictive for disease-specific survival in three independent studies. Next, the metastases from ovarian, breast, lung and vulva cluster with their tissue of origin while metastases from colon showed a bimodal distribution. A significant part clusters with tissue of origin while the remaining tumors cluster with the tissue of destination. Conclusion Our molecular taxonomy of epithelial human cancer indicates surprising correlations over tissues. This may have a significant impact on the classification of many cancer sites and may guide pathologists, both in research and daily practice. Moreover, these results based on unsupervised analysis yielded a signature predictive of clinical outcome in breast cancer. Additionally, we hypothesize that metastases from gastrointestinal origin either remember their tissue of origin or adapt to the tissue of destination. More specifically, colon metastases in the liver show strong evidence for such a bimodal tissue specific profile. PMID:20017941
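    As an illustration of the enrichment step described in this abstract, the following is a minimal sketch of a two-sided Fisher exact test on a 2x2 table (samples inside or outside a cluster, with or without a given annotation). The function name and the counts are hypothetical and are not taken from the study; the two-sided rule used here sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# Hypothetical counts: 12 of 15 samples in a cluster carry the annotation,
# against 5 of 25 samples outside it.
p_value = fisher_exact_two_sided(12, 3, 5, 20)
print(p_value)
```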

  1. Optimized scheme in coal-fired boiler combustion based on information entropy and modified K-prototypes algorithm

    NASA Astrophysics Data System (ADS)

    Gu, Hui; Zhu, Hongxia; Cui, Yanfeng; Si, Fengqi; Xue, Rui; Xi, Han; Zhang, Jiayu

    2018-06-01

    An integrated combustion optimization scheme is proposed that jointly considers the constraints on coal-fired boiler combustion efficiency and outlet NOx emissions. Continuous attribute discretization and reduction are performed as preparation for optimization using the E-Cluster and C_RED methods, in which the number of segments need not be provided in advance and adapts continuously to the characteristics of the data. To obtain multi-objective results with a clustering method for mixed data, a modified K-prototypes algorithm is then proposed. The algorithm has two stages: a K-prototypes stage that self-adapts the number of clusters, and a clustering stage for multi-objective optimization. Field tests were carried out at a 660 MW coal-fired boiler to provide real data as a case study for controllable attribute discretization and reduction in the boiler system and for obtaining optimization parameters under the [ maxηb, minyNOx ] multi-objective rule.

  2. The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments

    NASA Astrophysics Data System (ADS)

    Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan

    2018-04-01

    Due to the limited historical precipitation records, agglomerative hierarchical clustering algorithms are widely used to extrapolate information from gauged to ungauged precipitation catchments, yielding a more reliable projection of extreme hydro-meteorological events such as extreme precipitation events. However, accurately identifying the optimum number of homogeneous precipitation catchments from the dendrogram produced by agglomerative hierarchical algorithms is very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify the homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using an average linkage hierarchical clustering algorithm associated with multi-scale bootstrap resampling, with the uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation is consolidated using the K-sample Anderson-Darling non-parametric test. The analysis shows that the proposed regionalized algorithm performed better than the agglomerative hierarchical clustering algorithms proposed in previous studies.
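    The clustering step in this abstract can be sketched as follows, assuming the distance between two series is one minus their uncentered correlation and that clusters are merged by average linkage. The multi-scale bootstrap resampling and the Anderson-Darling test are not implemented here, and the gauge names and values are hypothetical.

```python
import math


def uncentered_correlation(x, y):
    """Like Pearson's r, but without subtracting the means (a cosine similarity)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


def average_linkage(series, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    distance (1 - uncentered correlation) until n_clusters remain."""
    names = list(series)
    dist = {
        (a, b): 1.0 - uncentered_correlation(series[a], series[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical precipitation profiles (mm) for four gauges, not real data.
series = {
    "A": [10, 20, 80, 90],
    "B": [12, 18, 85, 95],
    "C": [90, 80, 15, 10],
    "D": [95, 85, 10, 12],
}
groups = average_linkage(series, n_clusters=2)
print(groups)
```

    Fixing `n_clusters` by hand is exactly the subjective choice the abstract criticizes; in the study that choice is supported by bootstrap resampling, which this sketch leaves out.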

  3. Establishment of a Hall Thruster Cluster

    DTIC Science & Technology

    2004-02-01

    DURIP funds were used to develop a Hall thruster cluster test facility centered around the University of Michigan Large Vacuum Test Facility and a 2x2 cluster of BUSEK 600 W BHT-600 Hall thrusters. This capability will facilitate our three-year program to address the issue of high-power CDT operation and to provide insight on how chamber effects influence CDT engine/cluster characteristics.

  4. Planck 2015 results. XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Battye, R.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dolag, K.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. 
R.; Melchiorri, A.; Melin, J.-B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Roman, M.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Weller, J.; White, S. D. M.; Yvon, D.; Zacchei, A.; Zonca, A.

    2016-09-01

    We present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. Improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  5. Degree-based statistic and center persistency for brain connectivity analysis.

    PubMed

    Yoo, Kwangsun; Lee, Peter; Chung, Moo K; Sohn, William S; Chung, Sun Ju; Na, Duk L; Ju, Daheen; Jeong, Yong

    2017-01-01

    Brain connectivity analyses have been widely performed to investigate the organization and functioning of the brain, or to observe changes in neurological or psychiatric conditions. However, connectivity analysis inevitably introduces the problem of mass-univariate hypothesis testing. Although several cluster-wise correction methods have been suggested to address this problem and shown to provide high sensitivity, these approaches fundamentally have two drawbacks: the lack of spatial specificity (localization power) and the arbitrariness of an initial cluster-forming threshold. In this study, we propose a novel method, degree-based statistic (DBS), performing cluster-wise inference. DBS is designed to overcome the above-mentioned two shortcomings. From a network perspective, a few brain regions are of critical importance and considered to play pivotal roles in network integration. Regarding this notion, DBS defines a cluster as a set of edges of which one ending node is shared. This definition enables the efficient detection of clusters and their center nodes. Furthermore, a new measure of a cluster, center persistency (CP), was introduced. The efficiency of DBS was demonstrated with a known "ground truth" simulation. DBS was then applied to two experimental datasets and was shown to successfully detect the persistent clusters. In conclusion, by adopting a graph theoretical concept of degrees and borrowing the concept of persistence from algebraic topology, DBS could sensitively identify clusters with centric nodes that would play pivotal roles in an effect of interest. DBS is potentially widely applicable to variable cognitive or clinical situations and allows us to obtain statistically reliable and easily interpretable results. Hum Brain Mapp 38:165-181, 2017. © 2016 Wiley Periodicals, Inc.
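    The cluster definition in this abstract (a set of edges sharing one end node) can be sketched as below. This is only an illustration under stated assumptions: edges are kept when a hypothetical edge-wise statistic exceeds a threshold, and the sketch reports each node's degree across several thresholds. The abstract does not define center persistency precisely, so no CP value is computed, and the permutation-based inference of the actual method is omitted.

```python
from collections import defaultdict


def star_clusters(edge_stats, threshold):
    """Group suprathreshold edges by the node they share.
    Returns {center_node: [edges incident to that node]}."""
    clusters = defaultdict(list)
    for (u, v), stat in edge_stats.items():
        if stat > threshold:
            clusters[u].append((u, v))
            clusters[v].append((u, v))
    return dict(clusters)


def degree_profile(edge_stats, thresholds):
    """Degree of each node (size of its star cluster) at every threshold."""
    nodes = sorted({n for edge in edge_stats for n in edge})
    profile = {n: [] for n in nodes}
    for t in thresholds:
        clusters = star_clusters(edge_stats, t)
        for n in nodes:
            profile[n].append(len(clusters.get(n, [])))
    return profile


# Hypothetical edge-wise test statistics for a 5-node network.
edge_stats = {
    ("a", "b"): 3.1, ("a", "c"): 2.8, ("a", "d"): 2.5,
    ("b", "c"): 0.4, ("d", "e"): 2.2, ("c", "e"): 0.9,
}
profile = degree_profile(edge_stats, thresholds=[2.0, 2.6, 3.0])
print(profile["a"], profile["e"])
```

    A node such as "a" that keeps a non-zero degree as the threshold rises is the kind of center the abstract describes as persistent, which is why a single arbitrary cluster-forming threshold is not needed.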

  6. Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...

    2016-09-20

    In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  7. Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.

    In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  8. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  9. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    NASA Astrophysics Data System (ADS)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  10. Using concept mapping in the knowledge-to-action process to compare stakeholder opinions on barriers to use of cancer screening among South Asians.

    PubMed

    Lobb, Rebecca; Pinto, Andrew D; Lofters, Aisha

    2013-03-23

    Using the knowledge-to-action (KTA) process, this study examined barriers to use of evidence-based interventions to improve early detection of cancer among South Asians from the perspective of multiple stakeholders. In 2011, we used concept mapping with South Asian residents, and representatives from health service and community service organizations in the region of Peel Ontario. As part of concept mapping procedures, brainstorming sessions were conducted with stakeholders (n = 53) to identify barriers to cancer screening among South Asians. Participants (n = 46) sorted barriers into groups, and rated barriers from lowest (1) to highest (6) in terms of importance for use of mammograms, Pap tests and fecal occult blood tests, and how feasible it would be to address them. Multi-dimensional scaling, cluster analysis, and descriptive statistics were used to analyze the data. A total of 45 unique barriers to use of mammograms, Pap tests, and fecal occult blood tests among South Asians were classified into seven clusters using concept mapping procedures: patient's beliefs, fears, lack of social support; health system; limited knowledge among residents; limited knowledge among physicians; health education programs; ethno-cultural discordance with the health system; and cost. Overall, the top three ranked clusters of barriers were 'limited knowledge among residents,' 'ethno-cultural discordance,' and 'health education programs' across surveys. Only residents ranked 'cost' second in importance for fecal occult blood testing, and stakeholders from health service organizations ranked 'limited knowledge among physicians' third for the feasibility survey. Stakeholders from health services organizations ranked 'limited knowledge among physicians' fourth for all other surveys, but this cluster consistently ranked lowest among residents. The limited reach of cancer control programs to racial and ethnic minority groups is a critical implementation issue that requires attention. Opinions of community service and health service organizations on why this deficit in implementation occurs are fundamental to understanding the solutions because these are the settings in which evidence-based interventions are implemented. Using concept mapping within a KTA process can facilitate the engagement of multiple stakeholders in the utilization of study results and in identifying next steps for action.

  11. Comparison of Salmonella enteritidis phage types isolated from layers and humans in Belgium in 2005.

    PubMed

    Welby, Sarah; Imberechts, Hein; Riocreux, Flavien; Bertrand, Sophie; Dierick, Katelijne; Wildemauwe, Christa; Hooyberghs, Jozef; Van der Stede, Yves

    2011-08-01

    The aim of this study was to investigate the available results for Belgium of the European Union coordinated monitoring program (2004/665 EC) on Salmonella in layers in 2005, as well as the results of the monthly outbreak reports of Salmonella Enteritidis in humans in 2005 to identify a possible statistically significant trend in both populations. Separate descriptive statistics and univariate analysis were carried out and the parametric and/or non-parametric hypothesis tests were conducted. A time cluster analysis was performed for all Salmonella Enteritidis phage types (PTs) isolated. The proportions of each Salmonella Enteritidis PT in layers and in humans were compared and the monthly distribution of the most common PT, isolated in both populations, was evaluated. The time cluster analysis revealed significant clusters during the months May and June for layers and May, July, August, and September for humans. PT21, the most frequently isolated PT in both populations in 2005, seemed to be responsible for these significant clusters. PT4 was the second most frequently isolated PT. No significant difference was found for the monthly trend evolution of both PTs in both populations based on parametric and non-parametric methods. A similar monthly trend of PT distribution in humans and layers during the year 2005 was observed. The time cluster analysis and the statistical significance testing confirmed these results. Moreover, the time cluster analysis showed significant clusters during the summer time and slightly delayed in time (humans after layers). These results suggest a common link between the prevalence of Salmonella Enteritidis in layers and the occurrence of the pathogen in humans. Phage typing was confirmed to be a useful tool for identifying temporal trends.

  12. On the statistics of proto-cluster candidates detected in the Planck all-sky survey

    NASA Astrophysics Data System (ADS)

    Negrello, M.; Gonzalez-Nuevo, J.; De Zotti, G.; Bonato, M.; Cai, Z.-Y.; Clements, D.; Danese, L.; Dole, H.; Greenslade, J.; Lapi, A.; Montier, L.

    2017-09-01

    Observational investigations of the abundance of massive precursors of local galaxy clusters ('proto-clusters') allow us to test the growth of density perturbations, to constrain cosmological parameters that control it, to test the theory of non-linear collapse and how galaxy formation takes place in dense environments. The Planck collaboration has recently published a catalogue of ≳2000 cold extragalactic sub-millimeter sources, i.e. with colours indicative of z ≳ 2, almost all of which appear to be overdensities of star-forming galaxies. They are thus considered as proto-cluster candidates. Their number densities (or their flux densities) are far in excess of expectations from the standard scenario for the evolution of large-scale structure. Simulations based on a physically motivated galaxy evolution model show that essentially all cold peaks brighter than S545GHz = 500 mJy found in Planck maps after having removed the Galactic dust emission can be interpreted as positive Poisson fluctuations of the number of high-z dusty proto-clusters within the same Planck beam, rather than being individual clumps of physically bound galaxies. This conclusion does not change if an empirical fit to the luminosity function of dusty galaxies is used instead of the physical model. The simulations accurately reproduce the statistics of the Planck detections and yield distributions of sizes and ellipticities in qualitative agreement with observations. The redshift distribution of the brightest proto-clusters contributing to the cold peaks has a broad maximum at 1.5 ≤ z ≤ 3. Therefore follow-up of Planck proto-cluster candidates will provide key information on the high-z evolution of large scale structure.

  13. The properties of small Ag clusters bound to DNA bases.

    PubMed

    Soto-Verdugo, Víctor; Metiu, Horia; Gwinn, Elisabeth

    2010-05-21

    We study the binding of neutral silver clusters, Ag(n) (n=1-6), to the DNA bases adenine (A), cytosine (C), guanine (G), and thymine (T) and the absorption spectra of the silver cluster-base complexes. Using density functional theory (DFT), we find that the clusters prefer to bind to the doubly bonded ring nitrogens and that binding to T is generally much weaker than to C, G, and A. Ag(3) and Ag(4) make the stronger bonds. Bader charge analysis indicates a mild electron transfer from the base to the clusters for all bases, except T. The donor bases (C, G, and A) bind to the sites on the cluster where the lowest unoccupied molecular orbital has a pronounced protrusion. The site where cluster binds to the base is controlled by the shape of the higher occupied states of the base. Time-dependent DFT calculations show that different base-cluster isomers may have very different absorption spectra. In particular, we find new excitations in base-cluster molecules, at energies well below those of the isolated components, and with strengths that depend strongly on the orientations of planar clusters with respect to the base planes. Our results suggest that geometric constraints on binding, imposed by designed DNA structures, may be a feasible route to engineering the selection of specific cluster-base assemblies.

  14. On the applicability of one- and many-electron quantum chemistry models for hydrated electron clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turi, László, E-mail: turi@chem.elte.hu

    2016-04-21

    We evaluate the applicability of a hierarchy of quantum models in characterizing the binding energy of excess electrons to water clusters. In particular, we calculate the vertical detachment energy of an excess electron from water cluster anions with methods that include one-electron pseudopotential calculations, density functional theory (DFT) based calculations, and ab initio quantum chemistry using MP2 and eom-EA-CCSD levels of theory. The examined clusters range from the smallest cluster size (n = 2) up to nearly nanosize clusters with n = 1000 molecules. The examined cluster configurations are extracted from mixed quantum-classical molecular dynamics trajectories of cluster anions with n = 1000 water molecules using two different one-electron pseudopotential models. We find that while MP2 calculations with large diffuse basis set provide a reasonable description for the hydrated electron system, DFT methods should be used with precaution and only after careful benchmarking. Strictly tested one-electron pseudopotentials can still be considered as reasonable alternatives to DFT methods, especially in large systems. The results of quantum chemistry calculations performed on configurations, that represent possible excess electron binding motifs in the clusters, appear to be consistent with the results using a cavity structure preferring one-electron pseudopotential for the hydrated electron, while they are in sharp disagreement with the structural predictions of a non-cavity model.

  15. Spatial cluster for clustering the influence factor of birth and death child in Bogor Regency, West Java

    NASA Astrophysics Data System (ADS)

    Bekti, Rokhana Dwi; Rachmawati, Ro'fah

    2014-03-01

    The numbers of child births and deaths are benchmarks for determining and monitoring health and welfare in Indonesia. They can be used to identify groups of people who have a high mortality risk. Identifying groups is important for comparing the characteristics of people at high and low risk. These characteristics can be seen from the factors that influence them. There are several factors that influence child births and deaths, such as economic conditions, health facilities, education, and others. The influencing factors differ for every individual, but individuals who live close together or in nearby locations show similarities, which indicates a spatial effect. To identify groups in this research, clustering is done with a spatial cluster method, which takes into account the influence of location or the relationship between locations. One spatial cluster method is Spatial 'K'luster Analysis by Tree Edge Removal (SKATER). The research was conducted in Bogor Regency, West Java. The goal was to obtain clusters of districts based on the factors that influence child births and deaths. SKATER built four clusters consisting of 26, 7, 2, and 5 districts, respectively. SKATER performs well for clustering that includes a spatial effect. Compared with other cluster methods, K-means performs well according to the MANOVA test.

  16. On the applicability of one- and many-electron quantum chemistry models for hydrated electron clusters

    NASA Astrophysics Data System (ADS)

    Turi, László

    2016-04-01

    We evaluate the applicability of a hierarchy of quantum models in characterizing the binding energy of excess electrons to water clusters. In particular, we calculate the vertical detachment energy of an excess electron from water cluster anions with methods that include one-electron pseudopotential calculations, density functional theory (DFT) based calculations, and ab initio quantum chemistry using MP2 and eom-EA-CCSD levels of theory. The examined clusters range from the smallest cluster size (n = 2) up to nearly nanosize clusters with n = 1000 molecules. The examined cluster configurations are extracted from mixed quantum-classical molecular dynamics trajectories of cluster anions with n = 1000 water molecules using two different one-electron pseudopotential models. We find that while MP2 calculations with large diffuse basis set provide a reasonable description for the hydrated electron system, DFT methods should be used with precaution and only after careful benchmarking. Strictly tested one-electron pseudopotentials can still be considered as reasonable alternatives to DFT methods, especially in large systems. The results of quantum chemistry calculations performed on configurations, that represent possible excess electron binding motifs in the clusters, appear to be consistent with the results using a cavity structure preferring one-electron pseudopotential for the hydrated electron, while they are in sharp disagreement with the structural predictions of a non-cavity model.

  17. Using Grey Wolf Algorithm to Solve the Capacitated Vehicle Routing Problem

    NASA Astrophysics Data System (ADS)

    Korayem, L.; Khorsid, M.; Kassem, S. S.

    2015-05-01

    The capacitated vehicle routing problem (CVRP) is a class of vehicle routing problems (VRPs). In CVRP a set of identical vehicles having fixed capacities are required to fulfill customers' demands for a single commodity. The main objective is to minimize the total cost or distance traveled by the vehicles while satisfying a number of constraints, such as the capacity constraint of each vehicle, logical flow constraints, etc. One of the methods employed in solving the CVRP is the cluster-first route-second method. It is a technique based on grouping of customers into a number of clusters, where each cluster is served by one vehicle. Once clusters are formed, a route determining the best sequence to visit customers is established within each cluster. The recent bio-inspired grey wolf optimizer (GWO), introduced in 2014, has proven to be efficient in solving unconstrained as well as constrained optimization problems. In the current research, our main contributions are: combining GWO with the traditional K-means clustering algorithm to generate the ‘K-GWO’ algorithm, deriving a capacitated version of the K-GWO algorithm by incorporating a capacity constraint into the aforementioned algorithm, and finally, developing two new clustering heuristics. The resulting algorithm is used in the clustering phase of the cluster-first route-second method to solve the CVRP. The algorithm is tested on a number of benchmark problems with encouraging results.
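
    The cluster-first route-second idea described in this abstract can be sketched roughly as follows. This is not the K-GWO algorithm: the clustering step here is a plain greedy nearest-seed assignment under a capacity limit, and the routing step is a nearest-neighbour heuristic. The function names, seed points, depot, and sample customers are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance using the first two coordinates of each tuple."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def capacitated_clusters(customers, seeds, capacity):
    """Cluster phase: assign each (x, y, demand) customer to the nearest
    seed whose vehicle still has spare capacity (greedy assumption)."""
    clusters = [[] for _ in seeds]
    load = [0] * len(seeds)
    # Place large demands first so they are less likely to run out of room.
    for c in sorted(customers, key=lambda c: -c[2]):
        by_distance = sorted(range(len(seeds)), key=lambda k: dist(c, seeds[k]))
        for k in by_distance:
            if load[k] + c[2] <= capacity:
                clusters[k].append(c)
                load[k] += c[2]
                break
        else:
            raise ValueError("no vehicle has room for customer %r" % (c,))
    return clusters

def nearest_neighbour_route(depot, cluster):
    """Route phase: repeatedly visit the closest unvisited customer."""
    route, here, todo = [], depot, list(cluster)
    while todo:
        nxt = min(todo, key=lambda c: dist(here, c))
        todo.remove(nxt)
        route.append(nxt)
        here = nxt
    return route

# Hypothetical toy instance: two vehicles of capacity 10, five customers.
customers = [(0, 1, 4), (0, 2, 3), (10, 1, 4), (10, 2, 3), (4, 5, 2)]
seeds = [(0, 0), (10, 0)]
depot = (5, 0)
clusters = capacitated_clusters(customers, seeds, capacity=10)
routes = [nearest_neighbour_route(depot, c) for c in clusters]
```

    In a metaheuristic such as the one the abstract describes, the seed positions would be what the optimizer searches over; here they are fixed by hand to keep the sketch short.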

  18. Comulang: towards a collaborative e-learning system that supports student group modeling.

    PubMed

    Troussas, Christos; Virvou, Maria; Alepis, Efthimios

    2013-01-01

    This paper describes an e-learning system that is expected to further enhance the educational process in computer-based tutoring systems by incorporating collaboration between students and work in groups. The resulting system is called "Comulang" while as a test bed for its effectiveness a multiple language learning system is used. Collaboration is supported by a user modeling module that is responsible for the initial creation of student clusters, where, as a next step, working groups of students are created. A machine learning clustering algorithm works towards group formatting, so that co-operations between students from different clusters are attained. One of the resulting system's basic aims is to provide efficient student groups whose limitations and capabilities are well balanced.

  19. Comparative analysis on the selection of number of clusters in community detection

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
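
    Modularity is one of the assessment criteria named in this abstract. As a rough illustration of how such a criterion can be used to compare candidate numbers of clusters, the sketch below computes the standard Newman modularity of an undirected, unweighted graph for a given partition. It is not the authors' code; the toy graph and the candidate partitions are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ],
    where m is the number of edges, L_c the number of edges inside
    community c, and d_c the total degree of the nodes in c."""
    m = len(edges)
    internal = {}
    degree = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
               for c in degree)

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

one_cluster = {n: 0 for n in range(6)}
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
three_clusters = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

q1 = modularity(edges, one_cluster)
q2 = modularity(edges, two_clusters)
q3 = modularity(edges, three_clusters)
```

    On this toy graph the two-cluster partition scores highest, which matches the visible structure; the abstract's point is that on realistic networks different criteria can overfit or underfit in this choice.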

  20. Chemical characteristics for different parts of Panax notoginseng using pressurized liquid extraction and HPLC-ELSD.

    PubMed

    Wan, J B; Yang, F Q; Li, S P; Wang, Y T; Cui, X M

    2006-08-28

    The chemical characteristics for different parts of Panax notoginseng, including root, fibre root, rhizome, stem, leaf, flower and seed, were determined using high performance liquid chromatography-evaporative light scattering detection (HPLC-ELSD) and pressurized liquid extraction (PLE). Eight major saponins, namely notoginsenoside R1, ginsenosides Rg1, Re, Rb1, Rc, Rb2, Rb3 and Rd were also quantitatively compared among the different parts of P. notoginseng. The chromatograms showed that there was significant difference between underground (root, fibre root, rhizome) and aerial (leaf and flower) parts from P. notoginseng, though the similarities of entire chromatographic patterns among tested samples from underground (0.965+/-0.029, n=12) and aerial parts (0.987+/-0.014, n=5) were similar, respectively. Especially, no saponin was detected in the seed of P. notoginseng. Hierarchical clustering analysis based on eight investigated saponins or the ratios of contents for ginsenoside Rg1/Rb1 and ginsenoside Rb3/Rb1 showed that the samples from different parts of P. notoginseng were divided into three main clusters. One cluster was underground parts, which contained rich protopanaxatriol and protopanaxadiol types saponins. The leaf and flower were in the same cluster, which contained protopanaxadiol type saponins only. Especially, ginsenoside Rc, Rb2 and Rb3, rare in the underground parts, were rich in aerial parts of P. notoginseng. The stem of P. notoginseng was another cluster. Based on the cluster analysis, the chemical characteristics for different parts of P. notoginseng were revealed. They are composite cluster (underground parts), protopanaxadiol cluster (aerial parts) and interim (stem) cluster, which was the one between the two typical clusters, respectively. The result shows that chemical characteristics of underground parts and aerial parts from P. notoginseng are obviously different, which is helpful for pharmacological evaluation and quality control of P. notoginseng.

  1. Spatial clusters of daytime sleepiness and association with nighttime noise levels in a Swiss general population (GeoHypnoLaus).

    PubMed

    Joost, Stéphane; Haba-Rubio, José; Himsl, Rebecca; Vollenweider, Peter; Preisig, Martin; Waeber, Gérard; Marques-Vidal, Pedro; Heinzer, Raphaël; Guessous, Idris

    2018-05-31

    Daytime sleepiness is highly prevalent in the general adult population and has been linked to an increased risk of workplace and vehicle accidents, lower professional performance and poorer health. Despite the established relationship between noise and daytime sleepiness, little research has explored the individual-level spatial distribution of noise-related sleep disturbances. We assessed the spatial dependence of daytime sleepiness and tested whether clusters of individuals exhibiting higher daytime sleepiness were characterized by higher nocturnal noise levels than other clusters. Population-based cross-sectional study, in the city of Lausanne, Switzerland. Sleepiness was measured using the Epworth Sleepiness Scale (ESS) for 3697 georeferenced individuals from the CoLaus|PsyCoLaus cohort (period = 2009-2012). We used the sonBASE georeferenced database produced by the Swiss Federal Office for the Environment to characterize nighttime road traffic noise exposure throughout the city. We used the GeoDa software program to calculate the Getis-Ord Gi* statistics for unadjusted and adjusted ESS in order to detect spatial clusters of high and low ESS values. Modeled nighttime noise exposure from road and rail traffic was compared across ESS clusters. Daytime sleepiness was not randomly distributed and showed a significant spatial dependence. The median nighttime traffic noise exposure was significantly different across the three ESS Getis cluster classes (p < 0.001). The mean nighttime noise exposure in the high ESS cluster class was 47.6 dB(A), 5.2 dB(A) higher than in low clusters (p < 0.001) and 2.1 dB(A) higher than in the neutral class (p < 0.001). These associations were independent of major potential confounders including body mass index and neighborhood income level. Clusters of higher daytime sleepiness in adults are associated with higher median nighttime noise levels. The identification of these clusters can guide tailored public health interventions. Copyright © 2018 The Authors. Published by Elsevier GmbH. All rights reserved.
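
    The abstract above reports Getis-Ord Gi* statistics computed with GeoDa. For readers unfamiliar with the statistic, the sketch below evaluates the usual Gi* z-score formula directly in plain Python. It does not use or reproduce GeoDa; the function name, the binary neighbour weights, and the six sample scores are hypothetical.

```python
import math

def getis_ord_gi_star(values, weights):
    """Gi* z-score for every location i:
        (sum_j w_ij x_j - mean * W_i) /
        (S * sqrt((n * sum_j w_ij^2 - W_i^2) / (n - 1)))
    where W_i = sum_j w_ij, S is the population standard deviation of x,
    and each row of `weights` includes the location itself (the "star")."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        w_sum = sum(w)
        w_sq_sum = sum(wij * wij for wij in w)
        numerator = sum(wij * x for wij, x in zip(w, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical data: six locations along a street, high scores at one end.
ess = [12, 11, 13, 4, 5, 3]
n = len(ess)
# Binary weights: a location neighbours itself and the adjacent locations.
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]
scores = getis_ord_gi_star(ess, weights)
```

    Large positive scores mark candidate high-value clusters and large negative scores mark low-value clusters; a real analysis would use distance-based weights between georeferenced individuals and a significance threshold.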

  2. What do You Need to Get Male Partners of Pregnant Women Tested for HIV in Resource Limited Settings? The Baby Shower Cluster Randomized Trial.

    PubMed

    Ezeanolue, Echezona E; Obiefune, Michael C; Yang, Wei; Ezeanolue, Chinenye O; Pharr, Jennifer; Osuji, Alice; Ogidi, Amaka G; Hunt, Aaron T; Patel, Dina; Ogedegbe, Gbenga; Ehiri, John E

    2017-02-01

    Male partner involvement has the potential to increase uptake of interventions to prevent mother-to-child transmission of HIV (PMTCT). Finding culturally appropriate strategies to promote male partner involvement in PMTCT programs remains an abiding public health challenge. We assessed whether a congregation-based intervention, the Healthy Beginning Initiative (HBI), would lead to increased uptake of HIV testing among male partners of pregnant women during pregnancy. A cluster-randomized controlled trial of forty churches in Southeastern Nigeria randomly assigned to either the HBI (intervention group; IG) or standard of care referral to a health facility (control group; CG) was conducted. Participants in the IG received education and were offered onsite HIV testing. Overall, 2498 male partners enrolled and participated, a participation rate of 88.9%. Results showed that male partners in the IG were 12 times more likely to have had an HIV test compared to male partners of pregnant women in the CG (CG = 37.71% vs. IG = 84.00%; adjusted odds ratio = 11.9; p < .01). Culturally appropriate and community-based interventions can be effective in increasing HIV testing and counseling among male partners of pregnant women.

  3. Simulations of Fractal Star Cluster Formation. I. New Insights for Measuring Mass Segregation of Star Clusters with Substructure

    NASA Astrophysics Data System (ADS)

    Yu, Jincheng; Puzia, Thomas H.; Lin, Congping; Zhang, Yiwei

    2017-05-01

    We compare the existent methods, including the minimum spanning tree based method and the local stellar density based method, in measuring mass segregation of star clusters. We find that the minimum spanning tree method reflects more the compactness, which represents the global spatial distribution of massive stars, while the local stellar density method reflects more the crowdedness, which provides the local gravitational potential information. It is suggested to measure the local and the global mass segregation simultaneously. We also develop a hybrid method that takes both aspects into account. This hybrid method balances the local and the global mass segregation in the sense that the predominant one is either caused by dynamical evolution or purely accidental, especially when such information is unknown a priori. In addition, we test our prescriptions with numerical models and show the impact of binaries in estimating the mass segregation value. As an application, we use these methods on the Orion Nebula Cluster (ONC) observations and the Taurus cluster. We find that the ONC is significantly mass segregated down to the 20th most massive stars. In contrast, the massive stars of the Taurus cluster are sparsely distributed in many different subclusters, showing a low degree of compactness. The massive stars of Taurus are also found to be distributed in the high-density region of the subclusters, showing significant mass segregation at subcluster scales. Meanwhile, we also apply these methods to discuss the possible mechanisms of the dynamical evolution of the simulated substructured star clusters.
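
    The minimum spanning tree (MST) approach mentioned in this abstract can be illustrated with one common formulation: compare the MST length of the N most massive stars with the average MST length of random N-star subsets. The sketch below assumes that formulation, which may differ in detail from the authors' prescriptions; the function names and the synthetic cluster are hypothetical.

```python
import math
import random

def mst_length(points):
    """Total edge length of the Euclidean MST (Prim's algorithm, O(n^2))."""
    if len(points) < 2:
        return 0.0
    best = [math.dist(points[0], p) for p in points]  # distance to the tree
    in_tree = [False] * len(points)
    in_tree[0] = True
    total = 0.0
    for _ in range(len(points) - 1):
        j = min((k for k in range(len(points)) if not in_tree[k]),
                key=lambda k: best[k])
        total += best[j]
        in_tree[j] = True
        for k in range(len(points)):
            if not in_tree[k]:
                best[k] = min(best[k], math.dist(points[j], points[k]))
    return total

def segregation_ratio(stars, n_massive, n_trials=200, rng=None):
    """stars: list of (mass, (x, y)); n_massive must be at least 2.
    Returns <MST length of random subsets> / <MST length of massive stars>.
    A ratio above 1 suggests the massive stars are more compact."""
    rng = rng or random.Random(0)
    positions = [pos for _, pos in stars]
    by_mass = sorted(stars, key=lambda s: -s[0])
    massive = [pos for _, pos in by_mass[:n_massive]]
    l_massive = mst_length(massive)
    l_random = sum(mst_length(rng.sample(positions, n_massive))
                   for _ in range(n_trials)) / n_trials
    return l_random / l_massive

# Hypothetical cluster: five massive stars packed near the centre,
# 48 low-mass stars spread over a surrounding grid.
heavy = [(10.0, p) for p in [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                             (0.1, 0.1), (0.05, 0.05)]]
light = [(1.0, (float(x), float(y)))
         for x in range(-3, 4) for y in range(-3, 4) if (x, y) != (0, 0)]
ratio = segregation_ratio(heavy + light, n_massive=5)
```

    This ratio captures the global compactness the abstract attributes to MST-based measures; it says nothing about local crowding, which is why the abstract suggests measuring both.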

  4. Simulations of Fractal Star Cluster Formation. I. New Insights for Measuring Mass Segregation of Star Clusters with Substructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Jincheng; Puzia, Thomas H.; Lin, Congping

    2017-05-10

    We compare the existent methods, including the minimum spanning tree based method and the local stellar density based method, in measuring mass segregation of star clusters. We find that the minimum spanning tree method reflects more the compactness, which represents the global spatial distribution of massive stars, while the local stellar density method reflects more the crowdedness, which provides the local gravitational potential information. It is suggested to measure the local and the global mass segregation simultaneously. We also develop a hybrid method that takes both aspects into account. This hybrid method balances the local and the global mass segregation in the sense that the predominant one is either caused by dynamical evolution or purely accidental, especially when such information is unknown a priori. In addition, we test our prescriptions with numerical models and show the impact of binaries in estimating the mass segregation value. As an application, we use these methods on the Orion Nebula Cluster (ONC) observations and the Taurus cluster. We find that the ONC is significantly mass segregated down to the 20th most massive stars. In contrast, the massive stars of the Taurus cluster are sparsely distributed in many different subclusters, showing a low degree of compactness. The massive stars of Taurus are also found to be distributed in the high-density region of the subclusters, showing significant mass segregation at subcluster scales. Meanwhile, we also apply these methods to discuss the possible mechanisms of the dynamical evolution of the simulated substructured star clusters.

  5. Eosinophilic and Neutrophilic Airway Inflammation in the Phenotyping of Mild-to-Moderate Asthma and Chronic Obstructive Pulmonary Disease.

    PubMed

    Górska, Katarzyna; Paplińska-Goryca, Magdalena; Nejman-Gryz, Patrycja; Goryca, Krzysztof; Krenke, Rafał

    2017-04-01

    Asthma and chronic obstructive pulmonary disease (COPD) are heterogeneous diseases with different inflammatory phenotypes. Various inflammatory mediators play a role in these diseases. The aim of this study was to analyze the neutrophilic and eosinophilic airway and systemic inflammation as the phenotypic characterization of patients with asthma and COPD. Twenty-four patients with asthma and 33 patients with COPD were enrolled in the study. All the patients were in mild-to-moderate stage of disease, and none of them were treated with inhaled corticosteroids. Concentrations of IL-6, neutrophil elastase (NE), matrix metalloproteinase 9 (MMP-9), eosinophil cationic protein (ECP), and IL-33 and IL-17 in serum and induced sputum (IS) were measured by enzyme-linked immunosorbent assay (ELISA). The cellular composition of blood and IS was evaluated. Hierarchical clustering of patients was performed for the combination of selected clinical features and mediators. Asthma and COPD can be differentiated based on eosinophilic/neutrophilic systemic or airway inflammation with unsatisfactory efficiency. Hierarchical clustering of patients based on blood eosinophil percentage and clinical data revealed two asthma clusters differing in the number of positive skin prick tests and one COPD cluster with two subclusters characterized by low and high blood eosinophil concentrations. Clustering of patients according to IS measurements and clinical data showed two main clusters: pure asthma characterized by high eosinophil/atopy status and mixed asthma and COPD cluster with low eosinophil/atopy status. The neutrophilic phenotype of COPD was associated with more severe airway obstruction and hyperinflation.

  6. Impact of Different Visual Field Testing Paradigms on Sample Size Requirements for Glaucoma Clinical Trials.

    PubMed

    Wu, Zhichao; Medeiros, Felipe A

    2018-03-20

    Visual field testing is an important endpoint in glaucoma clinical trials, and the testing paradigm used can have a significant impact on the sample size requirements. To investigate this, this study included 353 eyes of 247 glaucoma patients seen over a 3-year period to extract real-world visual field rates of change and variability estimates to provide sample size estimates from computer simulations. The clinical trial scenario assumed that a new treatment was added to one of two groups that were both under routine clinical care, with various treatment effects examined. Three different visual field testing paradigms were evaluated: a) evenly spaced testing, b) United Kingdom Glaucoma Treatment Study (UKGTS) follow-up scheme, which adds clustered tests at the beginning and end of follow-up in addition to evenly spaced testing, and c) clustered testing paradigm, with clusters of tests at the beginning and end of the trial period and two intermediary visits. The sample size requirements were reduced by 17-19% and 39-40% using the UKGTS and clustered testing paradigms, respectively, when compared to the evenly spaced approach. These findings highlight how the clustered testing paradigm can substantially reduce sample size requirements and improve the feasibility of future glaucoma clinical trials.

  7. Relationship between Procedural Tactical Knowledge and Specific Motor Skills in Young Soccer Players

    PubMed Central

    Aquino, Rodrigo; Marques, Renato Francisco R.; Petiot, Grégory Hallé; Gonçalves, Luiz Guilherme C.; Moraes, Camila; Santiago, Paulo Roberto P.; Puggina, Enrico Fuini

    2016-01-01

    The purpose of this study was to investigate the association between offensive tactical knowledge and the soccer-specific motor skills performance. Fifteen participants were submitted to two evaluation tests, one to assess their technical and tactical analysis. The motor skills performance was measured through four tests of technical soccer skills: ball control, shooting, passing and dribbling. The tactical performance was based on a tactical assessment system called FUT-SAT (Analyses of Procedural Tactical Knowledge in Soccer). Afterwards, technical and tactical evaluation scores were ranked with and without the use of the cluster method. A positive, weak correlation was perceived in both analyses (rho = 0.39, not significant p = 0.14 (with cluster analysis); and rho = 0.35; not significant p = 0.20 (without cluster analysis)). We can conclude that there was a weak association between the technical and the offensive tactical knowledge. This shows the need to reflect on the use of such tests to assess technical skills in team sports since they do not take into account the variability and unpredictability of game actions and disregard the inherent needs to assess such skill performance in the game. PMID:29910300

  8. Detecting communities in large networks

    NASA Astrophysics Data System (ADS)

    Capocci, A.; Servedio, V. D. P.; Caldarelli, G.; Colaiori, F.

    2005-07-01

    We develop an algorithm to detect community structure in complex networks. The algorithm is based on spectral methods and takes into account weights and link orientation. Since the method efficiently detects clustered nodes in large networks even when these are not sharply partitioned, it turns out to be especially suitable for the analysis of social and information networks. We test the algorithm on a large-scale data set from a psychological experiment of word association. In this case, it proves to be successful both in clustering words and in uncovering mental association patterns.

  9. Application of the SRI cloud-tracking technique to rapid-scan GOES observations

    NASA Technical Reports Server (NTRS)

    Wolf, D. E.; Endlich, R. M.

    1980-01-01

    An automatic cloud tracking system was applied to multilayer clouds associated with severe storms. The method was tested using rapid scan observations of Hurricane Eloise obtained by the GOES satellite on 22 September 1975. Cloud tracking was performed using clustering based either on visible or infrared data. The clusters were tracked using two different techniques. With data of 4 km and 8 km resolution, the automatic system yielded results comparable in accuracy and coverage to those obtained by NASA analysts using the Atmospheric and Oceanographic Information Processing System.

  10. Computer aided detection of clusters of microcalcifications on full field digital mammograms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ge Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.

    2006-08-15

    We are developing a computer-aided detection (CAD) system to identify microcalcification clusters (MCCs) automatically on full field digital mammograms (FFDMs). The CAD system includes six stages: preprocessing; image enhancement; segmentation of microcalcification candidates; false positive (FP) reduction for individual microcalcifications; regional clustering; and FP reduction for clustered microcalcifications. At the stage of FP reduction for individual microcalcifications, a truncated sum-of-squares error function was used to improve the efficiency and robustness of the training of an artificial neural network in our CAD system for FFDMs. At the stage of FP reduction for clustered microcalcifications, morphological features and features derived from the artificial neural network outputs were extracted from each cluster. Stepwise linear discriminant analysis (LDA) was used to select the features. An LDA classifier was then used to differentiate clustered microcalcifications from FPs. A data set of 96 cases with 192 images was collected at the University of Michigan. This data set contained 96 MCCs, of which 28 clusters were proven by biopsy to be malignant and 68 were proven to be benign. The data set was separated into two independent data sets for training and testing of the CAD system in a cross-validation scheme. When one data set was used to train and validate the convolution neural network (CNN) in our CAD system, the other data set was used to evaluate the detection performance. With the use of a truncated error metric, the training of CNN could be accelerated and the classification performance was improved. The CNN in combination with an LDA classifier could substantially reduce FPs with a small tradeoff in sensitivity. By using the free-response receiver operating characteristic methodology, it was found that our CAD system can achieve a cluster-based sensitivity of 70%, 80%, and 90% at 0.21, 0.61, and 1.49 FPs/image, respectively. For case-based performance evaluation, a sensitivity of 70%, 80%, and 90% can be achieved at 0.07, 0.17, and 0.65 FPs/image, respectively. We also used a data set of 216 mammograms negative for clustered microcalcifications to further estimate the FP rate of our CAD system. The corresponding FP rates were 0.15, 0.31, and 0.86 FPs/image for cluster-based detection when negative mammograms were used for estimation of FP rates.

  11. An analysis of cluster headache information provided on internet websites.

    PubMed

    Peterlin, B Lee; Gambini-Suarez, Eduardo; Lidicker, Jeffrey; Levin, Morris

    2008-03-01

    To evaluate the quality of websites providing cluster headache information for patients and healthcare providers. The Internet has become an increasingly important source of healthcare information. However, limited data exist regarding the quality of websites providing headache information. This was a cross-sectional study conducted in February 2007. Websites providing cluster headache information were determined on the search engine MetaCrawler and classified as either patient oriented or healthcare provider oriented. The overall quality of each site was evaluated using a score system. Readability was evaluated using the Flesch-Kincaid Grade Level Readability Score (FKRS). Website quality was analyzed based on ownership, purpose, authorship, author qualifications, attribution, interactivity, and currency. The technical quality of the cluster headache information was analyzed based on content specific to cluster headache. The final ranking, based on the sum of the ranks of all 3 categories, was determined and then contrasted between the patient-oriented and healthcare professional-oriented websites using 2-sample t-tests. Of the first 40 websites found on MetaCrawler, 72.5% were advertisements, unrelated to headache, or repeated websites. Although the standard US writing averages are at a seventh to eighth grade level, the mean FKRS of all sites was at a 12th grade level of difficulty, with no significant difference between the patient-oriented or healthcare provider-oriented websites (P = .54). Of a total possible 14 points, the overall mean quality component score was 9.9 for all sites; and of a total possible 23 points, the overall mean technical component score was 13.9. There was no significant difference for either the quality or technical component scores between patient-oriented or healthcare provider-oriented websites (P = .45 and P = .80, respectively). There are numerous cluster headache websites that can be found on the Internet. 
The quality of most of the websites dedicated to cluster headache is mediocre, and although there are some excellent cluster headache websites, these sites may be challenging for many users to locate. There was no significant difference in the overall quality of websites oriented for patients or healthcare providers providing cluster headache information evaluated in this study. In addition, websites providing high-quality cluster headache information are written at an educational level too high for a significant portion of the general population to fully utilize. Physicians should strongly consider providing lists of quality websites on cluster headache for their patients.
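
    The readability score used in this record can be written out explicitly. A minimal sketch of the standard Flesch-Kincaid Grade Level formula follows; the function name and the word, sentence, and syllable counts are illustrative assumptions, and counting syllables in real text (which needs a heuristic or a dictionary) is deliberately left out:

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid Grade Level from raw counts of a text sample."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical sample: 100 words, 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
print(round(grade, 2))
```

    Longer sentences and more syllables per word both raise the grade, which is how a site can land at a 12th grade level when typical writing sits at a seventh to eighth grade level.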

  12. Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.

    PubMed

    Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan

    2017-01-01

    Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. Primary outcome was 6-minute walking distance (6MWD) change. We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), Clusters 3 (n = 246) and 4 (n = 93) patients were older, heavier, had worse baseline functional class, 6MWD, Borg Dyspnea Index, and fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not gender or race. Mean treatment effect of oral treprostinil differed across Clusters 1-4, increasing in a monotonic fashion (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.

  13. Differences in Pedaling Technique in Cycling: A Cluster Analysis.

    PubMed

    Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A

    2016-10-01

    To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (PO VT2) compared with cycling at their maximal power output (PO MAX). Twenty athletes performed an incremental cycling test to determine their power output (PO MAX and PO VT2; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in PO MAX and PO VT2. Athletes were assigned to 2 clusters based on the behavior of outcome variables at PO VT2 and PO MAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at PO VT2 vs PO MAX, cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.

  14. Dynamics of intracranial electroencephalographic recordings from epilepsy patients using univariate and bivariate recurrence networks.

    PubMed

    Subramaniyam, Narayan Puthanmadam; Hyttinen, Jari

    2015-02-01

    Recently Andrezejak et al. combined the randomness and nonlinear independence test with iterative amplitude adjusted Fourier transform (iAAFT) surrogates to distinguish between the dynamics of seizure-free intracranial electroencephalographic (EEG) signals recorded from epileptogenic (focal) and nonepileptogenic (nonfocal) brain areas of epileptic patients. However, stationarity is a part of the null hypothesis for iAAFT surrogates and thus nonstationarity can violate the null hypothesis. In this work we first propose the application of the randomness and nonlinear independence test based on recurrence network measures to distinguish between the dynamics of focal and nonfocal EEG signals. Furthermore, we combine these tests with both iAAFT and truncated Fourier transform (TFT) surrogate methods, which also preserves the nonstationarity of the original data in the surrogates along with its linear structure. Our results indicate that focal EEG signals exhibit an increased degree of structural complexity and interdependency compared to nonfocal EEG signals. In general, we find higher rejections for randomness and nonlinear independence tests for focal EEG signals compared to nonfocal EEG signals. In particular, the univariate recurrence network measures, the average clustering coefficient C and assortativity R, and the bivariate recurrence network measure, the average cross-clustering coefficient C(cross), can successfully distinguish between the focal and nonfocal EEG signals, even when the analysis is restricted to nonstationary signals, irrespective of the type of surrogates used. On the other hand, we find that the univariate recurrence network measures, the average path length L, and the average betweenness centrality BC fail to distinguish between the focal and nonfocal EEG signals when iAAFT surrogates are used. However, these two measures can distinguish between focal and nonfocal EEG signals when TFT surrogates are used for nonstationary signals. 
We also report an improvement in the performance of nonlinear prediction error N and nonlinear interdependence measure L used by Andrezejak et al., when TFT surrogates are used for nonstationary EEG signals. We also find that the outcome of the nonlinear independence test based on the average cross-clustering coefficient C(cross) is independent of the outcome of the randomness test based on the average clustering coefficient C. Thus, the univariate and bivariate recurrence network measures provide independent information regarding the dynamics of the focal and nonfocal EEG signals. In conclusion, recurrence network analysis combined with nonstationary surrogates can be applied to derive reliable biomarkers to distinguish between epileptogenic and nonepileptogenic brain areas using EEG signals.
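
    The average clustering coefficient C named in this record is a graph measure computed on a recurrence network. A minimal sketch follows, assuming a scalar series with no time-delay embedding and an arbitrary recurrence threshold eps (the abstract specifies neither); function names are hypothetical and the surrogate-based tests are not reproduced:

```python
def recurrence_adjacency(series, eps):
    """Link two time points when their states lie within eps (no self-loops)."""
    n = len(series)
    return [
        [1 if i != j and abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        neighbours = [j for j in range(n) if adj[i][j]]
        k = len(neighbours)
        if k < 2:
            continue
        # Count links among the neighbours of node i (closed triangles).
        links = sum(
            adj[a][b]
            for pos, a in enumerate(neighbours)
            for b in neighbours[pos + 1:]
        )
        total += links / (k * (k - 1) / 2)
    return total / n


# Toy series: three nearby states form a triangle, the fourth is isolated.
toy_series = [0.0, 0.1, 0.2, 5.0]
adjacency = recurrence_adjacency(toy_series, eps=0.25)
c_avg = average_clustering(adjacency)
print(c_avg)
```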

  15. Dynamics of intracranial electroencephalographic recordings from epilepsy patients using univariate and bivariate recurrence networks

    NASA Astrophysics Data System (ADS)

    Subramaniyam, Narayan Puthanmadam; Hyttinen, Jari

    2015-02-01

    Recently Andrezejak et al. combined the randomness and nonlinear independence test with iterative amplitude adjusted Fourier transform (iAAFT) surrogates to distinguish between the dynamics of seizure-free intracranial electroencephalographic (EEG) signals recorded from epileptogenic (focal) and nonepileptogenic (nonfocal) brain areas of epileptic patients. However, stationarity is a part of the null hypothesis for iAAFT surrogates and thus nonstationarity can violate the null hypothesis. In this work we first propose the application of the randomness and nonlinear independence test based on recurrence network measures to distinguish between the dynamics of focal and nonfocal EEG signals. Furthermore, we combine these tests with both iAAFT and truncated Fourier transform (TFT) surrogate methods, which also preserves the nonstationarity of the original data in the surrogates along with its linear structure. Our results indicate that focal EEG signals exhibit an increased degree of structural complexity and interdependency compared to nonfocal EEG signals. In general, we find higher rejections for randomness and nonlinear independence tests for focal EEG signals compared to nonfocal EEG signals. In particular, the univariate recurrence network measures, the average clustering coefficient C and assortativity R, and the bivariate recurrence network measure, the average cross-clustering coefficient Ccross, can successfully distinguish between the focal and nonfocal EEG signals, even when the analysis is restricted to nonstationary signals, irrespective of the type of surrogates used. On the other hand, we find that the univariate recurrence network measures, the average path length L, and the average betweenness centrality BC fail to distinguish between the focal and nonfocal EEG signals when iAAFT surrogates are used. However, these two measures can distinguish between focal and nonfocal EEG signals when TFT surrogates are used for nonstationary signals. 
We also report an improvement in the performance of nonlinear prediction error N and nonlinear interdependence measure L used by Andrezejak et al., when TFT surrogates are used for nonstationary EEG signals. We also find that the outcome of the nonlinear independence test based on the average cross-clustering coefficient Ccross is independent of the outcome of the randomness test based on the average clustering coefficient C. Thus, the univariate and bivariate recurrence network measures provide independent information regarding the dynamics of the focal and nonfocal EEG signals. In conclusion, recurrence network analysis combined with nonstationary surrogates can be applied to derive reliable biomarkers to distinguish between epileptogenic and nonepileptogenic brain areas using EEG signals.

  16. A Novel 3D Label-Free Monitoring System of hES-Derived Cardiomyocyte Clusters: A Step Forward to In Vitro Cardiotoxicity Testing

    PubMed Central

    Jahnke, Heinz-Georg; Steel, Daniella; Fleischer, Stephan; Seidel, Diana; Kurz, Randy; Vinz, Silvia; Dahlenborg, Kerstin; Sartipy, Peter; Robitzki, Andrea A.

    2013-01-01

    Unexpected adverse effects on the cardiovascular system remain a major challenge in the development of novel active pharmaceutical ingredients (API). To overcome the current limitations of animal-based in vitro and in vivo test systems, stem cell derived human cardiomyocyte clusters (hCMC) offer the opportunity for highly predictable pre-clinical testing. The three-dimensional structure of hCMC appears more representative of tissue milieu than traditional monolayer cell culture. However, there is a lack of long-term, real time monitoring systems for tissue-like cardiac material. To address this issue, we have developed a microcavity array (MCA)-based label-free monitoring system that eliminates the need for critical hCMC adhesion and outgrowth steps. In contrast, feasible field potential derived action potential recording is possible immediately after positioning within the microcavity. Moreover, this approach allows extended observation of adverse effects on hCMC. For the first time, we describe herein the monitoring of hCMC over 35 days while preserving the hCMC structure and electrophysiological characteristics. Furthermore, we demonstrated the sensitive detection and quantification of adverse API effects using E4031, doxorubicin, and noradrenaline directly on unaltered 3D cultures. The MCA system provides multi-parameter analysis capabilities incorporating field potential recording, impedance spectroscopy, and optical read-outs on individual clusters giving a comprehensive insight into induced cellular alterations within a complex cardiac culture over days or even weeks. PMID:23861955

  17. Evaluation of genetic diversity in Chinese kale (Brassica oleracea L. var. alboglabra Bailey) by using rapid amplified polymorphic DNA and sequence-related amplified polymorphism markers.

    PubMed

    Zhang, J; Zhang, L G

    2014-02-14

    Chinese kale is an original Chinese vegetable of the Cruciferae family. To select suitable parents for hybrid breeding, we thoroughly analyzed the genetic diversity of Chinese kale. Random amplified polymorphic DNA (RAPD) and sequence-related amplified polymorphism (SRAP) molecular markers were used to evaluate the genetic diversity across 21 Chinese kale accessions from AVRDC and Guangzhou in China. A total of 104 bands were detected by 11 RAPD primers, of which 66 (63.5%) were polymorphic, and 229 polymorphic bands (68.4%) were observed in 335 bands amplified by 17 SRAP primer combinations. The dendrogram showed the grouping of the 21 accessions into 4 main clusters based on RAPD data, and into 6 clusters based on SRAP and combined data (RAPD + SRAP). The clustering of accessions based on SRAP data was consistent with petal colors. The Mantel test indicated a poor fit for the RAPD and SRAP data (r = 0.16). These results have an important implication for Chinese kale germplasm characterization and improvement.
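
    The Mantel test mentioned in this record compares two distance matrices computed on the same set of accessions. A minimal sketch follows, assuming a one-sided permutation test on the Pearson correlation of the upper triangles; the matrices are made up, the names are hypothetical, and the authors' exact procedure is not given in the abstract:

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): matrix correlation and one-sided permutation p value."""
    rng = random.Random(seed)
    n = len(d1)
    flat1 = upper_triangle(d1)
    observed = pearson(flat1, upper_triangle(d2))
    labels = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Relabel the objects of d2: rows and columns are permuted together.
        rng.shuffle(labels)
        permuted = [[d2[labels[i]][labels[j]] for j in range(n)] for i in range(n)]
        if pearson(flat1, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Made-up distance matrices for four accessions (NOT the study's data).
rapd_dist = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
srap_dist = [[0, 2, 3, 5], [2, 0, 2, 4], [3, 2, 0, 1], [5, 4, 1, 0]]
r, p = mantel(rapd_dist, srap_dist, permutations=199, seed=1)
print(round(r, 3), p)
```

    With only four objects there are just 24 distinct relabelings, so the p value here is coarse; the study's 21 accessions would give a much finer permutation distribution.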

  18. Molecular Typing of Mycobacterium Tuberculosis Complex by 24-Locus Based MIRU-VNTR Typing in Conjunction with Spoligotyping to Assess Genetic Diversity of Strains Circulating in Morocco

    PubMed Central

    Bouklata, Nada; Supply, Philip; Jaouhari, Sanae; Charof, Reda; Seghrouchni, Fouad; Sadki, Khalid; El Achhab, Youness; Nejjari, Chakib; Filali-Maltouf, Abdelkarim

    2015-01-01

    Background Standard 24-locus Mycobacterial Interspersed Repetitive Unit Variable Number Tandem Repeat (MIRU-VNTR) typing provides improved resolution power for tracing TB transmission and predicting different strain (sub)lineages in a community. Methodology During 2010–2012, a total of 168 Mycobacterium tuberculosis Complex (MTBC) isolates were collected by cluster sampling from 10 different Moroccan cities, and centralized by the National Reference Laboratory of Tuberculosis over the study period. All isolates were genotyped using spoligotyping, and a subset of 75 was genotyped using 24-locus based MIRU-VNTR typing, followed by first line drug susceptibility testing. Corresponding strain lineages were predicted using MIRU-VNTRplus database. Principal Findings Spoligotyping resulted in 137 isolates in 18 clusters (2–50 isolates per cluster: clustering rate of 81.54%) corresponding to a SIT number in the SITVIT database, while 31 (18.45%) patterns were unique, of which 10 were labelled as “unknown” according to the same database. The most prevalent spoligotype family was LAM (n = 81 or 48.24% of isolates, dominated by SIT42, n = 49), followed by Haarlem (23.80%), T superfamily (15.47%), Beijing (2.97%), U clade (2.38%) and S clade (1.19%). Subsequent 24-Locus MIRU-VNTR typing identified 64 unique types and 11 isolates in 5 clusters (2 to 3 isolates per cluster), substantially reducing clusters defined by spoligotyping only. The single cluster of three isolates corresponded to two previously treated MDR-TB cases and one new MDR-TB case known to be in contact with the same index case and belonging to the same family, albeit residing in 3 different administrative regions. MIRU-VNTR loci 4052, 802, 2996, 2163b, 3690, 1955, 424, 2531, 2401 and 960 were highly discriminative in our setting (HGDI >0.6). 
Conclusions 24-locus MIRU-VNTR typing can substantially improve the resolution of large clusters initially defined by spoligotyping alone and predominating in Morocco, and could therefore be used to better study tuberculosis transmission in a population-based, multi-year sample context. PMID:26285026
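
    The HGDI cited in this record is the Hunter-Gaston discriminatory index. As a minimal sketch, the index can be computed from the sizes of the genotype groups. The abstract reports HGDI per locus; as an illustration only, the example below applies the same formula to the overall 24-locus result, assuming the 11 clustered isolates split into one cluster of three and four clusters of two (which is what "5 clusters" and "the single cluster of three" imply). The function name is hypothetical:

```python
def hunter_gaston_index(type_sizes):
    """Hunter-Gaston index: 1 - sum(n_j * (n_j - 1)) / (N * (N - 1))."""
    total = sum(type_sizes)
    if total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(size * (size - 1) for size in type_sizes)
    return 1.0 - same_type_pairs / (total * (total - 1))


# 64 unique types plus 5 clusters (sizes inferred from the abstract's wording).
miru_type_sizes = [1] * 64 + [3, 2, 2, 2, 2]
index = hunter_gaston_index(miru_type_sizes)
print(sum(miru_type_sizes), round(index, 4))
```

    The index is the probability that two isolates drawn at random without replacement have different types, so a typing scheme that leaves most isolates unique scores close to 1.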

  19. Identification and DUS Testing of Rice Varieties through Microsatellite Markers

    PubMed Central

    Pourabed, Ehsan; Jazayeri Noushabadi, Mohammad Reza; Jamali, Seyed Hossein; Moheb Alipour, Naser; Zareyan, Abbas; Sadeghi, Leila

    2015-01-01

    Identification and registration of new rice varieties should be free from environmental effects, which makes the use of more reliable molecular markers very important. The objectives of this study were, first, the identification and distinction of 40 rice varieties consisting of local varieties of Iran, improved varieties, and IRRI varieties using PIC and discriminating power; second, cluster analysis based on the Dice similarity coefficient and the UPGMA algorithm; and, third, determining the ability of microsatellite markers to separate varieties utilizing the best combination of markers. For this research, 12 microsatellite markers were used. In total, 83 polymorphic alleles (6.91 alleles per locus) were found. In addition, PIC values ranged from 0.52 to 0.9. The results of cluster analysis showed the complete discrimination of varieties from each other except for IR58025A and IR58025B. Moreover, cluster analysis could distinguish most of the improved varieties from local varieties. Based on the best-combination-of-markers analysis, five primer pairs together gave the same results as all markers for discriminating among all varieties. Considering the results of this research, we propose that microsatellite markers can be used as a complementary tool to morphological characteristics in DUS tests. PMID:25755666
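
    Two of the quantities named in this record have short closed forms. A minimal sketch follows, assuming PIC is the polymorphism information content in the form given by Botstein et al. and that the Dice coefficient is computed on presence/absence allele profiles; the profiles and allele frequencies are made up, the function names are hypothetical, and the UPGMA step is not shown:

```python
def dice_similarity(profile_a, profile_b):
    """Dice coefficient 2a / (2a + b + c) for two presence/absence profiles."""
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    present_total = sum(1 for x in profile_a if x) + sum(1 for y in profile_b if y)
    if present_total == 0:
        raise ValueError("both profiles are empty")
    return 2 * shared / present_total


def polymorphism_information_content(allele_freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(len(allele_freqs))
        for j in range(i + 1, len(allele_freqs))
    )
    return 1 - homozygosity - cross_term


# Made-up allele profiles for two varieties (1 = allele present).
variety_a = [1, 1, 0, 1]
variety_b = [1, 0, 1, 1]
similarity = dice_similarity(variety_a, variety_b)

# Made-up locus with two equally frequent alleles.
pic_value = polymorphism_information_content([0.5, 0.5])
print(round(similarity, 3), pic_value)
```

    A distance of 1 minus the Dice similarity for every pair of varieties would give the matrix that a clustering algorithm such as UPGMA takes as input.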

  20. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate MI-based Xie and Beni index for fuzzy-clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
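
    The Xie and Beni index named in this record scores a fuzzy partition by the ratio of compactness to separation, with lower values indicating a better partition. A minimal sketch of the index on a single complete data set follows; the multiple-imputation pooling that MIV adds is not shown, and the function names, toy data, and default fuzzifier m = 2 are assumptions:

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni_index(points, centers, memberships, m=2.0):
    """Xie-Beni index; memberships[i][k] is the degree of point k in cluster i."""
    n = len(points)
    compactness = sum(
        memberships[i][k] ** m * squared_distance(points[k], centers[i])
        for i in range(len(centers))
        for k in range(n)
    )
    separation = min(
        squared_distance(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (n * separation)


# Toy 1-D data with two well-separated groups and crisp memberships.
points = [[0.0], [1.0], [10.0], [11.0]]
centers = [[0.5], [10.5]]
memberships = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
xb = xie_beni_index(points, centers, memberships)
print(xb)
```

    To pick the number of clusters, the index would be evaluated for each candidate count and the count with the smallest value retained.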

  1. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate MI-based Xie and Beni index for fuzzy-clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  2. Chemotaxonomy of heterocystous cyanobacteria using FAME profiling as species markers.

    PubMed

    Shukla, Ekta; Singh, Satya Shila; Singh, Prashant; Mishra, Arun Kumar

    2012-07-01

    The fatty acid methyl ester (FAME) analysis of the 12 heterocystous cyanobacterial strains showed different fatty acid profiles based on the presence/absence and the percentage of 13 different types of fatty acids. The major fatty acids viz. palmitic acid (16:0), hexadecadienoic acid (16:2), stearic acid (18:0), oleic acid (18:1), linoleic acid (18:2), and linolenic acid (18:3) were present in all the strains except Cylindrospermum musicola, in which oleic acid (18:1) was absent. All the strains showed high levels of polyunsaturated fatty acids (PUFAs; 41-68.35%) followed by saturated fatty acids (SAFAs; 1.82-40.66%) and monounsaturated fatty acids (0.85-24.98%). The highest percentage of PUFAs and of the essential fatty acid linolenic acid (18:3) was reported in Scytonema bohnerii, which can be used as a fatty acid supplement for medical and biotechnological purposes. The cluster analysis based on FAME profiling suggests the presence of two distinct clusters with Euclidean distance ranging from 0 to 25. S. bohnerii of cluster I was distantly related to the other strains of cluster II. The genotypes of cluster II were further divided into two subclusters, i.e., IIa with C. musicola showing great divergence from the other genotypes of IIb, which was further subdivided into two groups. Subsubcluster IIb(1) was represented by a single genotype, Anabaena sp., whereas subsubcluster IIb(2) comprised two groups: one group of three genotypes with significant similarity among them, distantly related to the other group of six closely related genotypes. To test the validity of the fatty acid profiles as a marker, a cluster analysis was also generated on the basis of morphological attributes. Our results suggest that FAME profiling might be used as species markers in studies of polyphasic taxonomy and phylogenetic relationships.

  3. Task Demand Influences Relationships Among Sex, Clustering Strategy, and Recall: 16-Word Versus 9-Word List Learning Tests

    PubMed Central

    Sunderaraman, Preeti; Blumen, Helena M.; DeMatteo, David; Apa, Zoltan; Cosentino, Stephanie

    2013-01-01

    Objective We compared the relationships among sex, clustering strategy, and recall across different task demands using the 16-word California Verbal Learning Test–Second Edition (CVLT-II) and the 9-word Philadelphia (repeatable) Verbal Learning Test (PrVLT). Background Women generally score higher than men on verbal memory tasks, possibly because women tend to use semantic clustering. This sex difference has been established via word-list learning tests such as the CVLT-II. Methods In a retrospective between-group study, we compared how 2 separate groups of cognitively healthy older adults performed on a longer and a shorter verbal learning test. The group completing the CVLT-II had 36 women and 26 men; the group completing the PrVLT had 27 women and 21 men. Results Overall, multiple regression analyses revealed that semantic clustering was significantly associated with total recall on both tests’ lists (P < 0.001). Sex differences in recall and semantic clustering diminished with the shorter PrVLT word list. Conclusions Semantic clustering uniquely influenced recall on both the longer and shorter word lists. However, serial clustering and sex influenced recall depending on the length of the word list (ie, the task demand). These findings suggest a complex nonlinear relationship among verbal memory, clustering strategies, and task demand. PMID:23812171

  4. New Asteroseismic Scaling Relations Based on the Hayashi Track Relation Applied to Red Giant Branch Stars in NGC 6791 and NGC 6819

    NASA Astrophysics Data System (ADS)

    Wu, T.; Li, Y.; Hekker, S.

    2014-01-01

    Stellar mass M, radius R, and gravity g are important basic parameters in stellar physics. Accurate values for these parameters can be obtained from the gravitational interaction between stars in multiple systems or from asteroseismology. Stars in a cluster are thought to be formed coevally from the same interstellar cloud of gas and dust. The cluster members are therefore expected to have some properties in common. These common properties strengthen our ability to constrain stellar models and asteroseismically derived M, R, and g when tested against an ensemble of cluster stars. Here we derive new scaling relations based on a relation for stars on the Hayashi track ($\sqrt{T_{\rm eff}} \sim g^p R^q$) to determine the masses and metallicities of red giant branch stars in open clusters NGC 6791 and NGC 6819 from the global oscillation parameters Δν (the large frequency separation) and νmax (frequency of maximum oscillation power). The Δν and νmax values are derived from Kepler observations. From the analysis of these new relations we derive: (1) direct observational evidence that the masses of red giant branch stars in a cluster are the same within their uncertainties, (2) new methods to derive M and z of the cluster in a self-consistent way from Δν and νmax, with lower intrinsic uncertainties, and (3) the mass dependence in the Δν - νmax relation for red giant branch stars.
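
    For context, the conventional asteroseismic scaling relations that link Δν and νmax to M and R can be written as below. This is background only, under the usual assumption of scaling from solar reference values; these are the standard relations, not the new Hayashi-track-based relations derived in this record, which the abstract does not state.

```latex
% Standard scaling relations (solar-scaled); background, not the paper's new relations.
\begin{align}
  \frac{\Delta\nu}{\Delta\nu_\odot} &\simeq
    \left(\frac{M}{M_\odot}\right)^{1/2}
    \left(\frac{R}{R_\odot}\right)^{-3/2}, \\
  \frac{\nu_{\max}}{\nu_{\max,\odot}} &\simeq
    \left(\frac{M}{M_\odot}\right)
    \left(\frac{R}{R_\odot}\right)^{-2}
    \left(\frac{T_{\rm eff}}{T_{{\rm eff},\odot}}\right)^{-1/2}.
\end{align}
% Solving the two relations for R and M:
\begin{align}
  \frac{R}{R_\odot} &\simeq
    \left(\frac{\nu_{\max}}{\nu_{\max,\odot}}\right)
    \left(\frac{\Delta\nu}{\Delta\nu_\odot}\right)^{-2}
    \left(\frac{T_{\rm eff}}{T_{{\rm eff},\odot}}\right)^{1/2}, \\
  \frac{M}{M_\odot} &\simeq
    \left(\frac{\nu_{\max}}{\nu_{\max,\odot}}\right)^{3}
    \left(\frac{\Delta\nu}{\Delta\nu_\odot}\right)^{-4}
    \left(\frac{T_{\rm eff}}{T_{{\rm eff},\odot}}\right)^{3/2}.
\end{align}
```

    The steep powers of νmax and Δν in the mass relation show why small errors in the oscillation parameters matter, which is the motivation the abstract gives for seeking relations with lower intrinsic uncertainties.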

  5. Cluster Analysis Identifies 3 Phenotypes within Allergic Asthma.

    PubMed

    Sendín-Hernández, María Paz; Ávila-Zarza, Carmelo; Sanz, Catalina; García-Sánchez, Asunción; Marcos-Vadillo, Elena; Muñoz-Bellido, Francisco J; Laffond, Elena; Domingo, Christian; Isidoro-García, María; Dávila, Ignacio

    Asthma is a heterogeneous chronic disease with different clinical expressions and responses to treatment. In recent years, several unbiased approaches based on clinical, physiological, and molecular features have described several phenotypes of asthma. Some phenotypes are allergic, but little is known about whether these phenotypes can be further subdivided. We aimed to phenotype patients with allergic asthma using an unbiased approach based on multivariate classification techniques (unsupervised hierarchical cluster analysis). From a total of 54 variables of 225 patients with well-characterized allergic asthma diagnosed following American Thoracic Society (ATS) recommendation, positive skin prick test to aeroallergens, and concordant symptoms, we finally selected 19 variables by multiple correspondence analyses. Then a cluster analysis was performed. Three groups were identified. Cluster 1 was constituted by patients with intermittent or mild persistent asthma, without family antecedents of atopy, asthma, or rhinitis. This group showed the lowest total IgE levels. Cluster 2 was constituted by patients with mild asthma with a family history of atopy, asthma, or rhinitis. Total IgE levels were intermediate. Cluster 3 included patients with moderate or severe persistent asthma that needed treatment with corticosteroids and long-acting β-agonists. This group showed the highest total IgE levels. We identified 3 phenotypes of allergic asthma in our population. Furthermore, we described 2 phenotypes of mild atopic asthma mainly differentiated by a family history of allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  6. Parcellation of left parietal tool representations by functional connectivity

    PubMed Central

    Garcea, Frank E.; Z. Mahon, Bradford

    2014-01-01

    Manipulating a tool according to its function requires the integration of visual, conceptual, and motor information, a process subserved in part by left parietal cortex. How these different types of information are integrated and how their integration is reflected in neural responses in the parietal lobule remains an open question. Here, participants viewed images of tools and animals during functional magnetic resonance imaging (fMRI). K-means clustering over time series data was used to parcellate left parietal cortex into subregions based on functional connectivity to a whole brain network of regions involved in tool processing. One cluster, in the inferior parietal cortex, expressed privileged functional connectivity to the left ventral premotor cortex. A second cluster, in the vicinity of the anterior intraparietal sulcus, expressed privileged functional connectivity with the left medial fusiform gyrus. A third cluster in the superior parietal lobe expressed privileged functional connectivity with dorsal occipital cortex. Control analyses using Monte Carlo style permutation tests demonstrated that the clustering solutions were outside the range of what would be observed based on chance ‘lumpiness’ in random data, or mere anatomical proximity. Finally, hierarchical clustering analyses were used to formally relate the resulting parcellation scheme of left parietal tool representations to previous work that has parcellated the left parietal lobule on purely anatomical grounds. These findings demonstrate significant heterogeneity in the functional organization of manipulable object representations in left parietal cortex, and outline a framework that generates novel predictions about the causes of some forms of upper limb apraxia. PMID:24892224

  7. Associations between Functional Milestones and Psychiatric Admissions in an Urban Area: Utility of a Cluster-Analytical Approach.

    PubMed

    Montemagni, Cristiana; Frieri, Tiziana; Villari, Vincenzo; Rocca, Paola

    2018-06-01

    The purpose of the study was to identify homogenous subgroups, based upon achievement of two functional milestones (marriage and employment) and Global Assessment of Functioning (GAF) score in a sample of 848 acute patients admitted to the Psychiatric Emergency Service (PES) of the Città della Salute e della Scienza di Torino, during a 24-months period. A two-step cluster-analysis, using GAF total score and the achievements in the two milestones as input data was performed. In order to examine whether the identified subgroups differed in external variables that were not included in the clustering process, and consequently to validate the found functional profiles, chi-square tests for categorical variables and analyses of variance (ANOVA) for continuous variables were performed. Five clusters were found. Employed patients (Clusters 4 and 5) had more years of education, less illness chronicity (shorter duration of illness and lower proportion of previous voluntary hospitalizations), lower use of mental health resources in the last year yet higher treatment adherence, larger network size, and higher ordinary discharge. Married inpatients (Clusters 3 and 5) had lower frequencies of substance abuse. The remarkably high rate of unemployment in this inpatients' sample, and the evidence of associations between unemployment and poorer functioning, argue for further research and development of evidence-based supported employment programs, that put forth diligent effort in helping people obtain work quickly and sustain; they may also help to reduce health care service use among that clientele.

  8. ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations

    PubMed Central

    Wright, Mark H.; Tung, Chih-Wei; Zhao, Keyan; Reynolds, Andy; McCouch, Susan R.; Bustamante, Carlos D.

    2010-01-01

    Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is current methods for automated calling of genotypes are based on clustering approaches which require a large number of samples to be analyzed simultaneously, or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly due to the inability to clearly identify a heterozygote cluster. Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’ based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per sample basis allowing accurate genotyping of both inbred and heterozygous samples even when analyzed simultaneously. Since clustering is not used explicitly, ALCHEMY performs well on small sample sizes with accuracy exceeding 99% with as few as 18 samples. Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/ Contact: mhw6@cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20926420

  9. Characteristics of HIV-infected U.S. Army soldiers linked in molecular transmission clusters, 2001-2012

    PubMed Central

    Jagodzinski, Linda L.; Liu, Ying; Pham, Peter T.; Kijak, Gustavo H.; Tovanabutra, Sodsai; McCutchan, Francine E.; Scoville, Stephanie L.; Cersovsky, Steven B.; Michael, Nelson L.; Scott, Paul T.; Peel, Sheila A.

    2017-01-01

    Objective Recent surveillance data suggests the United States (U.S.) Army HIV epidemic is concentrated among men who have sex with men. To identify potential targets for HIV prevention strategies, the relationship between demographic and clinical factors and membership within transmission clusters based on baseline pol sequences of HIV-infected Soldiers from 2001 through 2012 were analyzed. Methods We conducted a retrospective analysis of baseline partial pol sequences, demographic and clinical characteristics available for all Soldiers in active service and newly-diagnosed with HIV-1 infection from January 1, 2001 through December 31, 2012. HIV-1 subtype designations and transmission clusters were identified from phylogenetic analysis of sequences. Univariate and multivariate logistic regression models were used to evaluate and adjust for the association between characteristics and cluster membership. Results Among 518 of 995 HIV-infected Soldiers with available partial pol sequences, 29% were members of a transmission cluster. Assignment to a southern U.S. region at diagnosis and year of diagnosis were independently associated with cluster membership after adjustment for other significant characteristics (p<0.10) of age, race, year of diagnosis, region of duty assignment, sexually transmitted infections, last negative HIV test, antiretroviral therapy, and transmitted drug resistance. Subtyping of the pol fragment indicated HIV-1 subtype B infection predominated (94%) among HIV-infected Soldiers. Conclusion These findings identify areas to explore as HIV prevention targets in the U.S. Army. An increased frequency of current force testing may be justified, especially among Soldiers assigned to duty in installations with high local HIV prevalence such as southern U.S. states. PMID:28759645

  10. Kinematics and dynamics of the MKW/AWM poor clusters

    NASA Technical Reports Server (NTRS)

    Beers, Timothy C.; Kriessler, Jeffrey R.; Bird, Christina M.; Huchra, John P.

    1995-01-01

    We report 472 new redshifts for 416 galaxies in the regions of the 23 poor clusters of galaxies originally identified by Morgan, Kayser, and White (MKW), and Albert, White, and Morgan (AWM). Eighteen of the poor clusters now have 10 or more available redshifts within 1.5/h Mpc of the central galaxy; 11 clusters have at least 20 available redshifts. Based on the 21 clusters for which we have sufficient velocity information, the median velocity scale is 336 km/s, a factor of 2 smaller than found for rich clusters. Several of the poor clusters exhibit complex velocity distributions due to the presence of nearby clumps of galaxies. We check on the velocity of the dominant galaxy in each poor cluster relative to the remaining cluster members. Significantly high relative velocities of the dominant galaxy are found in only 4 of 21 poor clusters, 3 of which we suspect are due to contamination of the parent velocity distribution. Several statistical tests indicate that the D/cD galaxies are at the kinematic centers of the parent poor cluster velocity distributions. Mass-to-light ratios for 13 of the 15 poor clusters for which we have the required data are in the range 50 ≤ M/L_{B(0)} ≤ 200 solar masses per solar luminosity. The complex nature of the regions surrounding many of the poor clusters suggests that these groupings may represent an early epoch of cluster formation. For example, the poor clusters MKW7 and MKWS are shown to be gravitationally bound and likely to merge to form a richer cluster within the next several Gyrs. Eight of the nine other poor clusters for which simple two-body dynamical models can be carried out are consistent with being bound to other clumps in their vicinity. Additional complex systems with more than two gravitationally bound clumps are observed among the poor clusters.

  11. Mapping of terrain by computer clustering techniques using multispectral scanner data and using color aerial film

    NASA Technical Reports Server (NTRS)

    Smedes, H. W.; Linnerud, H. J.; Woolaver, L. B.; Su, M. Y.; Jayroe, R. R.

    1972-01-01

    Two clustering techniques were used for terrain mapping by computer of test sites in Yellowstone National Park. One test was made with multispectral scanner data using a composite technique which consists of (1) a strictly sequential statistical clustering which is a sequential variance analysis, and (2) a generalized K-means clustering. In this composite technique, the output of (1) is a first approximation of the cluster centers. This is the input to (2), which consists of steps to improve the determination of cluster centers by iterative procedures. Another test was made using the three emulsion layers of color-infrared aerial film as a three-band spectrometer. Relative film densities were analyzed using a simple clustering technique in three-color space. Important advantages of the clustering technique over conventional supervised computer programs are (1) human intervention, preparation time, and manipulation of data are reduced, (2) the computer map gives an unbiased indication of where best to select the reference ground control data, (3) easy-to-obtain, inexpensive film can be used, and (4) the geometric distortions can be easily rectified by simple standard photogrammetric techniques.
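
    The two-stage composite technique described above (a sequential pass that yields approximate cluster centers, followed by iterative K-means refinement) might be sketched as follows. This is a minimal illustration, not the authors' program: the sequential stage here is a simple distance-threshold rule standing in for their sequential variance analysis, and the names `sequential_centers`, `kmeans_refine`, and `radius` are hypothetical.

```python
import math


def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def sequential_centers(points, radius):
    """Stage 1 (stand-in): a single pass over the data.

    A point farther than `radius` from every existing center starts a new
    cluster; otherwise it updates the running mean of its nearest center.
    """
    centers, counts = [], []
    for p in points:
        k = nearest(p, centers) if centers else None
        if k is None or math.dist(p, centers[k]) > radius:
            centers.append(list(p))
            counts.append(1)
        else:
            counts[k] += 1
            centers[k] = [c + (x - c) / counts[k] for c, x in zip(centers[k], p)]
    return centers


def kmeans_refine(points, centers, iters=20):
    """Stage 2: standard K-means iterations seeded with the stage-1 centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Empty clusters keep their previous center.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers
```

    Seeding the iterative stage from a cheap first pass avoids choosing the number of clusters by hand; in this sketch that number is controlled indirectly by the assumed `radius` parameter.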

  12. Dietary Patterns Among Overweight and Obese African-American Women Living in the Rural South.

    PubMed

    Sterling, Samara; Judd, Suzanne; Bertrand, Brenda; Carson, Tiffany L; Chandler-Laney, Paula; Baskin, Monica L

    2018-02-01

    Obesity and chronic diseases disproportionately affect African-American women in the rural South (US) and may be influenced by adherence to a typical Southern-style diet. There is a need to examine dietary patterns of this population and to determine if consumption of nutritionally rich foods like nuts is associated with consumption of other nutritious foods. The objectives of this study were to identify (1) dietary patterns of overweight/obese African-American women in the rural South; (2) the role that nuts play in the diet; (3) and adherence to federal food group recommendations across dietary patterns. Secondary data analysis of two baseline 24-h dietary recalls was performed on 383 overweight/obese African-American women enrolled in a weight loss intervention in Alabama and Mississippi between 2011 and 2013. Cluster analysis identified dietary patterns. t tests and chi-square tests tested demographic and dietary differences across clusters. The proportion of women in each cluster who met federal recommendations for fruit, vegetable, nuts, added sugar, and sodium intake was calculated. Two dietary patterns were found. Nut intake frequency was higher in cluster 2 (P < .001), which was characterized by a higher intake frequency of fruits and vegetables, but high mean daily intake of added sugar (12.26 ± 7.67 tsp) and sodium (2800 ± 881 mg). Ninety-two percent of participants in this cluster consumed red/processed meats daily. Even among women in this population who consume a more plant-based dietary pattern containing nuts, there is still a need to decrease intake of added sugar, sodium, and red meat.

  13. Fast structure similarity searches among protein models: efficient clustering of protein fragments

    PubMed Central

    2012-01-01

    Background For many predictive applications a large number of models is generated and later clustered in subsets based on structure similarity. In most clustering algorithms an all-vs-all root mean square deviation (RMSD) comparison is performed. Most of the time is typically spent on comparison of non-similar structures. For sets with more than, say, 10,000 models this procedure is very time-consuming, and alternative faster algorithms that restrict comparisons to the most similar structures would be useful. Results We exploit the inverse triangle inequality on the RMSD between two structures given the RMSDs with a third structure. The lower bound on RMSD may be used, when restricting the search of similarity to a reasonably low RMSD threshold value, to speed up similarity searches significantly. Tests are performed on large sets of decoys which are widely used as test cases for predictive methods, with a speed-up of up to 100 times with respect to all-vs-all comparison depending on the set and parameters used. Sample applications are shown. Conclusions The algorithm presented here allows fast comparison of large data sets of structures with limited memory requirements. As an example of application we present clustering of more than 100,000 fragments of length 5 from the top500H dataset into a few hundred representative fragments. A more realistic scenario is provided by the search of similarity within the very large decoy sets used for the tests. Other applications include filtering nearly identical conformations in selected CASP9 datasets and clustering molecular dynamics snapshots. Availability A Linux executable and a Perl script with examples are given in the supplementary material (Additional file 1). The source code is available upon request from the authors. PMID:22642815
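
    The pruning idea can be illustrated with a short sketch. Given a reference structure r, the inverse triangle inequality gives the lower bound |d(a, r) - d(b, r)| <= d(a, b), so any pair whose bound already exceeds the threshold can be skipped without a full comparison. The code below is an assumption-laden illustration, not the authors' algorithm: it uses a single reference (the first model), and `rmsd_no_fit` is a hypothetical stand-in distance that omits optimal superposition; any distance obeying the triangle inequality can be passed in instead.

```python
import math
from itertools import combinations


def rmsd_no_fit(a, b):
    """RMSD between two equal-length lists of 3D points, without superposition.

    Stand-in metric for illustration only.
    """
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


def similar_pairs(models, threshold, dist=rmsd_no_fit):
    """Return (pairs, n_full): index pairs within `threshold`, and the number
    of full distance evaluations that could not be pruned."""
    ref = models[0]
    d_ref = [dist(m, ref) for m in models]  # one comparison per model
    pairs, n_full = [], 0
    for i, j in combinations(range(len(models)), 2):
        # Lower bound from the inverse triangle inequality.
        if abs(d_ref[i] - d_ref[j]) > threshold:
            continue  # cannot be within threshold; skip the full comparison
        n_full += 1
        if dist(models[i], models[j]) <= threshold:
            pairs.append((i, j))
    return pairs, n_full
```

    The bound only rejects pairs; pairs that pass it still need the full comparison, so the result is identical to an all-vs-all search at the same threshold.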

  14. Exploring relationships between Dairy Herd Improvement monitors of performance and the Transition Cow Index in Wisconsin dairy herds.

    PubMed

    Schultz, K K; Bennett, T B; Nordlund, K V; Döpfer, D; Cook, N B

    2016-09-01

    Transition cow management has been tracked via the Transition Cow Index (TCI; AgSource Cooperative Services, Verona, WI) since 2006. Transition Cow Index was developed to measure the difference between actual and predicted milk yield at first test day to evaluate the relative success of the transition period program. This project aimed to assess TCI in relation to all commonly used Dairy Herd Improvement (DHI) metrics available through AgSource Cooperative Services. Regression analysis was used to isolate variables that were relevant to TCI, and then principal components analysis and network analysis were used to determine the relative strength and relatedness among variables. Finally, cluster analysis was used to segregate herds based on similarity of relevant variables. The DHI data were obtained from 2,131 Wisconsin dairy herds with test-day mean ≥30 cows, which were tested ≥10 times throughout the 2014 calendar year. The original list of 940 DHI variables was reduced through expert-driven selection and regression analysis to 23 variables. The K-means cluster analysis produced 5 distinct clusters. Descriptive statistics were calculated for the 23 variables per cluster grouping. Using principal components analysis, cluster analysis, and network analysis, 4 parameters were isolated as most relevant to TCI; these were energy-corrected milk, 3 measures of intramammary infection (dry cow cure rate, linear somatic cell count score in primiparous cows, and new infection rate), peak ratio, and days in milk at peak milk production. These variables together with cow and newborn calf survival measures form a group of metrics that can be used to assist in the evaluation of overall transition period performance. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  15. The effect of AGN feedback on the X-ray morphologies of clusters: Simulations vs. observations

    NASA Astrophysics Data System (ADS)

    Chon, Gayoung; Puchwein, Ewald; Böhringer, Hans

    2016-07-01

    Clusters of galaxies probe the large-scale distribution of matter and are a useful tool to test the cosmological models by constraining cosmic structure growth and the expansion of the Universe. It is the scaling relations between mass observables and the true mass of a cluster through which we obtain the cosmological constraints by comparing to theoretical cluster mass functions. These scaling relations are, however, heavily influenced by cluster morphology. The presence of the slight tension in recent cosmological constraints on Ωm and σ8 based on the CMB and clusters has boosted the interests in looking for possible sources for the discrepancy. Therefore we study here the effect of active galactic nucleus (AGN) feedback as one of the major mechanisms modifying the cluster morphology influencing scaling relations. It is known that AGN feedback injects energies up to 10^62 erg into the intracluster medium, controls the heating and cooling of a cluster, and re-distributes cold gas from the centre to outer radii. We have also learned that cluster simulations with AGN feedback can reproduce observed cluster properties, for example, the X-ray luminosity, temperature, and cooling rate at the centre better than without the AGN feedback. In this paper using cosmological hydrodynamical simulations we investigate how the AGN feedback changes the X-ray morphology of the simulated systems, and compare this to the observed Representative XMM-Newton Cluster Structure Survey (REXCESS) clusters. We apply two substructure measures, centre shifts (w) and power ratios (e.g. P3/P0), to characterise the cluster morphology, and find that our simulated clusters are more substructured than the observed clusters based on the values of w and P3/P0. We also show that the degree of this discrepancy is affected by the inclusion of AGN feedback. While the clusters simulated with the AGN feedback are in much better agreement with the REXCESS LX-T relation, they are also more substructured, which increases the tension with observations. When classified as non-relaxed or relaxed according to their w and P3/P0 values, we find that there are no relaxed clusters in the simulations with the AGN feedback. This suggests that not only global cluster properties, like LX and T, and radial profiles should be used to compare and to calibrate simulations with observations, but also substructure measures like centre shifts and power ratios. Finally, we discuss what changes in the simulations might ease the tension with observational constraints on these quantities.

  16. Crowded Cluster Cores. Algorithms for Deblending in Dark Energy Survey Images

    DOE PAGES

    Zhang, Yuanyuan; McKay, Timothy A.; Bertin, Emmanuel; ...

    2015-10-26

    Deep optical images are often crowded with overlapping objects. We found that this is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. In this article, we introduce a new software tool called the Gradient And Interpolation based (GAIN) deblender. GAIN is used as a secondary deblender to improve the separation of overlapping objects in galaxy cluster cores in Dark Energy Survey images. It uses image intensity gradients and an interpolation technique originally developed to correct flawed digital images. Our paper is dedicated to describing the algorithm of the GAIN deblender and its applications, but we additionally include modest tests of the software based on real Dark Energy Survey co-add images. GAIN helps to extract an unbiased photometry measurement for blended sources and improve detection completeness, while introducing few spurious detections. When applied to processed Dark Energy Survey data, GAIN serves as a useful quick fix when a high level of deblending is desired.

  17. Non-negative Matrix Factorization and Co-clustering: A Promising Tool for Multi-tasks Bearing Fault Diagnosis

    NASA Astrophysics Data System (ADS)

    Shen, Fei; Chen, Chao; Yan, Ruqiang

    2017-05-01

    Classical bearing fault diagnosis methods, being designed according to one specific task, always pay attention to the effectiveness of extracted features and the final diagnostic performance. However, most of these approaches suffer from inefficiency when multiple tasks exist, especially in a real-time diagnostic scenario. A fault diagnosis method based on Non-negative Matrix Factorization (NMF) and Co-clustering strategy is proposed to overcome this limitation. Firstly, some high-dimensional matrices are constructed using the Short-Time Fourier Transform (STFT) features, where the dimension of each matrix equals the number of target tasks. Then, the NMF algorithm is carried out to obtain different components in each dimension direction through optimized matching, such as Euclidean distance and divergence distance. Finally, a Co-clustering technique based on information entropy is utilized to realize classification of each component. To verify the effectiveness of the proposed approach, a series of bearing data sets were analysed in this research. The tests indicated that although the diagnostic performance of single task is comparable to traditional clustering methods such as K-means algorithm and Gaussian Mixture Model, the accuracy and computational efficiency in multi-tasks fault diagnosis are improved.

  18. Processing ARM VAP data on an AWS cluster

    NASA Astrophysics Data System (ADS)

    Martin, T.; Macduff, M.; Shippert, T.

    2017-12-01

    The Atmospheric Radiation Measurement (ARM) Data Management Facility (DMF) manages over 18,000 processes and 1.3 TB of data each day. This includes many Value Added Products (VAPs) that make use of multiple instruments to produce the derived products that are scientifically relevant. A thermodynamic and cloud profile VAP is being developed to provide input to the ARM Large-eddy simulation (LES) ARM Symbiotic Simulation and Observation (LASSO) project (https://www.arm.gov/capabilities/vaps/lasso-122). This algorithm is CPU intensive and the processing requirements exceeded the available DMF computing capacity. Amazon Web Services (AWS) along with CfnCluster was investigated to see how it would perform. This cluster environment is cost effective and scales dynamically based on demand. We were able to take advantage of autoscaling, which allowed the cluster to grow and shrink based on the size of the processing queue. We also were able to take advantage of the Amazon Web Services spot market to further reduce the cost. Our test was very successful and showed that cloud resources can be used to efficiently and effectively process time series data. This poster will present the resources and methodology used to successfully run the algorithm.

  19. Genetic effect of interleukin-1 beta (C-511T) polymorphism on the structural covariance network and white matter integrity in Alzheimer's disease.

    PubMed

    Huang, Chi-Wei; Hsu, Shih-Wei; Tsai, Shih-Jen; Chen, Nai-Ching; Liu, Mu-En; Lee, Chen-Chang; Huang, Shu-Hua; Chang, Weng-Neng; Chang, Ya-Ting; Tsai, Wan-Chen; Chang, Chiung-Chih

    2017-01-18

    Inflammatory processes play a pivotal role in the degenerative process of Alzheimer's disease. In humans, a biallelic (C/T) polymorphism in the promoter region (position-511) (rs16944) of the interleukin-1 beta gene has been significantly associated with differences in the secretory capacity of interleukin-1 beta. In this study, we investigated whether this functional polymorphism mediates the brain networks in patients with Alzheimer's disease. We enrolled a total of 135 patients with Alzheimer's disease (65 males, 70 females), and investigated their gray matter structural covariance networks using 3D T1 magnetic resonance imaging and their white matter macro-structural integrities using fractional anisotropy. The patients were classified into two genotype groups: C-carriers (n = 108) and TT-carriers (n = 27), and the structural covariance networks were constructed using seed-based analysis focusing on the default mode network medial temporal or dorsal medial subsystem, salience network and executive control network. Neurobehavioral scores were used as the major outcome factors for clinical correlations. There were no differences between the two genotype groups in the cognitive test scores, seed, or peak cluster volumes and white matter fractional anisotropy. The covariance strength showing C-carriers > TT-carriers was the entorhinal-cingulum axis. There were two peak clusters (Brodmann 6 and 10) in the salience network and four peak clusters (superior prefrontal, precentral, fusiform, and temporal) in the executive control network that showed C-carriers < TT-carriers in covariance strength. The salience network and executive control network peak clusters in the TT group and the default mode network peak clusters in the C-carriers strongly predicted the cognitive test scores. Interleukin-1 beta C-511T polymorphism modulates the structural covariance strength on the anterior brain network and entorhinal-interconnected network which were independent of the white matter tract integrity. Depending on the specific C-511T genotype, different network clusters could predict the cognitive tests.

  20. Development of Competency-Based Vocational Agricultural Instructional Materials for Handicapped Students Enrolled in Regular Agriculture Programs Other Than Horticulture. Final Report.

    ERIC Educational Resources Information Center

    Baggett, Connie D.; And Others

    This report includes a description of a project to develop and field-test competency-based instructional materials for handicapped students enrolled in regular vocational agriculture programs; a list of project advisory personnel; the clusters of skills identified as appropriate for handicapped students enrolled in courses in dairy production,…

  1. The Principle of the Micro-Electronic Neural Bridge and a Prototype System Design.

    PubMed

    Huang, Zong-Hao; Wang, Zhi-Gong; Lu, Xiao-Ying; Li, Wen-Yuan; Zhou, Yu-Xuan; Shen, Xiao-Yan; Zhao, Xin-Tai

    2016-01-01

    The micro-electronic neural bridge (MENB) aims to rebuild lost motor function of paralyzed humans by routing movement-related signals from the brain, around the damaged part in the spinal cord, to the external effectors. This study focused on the prototype system design of the MENB, including the principle of the MENB, the neural signal detecting circuit and the functional electrical stimulation (FES) circuit design, and the spike detecting and sorting algorithm. In this study, we developed a novel improved amplitude threshold spike detecting method based on variable forward difference threshold for both the training and bridging phases. The discrete wavelet transform (DWT), a new level feature coefficient selection method based on Lilliefors test, and the k-means clustering method based on Mahalanobis distance were used for spike sorting. A real-time online spike detecting and sorting algorithm based on DWT and Euclidean distance was also implemented for the bridging phase. Tested by the data sets available at Caltech, in the training phase, the average sensitivity, specificity, and clustering accuracies are 99.43%, 97.83%, and 95.45%, respectively. Validated by the three-fold cross-validation method, the average sensitivity, specificity, and classification accuracy are 99.43%, 97.70%, and 96.46%, respectively.

  2. An RR Lyrae period shift in terms of the Fourier parameter Phi sub 31

    NASA Technical Reports Server (NTRS)

    Clement, Christine M.; Jankulak, Michael; Simon, Norman R.

    1992-01-01

    The Fourier phase parameter Phi sub 31 has been determined for RRc stars in five globular clusters, NGC 6171, M5, M3, M53, and M15. The results indicate that the RRc stars in a given cluster show a sequence of Phi sub 31 increasing with period, and that the higher the cluster metallicity, the higher the sequence lies in a plot of Phi sub 31 with period. The Phi sub 31 values for the stars in NGC 6171 and M5 presented here are based on observations made with the University of Toronto 0.61 m telescope at Las Campanas, Chile, while those for M3, M53, and M15 are based on published data. A bootstrap procedure has been used to establish the uncertainties in the Fourier parameters. The physical significance of the relationship among Phi sub 31, period, and metallicity is not yet understood. It will need to be tested with hydrodynamic pulsation models computed with new opacities.

  3. Relative efficiency and sample size for cluster randomized trials with variable cluster sizes.

    PubMed

    You, Zhiying; Williams, O Dale; Aban, Inmaculada; Kabagambe, Edmond Kato; Tiwari, Hemant K; Cutter, Gary

    2011-02-01

    The statistical power of cluster randomized trials depends on two sample size components, the number of clusters per group and the numbers of individuals within clusters (cluster size). Variable cluster sizes are common and this variation alone may have significant impact on study power. Previous approaches have taken this into account by either adjusting total sample size using a designated design effect or adjusting the number of clusters according to an assessment of the relative efficiency of unequal versus equal cluster sizes. This article defines a relative efficiency of unequal versus equal cluster sizes using noncentrality parameters, investigates properties of this measure, and proposes an approach for adjusting the required sample size accordingly. We focus on comparing two groups with normally distributed outcomes using t-test, and use the noncentrality parameter to define the relative efficiency of unequal versus equal cluster sizes and show that statistical power depends only on this parameter for a given number of clusters. We calculate the sample size required for an unequal cluster sizes trial to have the same power as one with equal cluster sizes. Relative efficiency based on the noncentrality parameter is straightforward to calculate and easy to interpret. It connects the required mean cluster size directly to the required sample size with equal cluster sizes. Consequently, our approach first determines the sample size requirements with equal cluster sizes for a pre-specified study power and then calculates the required mean cluster size while keeping the number of clusters unchanged. Our approach allows adjustment in mean cluster size alone or simultaneous adjustment in mean cluster size and number of clusters, and is a flexible alternative to and a useful complement to existing methods. Comparison indicated that we have defined a relative efficiency that is greater than the relative efficiency in the literature under some conditions. 
Our measure of relative efficiency might be less than the measure in the literature under some conditions, underestimating the relative efficiency. The relative efficiency of unequal versus equal cluster sizes defined using the noncentrality parameter suggests a sample size approach that is a flexible alternative and a useful complement to existing methods.
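
    As a rough illustration of why unequal cluster sizes cost efficiency, the sketch below uses the common inverse-variance argument: a cluster of size m with intracluster correlation rho contributes information proportional to m / (1 + (m - 1) * rho). This is not the noncentrality-parameter measure defined in the article, whose exact form the abstract does not give. The function names, the cluster sizes, and the value rho = 0.05 are assumptions made only for the example.

```python
def cluster_information(m, icc):
    """Information from one cluster of size m, in units of 1/sigma^2.

    Follows from Var(cluster mean) = sigma^2 * (1 + (m - 1) * icc) / m.
    """
    return m / (1.0 + (m - 1) * icc)


def relative_efficiency(sizes, icc):
    """Efficiency of unequal cluster sizes relative to equal sizes.

    Compares the total information of the given sizes with that of the
    same number of clusters all having the mean size.
    """
    k = len(sizes)
    mean_size = sum(sizes) / k
    unequal = sum(cluster_information(m, icc) for m in sizes)
    equal = k * cluster_information(mean_size, icc)
    return unequal / equal


# Hypothetical trial arm: five clusters with very different sizes.
example_sizes = [5, 10, 15, 20, 50]
example_re = relative_efficiency(example_sizes, icc=0.05)
print(f"relative efficiency: {example_re:.3f}")
```

    Under this simple measure the ratio never exceeds 1, and it equals 1 when all clusters have the same size or when the intracluster correlation is zero.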

  4. A comparative study of DIGNET, average, complete, single hierarchical and k-means clustering algorithms in 2D face image recognition

    NASA Astrophysics Data System (ADS)

    Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.

    2014-06-01

    The study in this paper belongs to a more general research of discovering facial sub-clusters in different ethnicity face databases. These new sub-clusters along with other metadata (such as race, sex, etc.) lead to a vector for each face in the database where each vector component represents the likelihood of participation of a given face to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average, complete, single hierarchical algorithm, k-means and DIGNET), and selects the best strategy for each data collection. In this paper we present the comparative performance of clustering results of DIGNET and four clustering algorithms (average, complete, single hierarchical and k-means) on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy for each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metrics values are above a specific acceptance threshold. However when the evaluation results metrics have values lower than the acceptance threshold but not too low (too low corresponds to ambiguous results or false results), then it is necessary for the clustering results to be verified by the other algorithms.
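
    One of the metrics named above, the mean silhouette coefficient, can be sketched in a few lines. This is a minimal illustration with Euclidean distance and made-up 2D points, not the evaluation code used in the paper. The function name and the convention of scoring singleton clusters as 0 are assumptions.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = mean_silhouette(pts, [0, 0, 0, 1, 1, 1])
bad = mean_silhouette(pts, [0, 1, 0, 1, 0, 1])
print(good, bad)
```

    A labelling that matches the two groups scores close to 1, while a labelling that mixes them scores much lower, which is the behaviour an acceptance threshold on this metric relies on.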

  5. Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System.

    PubMed

    Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

    2016-01-01

    Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP).
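
    The three evaluation criteria named in the abstract have standard definitions, sketched below. The travel-time values are invented for illustration and are not taken from the paper. The function names are likewise assumptions.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mare(actual, predicted):
    """Mean absolute relative error; assumes every actual value is non-zero."""
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical link travel times in seconds (observed vs. estimated).
observed = [100.0, 200.0]
estimated = [110.0, 180.0]
print(mae(observed, estimated), rmse(observed, estimated), mare(observed, estimated))
```

    MAE and RMSE are in the units of the travel time, whereas MARE is dimensionless, which makes it easier to compare across links of different length.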

  6. Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System

    PubMed Central

    Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

    2016-01-01

    Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP). PMID:26829639

  7. Cluster and Sporadic Cases of Herbaspirillum Species Infections in Patients With Cancer

    PubMed Central

    Chemaly, Roy F.; Dantes, Raymund; Shah, Dimpy P.; Shah, Pankil K.; Pascoe, Neil; Ariza-Heredia, Ella; Perego, Cheryl; Nguyen, Duc B.; Nguyen, Kim; Modarai, Farhad; Moulton-Meissner, Heather; Noble-Wang, Judith; Tarrand, Jeffrey J.; LiPuma, John J.; Guh, Alice Y.; MacCannell, Tara; Raad, Issam; Mulanovich, Victor

    2015-01-01

    Background. Herbaspirillum species are gram-negative Betaproteobacteria that inhabit the rhizosphere. We investigated a potential cluster of hospital-based Herbaspirillum species infections. Methods. Cases were defined as Herbaspirillum species isolated from a patient in our comprehensive cancer center between 1 January 2006 and 15 October 2013. Case finding was performed by reviewing isolates initially identified as Burkholderia cepacia susceptible to all antibiotics tested, and 16S ribosomal DNA sequencing of available isolates to confirm their identity. Pulsed-field gel electrophoresis (PFGE) was performed to test genetic relatedness. Facility observations, infection prevention assessments, and environmental sampling were performed to investigate potential sources of Herbaspirillum species. Results. Eight cases of Herbaspirillum species were identified. Isolates from the first 5 clustered cases were initially misidentified as B. cepacia, and available isolates from 4 of these cases were indistinguishable. The 3 subsequent cases were identified by prospective surveillance and had different PFGE patterns. All but 1 case-patient had bloodstream infections, and 6 presented with sepsis. Underlying diagnoses included solid tumors (3), leukemia (3), lymphoma (1), and aplastic anemia (1). Herbaspirillum species infections were hospital-onset in 5 patients and community-onset in 3. All symptomatic patients were treated with intravenous antibiotics, and their infections resolved. No environmental source or common mechanism of acquisition was identified. Conclusions. This is the first report of a hospital-based cluster of Herbaspirillum species infections. Herbaspirillum species are capable of causing bacteremia and sepsis in immunocompromised patients. Herbaspirillum species can be misidentified as Burkholderia cepacia by commercially available microbial identification systems. PMID:25216687

  8. Evaluation of Primary Immunization Coverage of Infants Under Universal Immunization Programme in an Urban Area of Bangalore City Using Cluster Sampling and Lot Quality Assurance Sampling Techniques

    PubMed Central

    K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth

    2008-01-01

    Research Question: Is LQAS technique better than cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? Objective: To assess and compare the lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and Proportions, Chi square Test. Results: (1) Using cluster sampling, the percentage of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, it was 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by cluster sampling technique were not statistically different from the coverage value as obtained by lot quality assurance sampling techniques. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique in evaluating the primary immunization coverage in urban area. PMID:19876474

  9. Combining self-organizing mapping and supervised affinity propagation clustering approach to investigate functional brain networks involved in motor imagery and execution with fMRI measurements.

    PubMed

    Zhang, Jiang; Liu, Qi; Chen, Huafu; Yuan, Zhen; Huang, Jin; Deng, Lihua; Lu, Fengmei; Zhang, Junpeng; Wang, Yuqing; Wang, Mingwen; Chen, Liangyin

    2015-01-01

    Clustering analysis methods have been widely applied to identifying the functional brain networks of a multitask paradigm. However, the previously used clustering analysis techniques are computationally expensive and thus impractical for clinical applications. In this study a novel method, called SOM-SAPC that combines self-organizing mapping (SOM) and supervised affinity propagation clustering (SAPC), is proposed and implemented to identify the motor execution (ME) and motor imagery (MI) networks. In SOM-SAPC, SOM was first performed to process fMRI data and SAPC is further utilized for clustering the patterns of functional networks. As a result, SOM-SAPC is able to significantly reduce the computational cost for brain network analysis. Simulation and clinical tests involving ME and MI were conducted based on SOM-SAPC, and the analysis results indicated that functional brain networks were clearly identified with different response patterns and reduced computational cost. In particular, three activation clusters were clearly revealed, which include parts of the visual, ME and MI functional networks. These findings validated that SOM-SAPC is an effective and robust method to analyze the fMRI data with multitasks.

  10. Magnesium ferrite nanocrystal clusters for magnetorheological fluid with enhanced sedimentation stability

    NASA Astrophysics Data System (ADS)

    Wang, Guangshuo; Ma, Yingying; Li, Meixia; Cui, Guohua; Che, Hongwei; Mu, Jingbo; Zhang, Xiaoliang; Tong, Yu; Dong, Xufeng

    2017-01-01

    In this study, magnesium ferrite (MgFe2O4) nanocrystal clusters were synthesized using an ascorbic acid-assistant solvothermal method and evaluated as a candidate for magnetorheological (MR) fluid. The morphology, microstructure and magnetic properties of the MgFe2O4 nanocrystal clusters were investigated in detail by field emission scanning electron microscopy (FESEM), transmission electron microscope (TEM), thermogravimetric analyzer (TGA), X-ray diffraction (XRD) and superconducting quantum interference device (SQUID). The MgFe2O4 nanocrystal clusters were suspended in silicone oil to prepare MR fluid and the MR properties were tested using a Physica MCR301 rheometer fitted with a magneto-rheological module. The prepared MR fluid showed typical Bingham plastic behavior, changing from a liquid-like to a solid-like structure under an external magnetic field. Compared with the conventional carbonyl iron particles, MgFe2O4 nanocrystal clusters-based MR fluid demonstrated enhanced sedimentation stability due to the reduced mismatch in density between the particles and the carrier medium. In summary, the as-prepared MgFe2O4 nanocrystal clusters are regarded as a promising candidate for MR fluid with enhanced sedimentation stability.

  11. A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.

    PubMed

    Bruneau, Marine; Mottet, Thierry; Moulin, Serge; Kerbiriou, Maël; Chouly, Franz; Chretien, Stéphane; Guyeux, Christophe

    2018-02-01

    In this article, a new Python package for nucleotide sequences clustering is proposed. This package, freely available on-line, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Despite the fact that we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clusters is not required here. For the sake of illustration, this method is applied on a set of 100 DNA sequences taken from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene, extracted from a collection of Platyhelminthes and Nematoda species. The resulting clusters are tightly consistent with the phylogenetic tree computed using a maximum likelihood approach on gene alignment. They are coherent too with the NCBI taxonomy. Further test results based on synthesized data are then provided, showing that the proposed approach is better able to recover the clusters than the most widely used software, namely Cd-hit-est and BLASTClust. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. X-ray spectral observations of clusters of galaxies undergoing merger events

    NASA Astrophysics Data System (ADS)

    Henriksen, Mark J.

    1993-09-01

    We have analyzed the HEAO 1 A2 observations of two clusters whose optical and X-ray isophotes are suggestive of merging subclusters, A119 and A754, and find evidence of nonisothermal X-ray emission from both clusters. The X-ray spectrum of both clusters, when fitted with a single isothermal model, shows residual soft X-ray emission. There is a statistically significant reduction in chi-squared (98 percent probability based on the F-test) when a second temperature component is added. If the asymmetric isophotes seen in the soft X-ray image are indicative of merging subclusters, then our analysis of the Einstein IPC spectra and Solid State Spectrometer observations of A754, which provide some spatial and spectral resolution, suggests that the two temperature components seen in the HEAO 1 A2 spectra are associated with gas trapped in the subcluster potential wells. The implied subcluster isothermal masses suggest that a more massive cluster is accreting a less massive companion in A754. The present observations cannot rule out the alternative possibility that the cooler gas is associated with the outer cluster atmosphere rather than individual subclusters, as appears to be the case for A119. Astro D observations will be necessary to distinguish between these two possibilities for both clusters.

  13. Evaluating tests of virialization and substructure using galaxy clusters in the ORELSE survey

    NASA Astrophysics Data System (ADS)

    Rumbaugh, N.; Lemaux, B. C.; Tomczak, A. R.; Shen, L.; Pelliccia, D.; Lubin, L. M.; Kocevski, D. D.; Wu, P.-F.; Gal, R. R.; Mei, S.; Fassnacht, C. D.; Squires, G. K.

    2018-07-01

    We evaluated the effectiveness of different indicators of cluster virialization using 12 large-scale structures in the Observations of Redshift Evolution in Large-Scale Environments survey spanning from 0.7

  14. Evaluating Tests of Virialization and Substructure Using Galaxy Clusters in the ORELSE Survey

    NASA Astrophysics Data System (ADS)

    Rumbaugh, N.; Lemaux, B. C.; Tomczak, A. R.; Shen, L.; Pelliccia, D.; Lubin, L. M.; Kocevski, D. D.; Wu, P.-F.; Gal, R. R.; Mei, S.; Fassnacht, C. D.; Squires, G. K.

    2018-05-01

    We evaluated the effectiveness of different indicators of cluster virialization using 12 large-scale structures in the ORELSE survey spanning from 0.7 < z < 1.3. We located diffuse X-ray emission from 16 galaxy clusters using Chandra observations. We studied the properties of these clusters and their members, using Chandra data in conjunction with optical and near-IR imaging and spectroscopy. We measured X-ray luminosities and gas temperatures of each cluster, as well as velocity dispersions of their member galaxies. We compared these results to scaling relations derived from virialized clusters, finding significant offsets of up to 3-4σ for some clusters, which could indicate they are disturbed or still forming. We explored if other properties of the clusters correlated with these offsets by performing a set of tests of virialization and substructure on our sample, including Dressler-Schectman tests, power ratios, analyses of the velocity distributions of galaxy populations, and centroiding differences. For comparison to a wide range of studies, we used two sets of tests: ones that did and did not use spectral energy distribution fitting to obtain rest-frame colours, stellar masses, and photometric redshifts of galaxies. Our results indicated that the difference between the stellar mass or light mean-weighted center and the X-ray center, as well as the projected offset of the most-massive/brightest cluster galaxy from other cluster centroids had the strongest correlations with scaling relation offsets, implying they are the most robust indicators of cluster virialization and can be used for this purpose when X-ray data is insufficiently deep for reliable LX and TX measurements.

  15. Low Back Pain Subgroups using Fear-Avoidance Model Measures: Results of a Cluster Analysis

    PubMed Central

    Beneciuk, Jason M.; Robinson, Michael E.; George, Steven Z.

    2012-01-01

    Objectives The purpose of this secondary analysis was to test the hypothesis that an empirically derived psychological subgrouping scheme based on multiple Fear-Avoidance Model (FAM) constructs would provide additional capabilities for clinical outcomes in comparison to a single FAM construct. Methods Patients (n = 108) with acute or sub-acute low back pain (LBP) enrolled in a clinical trial comparing behavioral physical therapy interventions to classification based physical therapy completed baseline questionnaires for pain catastrophizing (PCS), fear-avoidance beliefs (FABQ-PA, FABQ-W), and patient-specific fear (FDAQ). Clinical outcomes were pain intensity and disability measured at baseline, 4-weeks, and 6-months. A hierarchical agglomerative cluster analysis was used to create distinct cluster profiles among FAM measures and discriminant analysis was used to interpret clusters. Changes in clinical outcomes were investigated with repeated measures ANOVA and differences in results based on cluster membership were compared to FABQ-PA subgrouping used in the original trial. Results Three distinct FAM subgroups (Low Risk, High Specific Fear, and High Fear & Catastrophizing) emerged from cluster analysis. Subgroups differed on baseline pain and disability (p’s<.01) with the High Fear & Catastrophizing subgroup associated with greater pain than the Low Risk subgroup (p<.01) and the greatest disability (p’s<.05). Subgroup × time interactions were detected for both pain and disability (p’s<.05) with the High Fear & Catastrophizing subgroup reporting greater changes in pain and disability than other subgroups (p’s<.05). In contrast, FABQ-PA subgroups used in the original trial were not associated with interactions for clinical outcomes. Discussion These data suggest that subgrouping based on multiple FAM measures may provide additional information on clinical outcomes in comparison to determining subgroup status by FABQ-PA alone. 
Subgrouping methods for patients with LBP should include multiple psychological factors to further explore if patients can be matched with appropriate interventions. PMID:22510537

  16. Information extraction from dynamic PS-InSAR time series using machine learning

    NASA Astrophysics Data System (ADS)

    van de Kerkhof, B.; Pankratius, V.; Chang, L.; van Swol, R.; Hanssen, R. F.

    2017-12-01

    Due to the increasing number of SAR satellites, with shorter repeat intervals and higher resolutions, SAR data volumes are exploding. Time series analyses of SAR data, i.e. Persistent Scatterer (PS) InSAR, enable the deformation monitoring of the built environment at an unprecedented scale, with hundreds of scatterers per km2, updated weekly. Potential hazards, e.g. due to failure of aging infrastructure, can be detected at an early stage. Yet, this requires the operational data processing of billions of measurement points, over hundreds of epochs, updating this data set dynamically as new data come in, and testing whether points (start to) behave in an anomalous way. Moreover, the quality of PS-InSAR measurements is ambiguous and heterogeneous, which will yield false positives and false negatives. Such analyses are numerically challenging. Here we extract relevant information from PS-InSAR time series using machine learning algorithms. We cluster (group together) time series with similar behaviour, even though they may not be spatially close, such that the results can be used for further analysis. First we reduce the dimensionality of the dataset in order to be able to cluster the data, since applying clustering techniques to high-dimensional datasets often yields unsatisfying results. Our approach is to apply t-distributed Stochastic Neighbor Embedding (t-SNE), a machine learning algorithm for dimensionality reduction of high-dimensional data to a 2D or 3D map, and cluster this result using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results show that we are able to detect and cluster time series with similar behaviour, which is the starting point for more extensive analysis into the underlying driving mechanisms. The results of the methods are compared to conventional hypothesis testing as well as a Self-Organising Map (SOM) approach. Hypothesis testing is robust and takes the stochastic nature of the observations into account, but is time consuming. Therefore, we apply our machine learning approach and the hypothesis testing approach in succession, in order to benefit both from the reduced computation time of the machine learning approach and from the robust quality metrics of hypothesis testing. We acknowledge support from NASA AIST NNX15AG84G (PI V. Pankratius).
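
    The clustering step described above can be illustrated with a compact DBSCAN sketch. It assumes the time series have already been embedded as 2D points (the t-SNE step is omitted), and it is a plain textbook version of the algorithm, not the authors' implementation. The function names, the sample points, and the eps and min_pts values are assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None marks an unvisited point
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


# Hypothetical 2D embedding: two dense groups and one isolated point.
embedded = [
    (0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),
    (5.0, 5.0), (5.0, 5.5), (5.5, 5.0), (5.5, 5.5),
    (20.0, 20.0),
]
cluster_labels = dbscan(embedded, eps=1.0, min_pts=3)
print(cluster_labels)
```

    The isolated point is labelled as noise rather than forced into a cluster, which is the property that makes a density-based method attractive for spotting anomalous scatterers.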

  17. The Evolution of Globular Cluster Systems In Early-Type Galaxies

    NASA Astrophysics Data System (ADS)

    Grillmair, Carl

    1999-07-01

    We will measure structural parameters (core radii and concentrations) of globular clusters in three early-type galaxies using deep, four-point dithered observations. We have chosen globular cluster systems which have young, medium-age and old cluster populations, as indicated by cluster colors and luminosities. Our primary goal is to test the hypothesis that globular cluster luminosity functions evolve towards a "universal" form. Previous observations have shown that young cluster systems have exponential luminosity functions rather than the characteristic log-normal luminosity function of old cluster systems. We will test whether such young systems exhibit a wider range of structural parameters than old systems, and whether and at what rate plausible disruption mechanisms will cause the luminosity function to evolve towards a log-normal form. A simple observational comparison of structural parameters between different age cluster populations and between different sub-populations within the same galaxy will also provide clues concerning both the formation and destruction mechanisms of star clusters, the distinction between open and globular clusters, and the advisability of using globular cluster luminosity functions as distance indicators.

  18. A study to evaluate the acceptability, feasibility and impact of packaged interventions ("Diarrhea Pack") for prevention and treatment of childhood diarrhea in rural Pakistan.

    PubMed

    Habib, Muhammad Atif; Soofi, Sajid; Sadiq, Kamran; Samejo, Tariq; Hussain, Musawar; Mirani, Mushtaq; Rehmatullah, Asmatullah; Ahmed, Imran; Bhutta, Zulfiqar A

    2013-10-03

    Diarrhea remains one of the leading public health issues in developing countries and is a major contributor to morbidity and mortality in children under five years of age. Interventions such as ORS, zinc, water purification and improved hygiene and sanitation can significantly reduce the diarrhea burden, but their coverage remains low and they have not been tested as a packaged intervention before. This study attempts to evaluate the package of evidence-based interventions in a "Diarrhea Pack" through first-level health care providers at domiciliary level in community-based settings. This study sought to evaluate the acceptability, feasibility and impact of the Diarrhea Pack on diarrhea burden. A cluster randomized design was used to evaluate the objectives of the project. A union council was considered as a cluster for analysis; a total of eight clusters, four in intervention and four in control, were included in the study. We conducted a baseline survey in all clusters, followed by the delivery of the Diarrhea Pack in intervention clusters through community health workers at domiciliary level and through sales promoters to health care providers and pharmacies. Four quarterly surveillance rounds were conducted to evaluate the impact of the Diarrhea Pack in all clusters by an independent team of field workers. Both the intervention and control clusters were similar at baseline, but as the study progressed we found a significant increase in uptake of ORS and zinc, along with a reduction in antibiotic use, diarrhea burden and hospitalization, in intervention clusters when compared with the control clusters. We found that the Diarrhea Pack was well accepted with all of its components in the community. The intervention was well accepted and had a productive impact on the uptake of ORS and zinc and reduction in the use of antibiotics. It is feasible to deliver interventions such as the Diarrhea Pack through community health workers in community settings. The intervention has the potential to be scaled up at national level.

  19. Monitoring evolving urban cluster systems using DMSP/OLS nighttime light data: a case study of the Yangtze River Delta region, China

    NASA Astrophysics Data System (ADS)

    Wang, Zhao; Yang, Shan; Wang, Shuguang; Shen, Yan

    2017-10-01

    The assessment of the dynamic urban structure has been affected by lack of timely and accurate spatial information for a long period, which has hindered the measurements of structural continuity at the macroscale. Defense meteorological satellite program's operational linescan system (DMSP/OLS) nighttime light (NTL) data provide an ideal source for urban information detection with a long-time span, short-time interval, and wide coverage. In this study, we extracted the physical boundaries of urban clusters from corrected NTL images and quantitatively analyzed the structure of the urban cluster system based on rank-size distribution, spatial metrics, and Mann-Kendall trend test. Two levels of urban cluster systems in the Yangtze River Delta region (YRDR) were examined. We found that (1) in the entire YRDR, the urban cluster system showed a periodic process, with a significant trend of even distribution before 2007 but an unequal growth pattern after 2007, and (2) at the metropolitan level, vast disparities exist in four metropolitan areas for the fluctuations of Pareto's exponent, the speed of cluster expansion, and the dominance of core cluster. The results suggest that the extracted urban cluster information from NTL data effectively reflect the evolving nature of regional urbanization, which in turn can aid in the planning of cities and help achieve more sustainable regional development.
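
    The Mann-Kendall trend test mentioned above is simple enough to sketch directly. The version below uses the usual normal approximation with a tie correction and a two-sided p-value. It is a generic illustration, not the authors' code, and the yearly values are invented. The function name is an assumption.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test."""
    n = len(series)
    # S counts later-minus-earlier pairs: +1 for an increase, -1 for a decrease.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis, corrected for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical yearly indicator with a steady upward trend.
yearly = [0.90, 0.92, 0.95, 0.97, 1.01, 1.03, 1.06, 1.08, 1.12, 1.15]
s_stat, z_stat, p_value = mann_kendall(yearly)
print(s_stat, round(z_stat, 3), p_value)
```

    Because the test uses only the signs of pairwise differences, it makes no assumption about the distribution of the values, which suits short, noisy yearly series.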

  20. Galaxy cluster lensing masses in modified lensing potentials

    DOE PAGES

    Barreira, Alexandre; Li, Baojiu; Jennings, Elise; ...

    2015-10-28

    In this study, we determine the concentration–mass relation of 19 X-ray selected galaxy clusters from the Cluster Lensing and Supernova Survey with Hubble survey in theories of gravity that directly modify the lensing potential. We model the clusters as Navarro–Frenk–White haloes and fit their lensing signal, in the Cubic Galileon and Nonlocal gravity models, to the lensing convergence profiles of the clusters. We discuss a number of important issues that need to be taken into account, associated with the use of non-parametric and parametric lensing methods, as well as assumptions about the background cosmology. Our results show that the concentration and mass estimates in the modified gravity models are, within the error bars, the same as in Λ cold dark matter. This result demonstrates that, for the Nonlocal model, the modifications to gravity are too weak at the cluster redshifts, and for the Galileon model, the screening mechanism is very efficient inside the cluster radius. However, at distances ~ [2–20] Mpc/h from the cluster centre, we find that the surrounding force profiles are enhanced by ~ 20–40% in the Cubic Galileon model. This has an impact on dynamical mass estimates, which means that tests of gravity based on comparisons between lensing and dynamical masses can also be applied to the Cubic Galileon model.

  1. Testing the accuracy of clustering redshifts with simulations

    NASA Astrophysics Data System (ADS)

    Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.

    2018-03-01

    We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample. This study gives an estimate of the reachable accuracy of this method. First, we discuss the requirements for the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimated that the density of the Quasi Stellar Objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Secondly, we estimate individual redshifts for galaxies in the densest regions of colour space (~30 per cent of the galaxies) without using the photometric redshifts procedure. The advantage of this procedure is threefold. It allows: (i) the use of cluster-zs for any field in astronomy, (ii) the possibility to combine photo-zs and cluster-zs to get an improved redshift estimation, (iii) the use of cluster-z to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z selected tomographic bins from redshift 0.2 to 1. We found a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-z could be used as a primary redshift estimator by the next generation of cosmological surveys.

  2. Cluster analysis of cognitive performance in elderly and demented subjects.

    PubMed

    Giaquinto, S; Nolfe, G; Calvani, M

    1985-06-01

    48 elderly normals, 14 demented subjects and 76 young controls were tested for basic cognitive functions. All the tests were quantified and could therefore be subjected to statistical analysis. The results show a difference in the speed of information processing and in memory load between the young controls and elderly normals but the age groups differed in quantitative terms only. Cluster analysis showed that the elderly and the demented formed two distinctly separate groups at the qualitative level, the basic cognitive processes being damaged in the demented group. Age thus appears to be only a risk factor for dementia and not its cause. It is concluded that batteries based on precise and measurable tasks are the most appropriate not only for the study of dementia but for rehabilitation purposes too.

  3. Parallelization of MRCI based on hole-particle symmetry.

    PubMed

    Suo, Bing; Zhai, Gaohong; Wang, Yubin; Wen, Zhenyi; Hu, Xiangqian; Li, Lemin

    2005-01-15

    The parallel implementation of a multireference configuration interaction (MRCI) program based on the hole-particle symmetry is described. The platform used for the parallelization is an Intel-architecture cluster consisting of 12 nodes, each of which is equipped with two 2.4-GHz Xeon processors, 3 GB of memory, and a 36-GB disk; the nodes are connected by a Gigabit Ethernet switch. The dependence of speedup on molecular symmetries and task granularities is discussed. Test calculations show that the scaling with the number of nodes is about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h) when the number of nodes is doubled. The largest calculation performed on this cluster involves 5.6 × 10^8 CSFs.
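
    The per-doubling scaling factors quoted above translate directly into parallel efficiencies (factor divided by the ideal value of 2). The sketch below shows that arithmetic. The projection to larger node counts is only an illustration under the assumption that the same factor holds for every doubling, which the abstract does not state; all function names are hypothetical.

```python
import math

# Per-doubling speedup factors quoted in the abstract, keyed by point group.
DOUBLING_FACTORS = {"C1/Cs": 1.9, "C2v": 1.65, "D2h": 1.55}


def doubling_efficiency(factor):
    """Parallel efficiency of one doubling: achieved factor over the ideal 2x."""
    return factor / 2.0


def projected_speedup(factor, nodes):
    """Speedup on `nodes` nodes relative to one node.

    ASSUMPTION: the same factor applies to every doubling, so the
    speedup is factor ** log2(nodes). This is an extrapolation, not a
    result reported in the abstract.
    """
    return factor ** math.log2(nodes)


if __name__ == "__main__":
    for symmetry, factor in DOUBLING_FACTORS.items():
        print(symmetry,
              "efficiency per doubling:", round(doubling_efficiency(factor), 3),
              "projected speedup on 4 nodes:", round(projected_speedup(factor, 4), 2))
```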

  4. Information Theory and Voting Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures.

    PubMed

    Saeed, Faisal; Salim, Naomie; Abdo, Ammar

    2013-07-01

    Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for chemical compounds clustering. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. 3. Credit USAF, ca. 1945. Original housed in the Records ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    3. Credit USAF, ca. 1945. Original housed in the Records of the Defense Intelligence Agency. Record Group 373. National Archives. Cartographic and Architectural Branch. Washington, D.C. Aerial orthophoto map 16PS5M79-IV23 of Muroc Flight Test Base (North Base), north faces up with runway at the top and Rogers Dry Lake at the lower right. Ammunition huts (not extant in 1995) appear in a cluster just south of the west end of the runway. Note runway markings on lakebed. Linear feature at very top of image is rocket sled test track designed and built 1944-1945. - Edwards Air Force Base, North Base, North Base Road, Boron, Kern County, CA

  6. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    PubMed Central

    Ying Wah, Teh

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering methods. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753
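
    The properties this abstract attributes to density-based clustering (arbitrary cluster shapes, outlier handling, no preset number of clusters) can be illustrated with plain batch DBSCAN. The sketch below is not the streaming algorithm proposed in the paper; it is the classic batch method, written with a brute-force neighbour search, and the parameter names `eps` and `min_pts` follow common usage rather than the paper.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Batch DBSCAN; returns one label per point (cluster id or NOISE)."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D readings: two dense groups and one isolated point.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(data, eps=1.5, min_pts=3))
```

    The number of clusters falls out of the density parameters rather than being supplied up front, and the isolated point is reported as noise. A streaming variant would additionally need to summarise and age out old data, which this sketch does not attempt.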

  7. A fast density-based clustering algorithm for real-time Internet of Things stream.

    PubMed

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering methods. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  8. Clustering on Magnesium Surfaces – Formation and Diffusion Energies

    DOE PAGES

    Chu, Haijian; Huang, Hanchen; Wang, Jian

    2017-07-12

    The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and formation of faulted structure, which in turn affect the mechanical deformation of Mg. This paper reports first principles density functional theory (DFT) based quantum mechanics calculation results of atomic clustering on the low energy surfaces {0001} and {$\bar{1}$011}. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers a face-centered-cubic stacking, to serve as a nucleus of stacking fault. On a {$\bar{1}$011} surface, clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on surface {$\bar{1}$011} is highly anisotropic while isotropic on surface (0001). Three-dimensional Ehrlich–Schwoebel barriers converge as the step height is three atomic layers or thicker. Finally, adatom diffusion along steps is via hopping mechanism, and that down steps is via exchange mechanism.

  9. Clustering on Magnesium Surfaces – Formation and Diffusion Energies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chu, Haijian; Huang, Hanchen; Wang, Jian

    The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and formation of faulted structure, which in turn affect the mechanical deformation of Mg. This paper reports first principles density functional theory (DFT) based quantum mechanics calculation results of atomic clustering on the low energy surfaces {0001} and {$\bar{1}$011}. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers a face-centered-cubic stacking, to serve as a nucleus of stacking fault. On a {$\bar{1}$011} surface, clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on surface {$\bar{1}$011} is highly anisotropic while isotropic on surface (0001). Three-dimensional Ehrlich–Schwoebel barriers converge as the step height is three atomic layers or thicker. Finally, adatom diffusion along steps is via hopping mechanism, and that down steps is via exchange mechanism.

  10. Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis.

    PubMed

    Gorzalczany, Marian B; Rudzinski, Filip

    2017-06-07

    This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.

  11. A clustering algorithm for sample data based on environmental pollution characteristics

    NASA Astrophysics Data System (ADS)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
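
    The three steps described in this abstract (seed cluster centres from unlabelled points, assign points to the most similar centre subject to a user-defined threshold, then refine in a k-means-like way) can be sketched as follows. This is not the EPC algorithm itself, whose similarity function and outlier rule are not given here. The sketch assumes Euclidean distance as the (dis)similarity measure and, as a further assumption, flags members of clusters smaller than a hypothetical `min_size` as outliers.

```python
import math

OUTLIER = -1  # label for points in clusters smaller than min_size


def threshold_clustering(points, threshold, min_size=2, max_iter=20):
    """Threshold-seeded clustering with k-means-style refinement.

    ASSUMPTIONS: Euclidean distance stands in for the similarity
    function, and small clusters are treated as outliers.
    Returns (labels, centres).
    """
    # Step 1: a point farther than `threshold` from every existing
    # centre becomes a new centre (the first point always does).
    centres = []
    for p in points:
        if all(math.dist(p, c) > threshold for c in centres):
            centres.append(tuple(p))

    # Steps 2 and 3: assign each point to its nearest centre, then move
    # each centre to the mean of its members, until nothing changes.
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
                  for p in points]
        new_centres = []
        for k, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centres.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centres.append(old)  # keep an empty cluster's centre in place
        if new_centres == centres:
            break
        centres = new_centres

    # Outlier rule (assumption): clusters with too few members.
    sizes = [labels.count(k) for k in range(len(centres))]
    labels = [lab if sizes[lab] >= min_size else OUTLIER for lab in labels]
    return labels, centres


if __name__ == "__main__":
    # Hypothetical two-feature samples: two groups and one stray sample.
    samples = [(0, 0), (0.2, 0), (0, 0.2), (5, 5), (5.2, 5), (5, 5.2), (20, 20)]
    print(threshold_clustering(samples, threshold=1.0))
```

    Because the threshold decides when a new centre is created, the number of clusters does not have to be supplied in advance, which matches the user-defined threshold the abstract mentions.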

  12. Tests of the gravitational redshift effect in space-born and ground-based experiments

    NASA Astrophysics Data System (ADS)

    Vavilova, I. B.

    2018-02-01

    This paper provides a brief overview of tests of the gravitational redshift (GRS) effect in ground-based and space-borne experiments. In particular, we consider the GRS effects in the gravitational field of the Earth, the major planets of the Solar system, and compact stars (white dwarfs and neutron stars), where this effect is confirmed with a higher accuracy. We discuss the possibilities of confirming the GRS effect for galaxies and galaxy clusters in the visible and X-ray ranges of the electromagnetic spectrum.

  13. Diffusion and mobility of atomic particles in a liquid

    NASA Astrophysics Data System (ADS)

    Smirnov, B. M.; Son, E. E.; Tereshonok, D. V.

    2017-11-01

    The diffusion coefficient of a test atom or molecule in a liquid is determined for the mechanism where the displacement of the test molecule results from the vibrations and motion of liquid molecules surrounding the test molecule and of the test particle itself. This leads to a random change in the coordinate of the test molecule, which eventually results in the diffusion motion of the test particle in space. Two model parameters of the interaction of a particle and a liquid are used to find the activation energy of the diffusion process under consideration: the gas-kinetic cross section for scattering of test molecules in the parent gas and the Wigner-Seitz radius for test molecules. In the context of this approach, we have calculated the diffusion coefficient of atoms and molecules in water. Based on experimental data, we have constructed the dependence of the activation energy for the diffusion of test molecules in water on the interaction parameter, and the temperature dependence of the diffusion coefficient of atoms or molecules in water within the models considered. The statistically averaged difference of the activation energies for the diffusion coefficients of different test molecules in water that we have calculated based on each of the presented models does not exceed 10% of the diffusion coefficient itself. We have considered the diffusion of clusters in water and present the dependence of the diffusion coefficient on the cluster size. The accuracy of the presented formulas for the diffusion coefficient of atomic particles in water is estimated to be 50%.

  14. Sensitivity and specificity of subacute computerized neurocognitive testing and symptom evaluation in predicting outcomes after sports-related concussion.

    PubMed

    Lau, Brian C; Collins, Michael W; Lovell, Mark R

    2011-06-01

    Concussions affect an estimated 136 000 high school athletes yearly. Computerized neurocognitive testing has been shown to be appropriately sensitive and specific in diagnosing concussions, but no studies have assessed its utility to predict length of recovery. Determining prognosis during subacute recovery after sports concussion will help clinicians more confidently address return-to-play and academic decisions. To quantify the prognostic ability of computerized neurocognitive testing in combination with symptoms during the subacute recovery phase from sports-related concussion. Cohort study (prognosis); Level of evidence, 2. In sum, 108 male high school football athletes completed a computer-based neurocognitive test battery within 2.23 days of injury and were followed until returned to play as set by international guidelines. Athletes were grouped into protracted recovery (>14 days; n = 50) or short-recovery (≤14 days; n = 58). Separate discriminant function analyses were performed using total symptom score on Post-Concussion Symptom Scale, symptom clusters (migraine, cognitive, sleep, neuropsychiatric), and Immediate Postconcussion Assessment and Cognitive Testing neurocognitive scores (verbal memory, visual memory, reaction time, processing speed). Multiple discriminant function analyses revealed that the combination of 4 symptom clusters and 4 neurocognitive composite scores had the highest sensitivity (65.22%), specificity (80.36%), positive predictive value (73.17%), and negative predictive value (73.80%) in predicting protracted recovery. Discriminant function analyses of total symptoms on the Post-Concussion Symptom Scale alone had a sensitivity of 40.81%; specificity, 79.31%; positive predictive value, 62.50%; and negative predictive value, 61.33%. The 4 symptom clusters alone discriminant function analyses had a sensitivity of 46.94%; specificity, 77.20%; positive predictive value, 63.90%; and negative predictive value, 62.86%. Discriminant function analyses of the 4 computerized neurocognitive scores alone had a sensitivity of 53.20%; specificity, 75.44%; positive predictive value, 64.10%; and negative predictive value, 66.15%. The use of computerized neurocognitive testing in conjunction with symptom clusters results improves sensitivity, specificity, positive predictive value, and negative predictive value of predicting protracted recovery compared with each used alone. There is also a net increase in sensitivity of 24.41% when using neurocognitive testing and symptom clusters together compared with using total symptoms on Post-Concussion Symptom Scale alone.
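
    The four measures reported in this abstract follow from a 2 × 2 table of predicted versus actual recovery group. The sketch below shows the standard definitions. The counts in the usage example are hypothetical and are not taken from the study; only the final subtraction uses figures quoted in the abstract.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 confusion table.

    tp: protracted recoveries correctly predicted as protracted
    fp: short recoveries wrongly predicted as protracted
    fn: protracted recoveries wrongly predicted as short
    tn: short recoveries correctly predicted as short
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true protracted cases detected
        "specificity": tn / (tn + fp),  # share of true short cases detected
        "ppv": tp / (tp + fp),          # how often a "protracted" prediction is right
        "npv": tn / (tn + fn),          # how often a "short" prediction is right
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(diagnostic_metrics(tp=8, fp=1, fn=2, tn=9))

    # The "net increase in sensitivity" quoted in the abstract is the
    # difference between two reported sensitivities, in percentage points.
    print(round(65.22 - 40.81, 2))
```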

  15. A Weight-Adaptive Laplacian Embedding for Graph-Based Clustering.

    PubMed

    Cheng, De; Nie, Feiping; Sun, Jiande; Gong, Yihong

    2017-07-01

    Graph-based clustering methods perform clustering on a fixed input data graph. Thus such clustering results are sensitive to the particular graph construction. If this initial construction is of low quality, the resulting clustering may also be of low quality. We address this drawback by allowing the data graph itself to be adaptively adjusted in the clustering procedure. In particular, our proposed weight adaptive Laplacian (WAL) method learns a new data similarity matrix that can adaptively adjust the initial graph according to the similarity weight in the input data graph. We develop three versions of these methods based on the L2-norm, fuzzy entropy regularizer, and another exponential-based weight strategy, that yield three new graph-based clustering objectives. We derive optimization algorithms to solve these objectives. Experimental results on synthetic data sets and real-world benchmark data sets exhibit the effectiveness of these new graph-based clustering methods.

  16. Cardiorespiratory instability in monitored step-down unit patients: using cluster analysis to identify patterns of change

    PubMed Central

    Clermont, Gilles; Chen, Lujie; Dubrawski, Artur W.; Ren, Dianxu; Hoffman, Leslie A.; Pinsky, Michael R.; Hravnak, Marilyn

    2018-01-01

    Cardiorespiratory instability (CRI) in monitored step-down unit (SDU) patients has a variety of etiologies, and likely manifests in patterns of vital signs (VS) changes. We explored use of clustering techniques to identify patterns in the initial CRI epoch (CRI1; first exceedances of VS beyond stability thresholds after SDU admission) of unstable patients, and inter-cluster differences in admission characteristics and outcomes. Continuous noninvasive monitoring of heart rate (HR), respiratory rate (RR), and pulse oximetry (SpO2) were sampled at 1/20 Hz. We identified CRI1 in 165 patients, employed hierarchical and k-means clustering, tested several clustering solutions, used 10-fold cross validation to establish the best solution and assessed inter-cluster differences in admission characteristics and outcomes. Three clusters (C) were derived: C1) normal/high HR and RR, normal SpO2 (n = 30); C2) normal HR and RR, low SpO2 (n = 103); and C3) low/normal HR, low RR and normal SpO2 (n = 32). Clusters were significantly different based on age (p < 0.001; older patients in C2), number of comorbidities (p = 0.008; more C2 patients had ≥ 2) and hospital length of stay (p = 0.006; C1 patients stayed longer). There were no between-cluster differences in SDU length of stay, or mortality. Three different clusters of VS presentations for CRI1 were identified. Clusters varied on age, number of comorbidities and hospital length of stay. Future study is needed to determine if there are common physiologic underpinnings of VS clusters which might inform clinical decision-making when CRI first manifests. PMID:28229353

  17. Predicting stabilizing treatment outcomes for complex posttraumatic stress disorder and dissociative identity disorder: an expertise-based prognostic model.

    PubMed

    Baars, Erik W; van der Hart, Onno; Nijenhuis, Ellert R S; Chu, James A; Glas, Gerrit; Draijer, Nel

    2011-01-01

    The purpose of this study was to develop an expertise-based prognostic model for the treatment of complex posttraumatic stress disorder (PTSD) and dissociative identity disorder (DID). We developed a survey in 2 rounds: In the first round we surveyed 42 experienced therapists (22 DID and 20 complex PTSD therapists), and in the second round we surveyed a subset of 22 of the 42 therapists (13 DID and 9 complex PTSD therapists). First, we drew on therapists' knowledge of prognostic factors for stabilization-oriented treatment of complex PTSD and DID. Second, therapists prioritized a list of prognostic factors by estimating the size of each variable's prognostic effect; we clustered these factors according to content and named the clusters. Next, concept mapping methodology and statistical analyses (including principal components analyses) were used to transform individual judgments into weighted group judgments for clusters of items. A prognostic model, based on consensually determined estimates of effect sizes, of 8 clusters containing 51 factors for both complex PTSD and DID was formed. It includes the clusters lack of motivation, lack of healthy relationships, lack of healthy therapeutic relationships, lack of other internal and external resources, serious Axis I comorbidity, serious Axis II comorbidity, poor attachment, and self-destruction. In addition, a set of 5 DID-specific items was constructed. The model is supportive of the current phase-oriented treatment model, emphasizing the strengthening of the therapeutic relationship and the patient's resources in the initial stabilization phase. Further research is needed to test the model's statistical and clinical validity.

  18. Segmentation of dermatoscopic images by frequency domain filtering and k-means clustering algorithms.

    PubMed

    Rajab, Maher I

    2011-11-01

    Since the introduction of epiluminescence microscopy (ELM), image analysis tools have been extended to the field of dermatology, in an attempt to algorithmically reproduce clinical evaluation. Accurate image segmentation of skin lesions is one of the key steps for useful, early and non-invasive diagnosis of cutaneous melanomas. This paper proposes two image segmentation algorithms based on frequency domain processing and k-means clustering/fuzzy k-means clustering. The two methods are capable of segmenting and extracting the true border that reveals the global structure irregularity (indentations and protrusions), which may suggest excessive cell growth or regression of a melanoma. As a pre-processing step, Fourier low-pass filtering is applied to reduce the surrounding noise in a skin lesion image. A quantitative comparison of the techniques is enabled by the use of synthetic skin lesion images that model lesions covered with hair to which Gaussian noise is added. The proposed techniques are also compared with an established optimal-based thresholding skin-segmentation method. It is demonstrated that for lesions with a range of different border irregularity properties, the k-means clustering and fuzzy k-means clustering segmentation methods provide the best performance over a range of signal to noise ratios. The proposed segmentation techniques are also demonstrated to have similar performance when tested on real skin lesions representing high-resolution ELM images. This study suggests that the segmentation results obtained using a combination of low-pass frequency filtering and k-means or fuzzy k-means clustering are superior to the result that would be obtained by using k-means or fuzzy k-means clustering segmentation methods alone. © 2011 John Wiley & Sons A/S.

  19. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification.

    PubMed

    Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin

    2015-12-01

    One major goal of large-scale cancer omics studies is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of the high-dimensional cancer multi-omics data. In this study, we proposed a novel low-rank approximation based integrative probabilistic model to quickly find the shared principal subspace across multiple data types: the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering of hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster with better clustering performances than the existing method. Then, we applied LRAcluster on large-scale cancer multi-omics data from TCGA. The pan-cancer analysis results show that the cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas. The single cancer type analysis, in contrast, suggests that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/ .

  20. A mixture model-based approach to the clustering of microarray expression data.

    PubMed

    McLachlan, G J; Bean, R W; Peel, D

    2002-03-01

    This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets. EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/

  1. A fully automatic microcalcification detection approach based on deep convolution neural network

    NASA Astrophysics Data System (ADS)

    Cai, Guanxiong; Guo, Yanhui; Zhang, Yaqin; Qin, Genggeng; Zhou, Yuanpin; Lu, Yao

    2018-02-01

    Breast cancer is one of the most common cancers and has high morbidity and mortality worldwide, posing a serious threat to the health of human beings. The emergence of microcalcifications (MCs) is an important signal of early breast cancer. However, it is still challenging and time consuming for radiologists to identify some tiny and subtle individual MCs in mammograms. This study proposed a novel computer-aided MC detection algorithm on the full field digital mammograms (FFDMs) using a deep convolution neural network (DCNN). Firstly, an MC candidate detection system was used to obtain potential MC candidates. Then a DCNN was trained using a novel adaptive learning strategy, the neutrosophic reinforcement sample learning (NRSL) strategy, to speed up the learning process. The trained DCNN served to recognize true MCs. After classification by the DCNN, a density-based regional clustering method was applied to form MC clusters. The accuracy of the DCNN with our proposed NRSL strategy converges faster and goes higher than that of the traditional DCNN at the same epochs, and it obtained an accuracy of 99.87% on the training set, 95.12% on the validation set, and 93.68% on the testing set at epoch 40. For cluster-based MC cluster detection evaluation, a sensitivity of 90% was achieved at 0.13 false positives (FPs) per image. The obtained results demonstrate that the designed DCNN plays a significant role in MC detection after being trained beforehand.

  2. A Human Activity Recognition System Based on Dynamic Clustering of Skeleton Data.

    PubMed

    Manzi, Alessandro; Dario, Paolo; Cavallo, Filippo

    2017-05-11

    Human activity recognition is an important area in computer vision, with its wide range of applications including ambient assisted living. In this paper, an activity recognition system based on skeleton data extracted from a depth camera is presented. The system makes use of machine learning techniques to classify the actions that are described with a set of a few basic postures. The training phase creates several models related to the number of clustered postures by means of a multiclass Support Vector Machine (SVM), trained with Sequential Minimal Optimization (SMO). The classification phase adopts the X-means algorithm to find the optimal number of clusters dynamically. The contribution of the paper is twofold. The first aim is to perform activity recognition employing features based on a small number of informative postures, extracted independently from each activity instance; secondly, it aims to assess the minimum number of frames needed for an adequate classification. The system is evaluated on two publicly available datasets, the Cornell Activity Dataset (CAD-60) and the Telecommunication Systems Team (TST) Fall detection dataset. The number of clusters needed to model each instance ranges from two to four elements. The proposed approach reaches excellent performances using only about 4 s of input data (~100 frames) and outperforms the state of the art when it uses approximately 500 frames on the CAD-60 dataset. The results are promising for the test in real context.

  3. Small Sample Performance of Bias-corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes

    PubMed Central

    Li, Peng; Redden, David T.

    2014-01-01

    SUMMARY The sandwich estimator in generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small sample properties of the GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z test should be avoided in the analyses of CRTs with few clusters even when bias-corrected sandwich estimators are used. With t-distribution approximation, the Kauermann and Carroll (KC)-correction can keep the test size to nominal levels even when the number of clusters is as low as 10, and is robust to the moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG)-correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters one needs using the t test and KC-correction for the CRTs with binary outcomes. The power levels as predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that with appropriate control of type I error rates under small sample sizes, we recommend the use of GEE approach in CRTs with binary outcomes due to fewer assumptions and robustness to the misspecification of the covariance structure. PMID:25345738

  4. Model-Based Clustering and Data Transformations for Gene Expression Data

    DTIC Science & Technology

    2001-04-30

    transformation parameters, e.g. Andrews, Gnanadesikan, and Warner (1973). Aitchison tests: Aitchison (1986) tested three aspects of the data for...N in the Box-Cox transformation in Equation (5) is estimated by maximum likelihood using the observations (Andrews, Gnanadesikan, and Warner 1973...Compositional Data. Chapman and Hall. Andrews, D. F., R. Gnanadesikan, and J. L. Warner (1973). Methods for assessing multivariate normality. In P. R

  5. Evaluation of Second-Level Inference in fMRI Analysis

    PubMed Central

    Roels, Sanne P.; Loeys, Tom; Moerkerke, Beatrijs

    2016-01-01

    We investigate the impact of decisions in the second-level (i.e., over subjects) inferential process in functional magnetic resonance imaging on (1) the balance between false positives and false negatives and on (2) the data-analytical stability, both proxies for the reproducibility of results. Second-level analysis based on a mass univariate approach typically consists of 3 phases. First, one proceeds via a general linear model for a test image that consists of pooled information from different subjects. We evaluate models that take into account first-level (within-subjects) variability and models that do not take into account this variability. Second, one proceeds via inference based on parametrical assumptions or via permutation-based inference. Third, we evaluate 3 commonly used procedures to address the multiple testing problem: familywise error rate correction, False Discovery Rate (FDR) correction, and a two-step procedure with minimal cluster size. Based on a simulation study and real data we find that the two-step procedure with minimal cluster size results in most stable results, followed by the familywise error rate correction. The FDR results in most variable results, for both permutation-based inference and parametrical inference. Modeling the subject-specific variability yields a better balance between false positives and false negatives when using parametric inference. PMID:26819578
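    The FDR correction mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which is the most widely used FDR method; the abstract does not say which FDR procedure the authors applied, and the function name and the level q = 0.05 are illustrative choices only.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up procedure at FDR level q.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

    In a mass univariate setting the input would be one p-value per voxel; the step-up rule rejects all hypotheses up to the largest rank that passes its threshold, even if some smaller ranks fail theirs.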

  6. Implementation of authentic assessment in the project based learning to improve student's concept mastering

    NASA Astrophysics Data System (ADS)

    Sambeka, Yana; Nahadi, Sriyati, Siti

    2017-05-01

    The study aimed to obtain scientific information about the increase in students' concept mastery in project-based learning that used authentic assessment. The research was conducted in May 2016 at a junior high school in Bandung during the 2015/2016 academic year. The research method was a weak experiment with a one-group pretest-posttest design. The sample of 24 students was drawn by random cluster sampling. Data were collected through a written test, an observation sheet, and a questionnaire. The students' concept mastery test yielded an N-Gain of 0.236, which falls in the low category. A paired-sample t-test showed that implementing authentic assessment in project-based learning increased students' concept mastery significantly (sig < 0.05).
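    As a rough illustration of how an N-Gain such as 0.236 is read, the sketch below assumes the commonly used normalized-gain definition, (post - pre) / (max - pre), with the usual cut-offs of 0.3 and 0.7 between the low, medium, and high categories. The abstract does not report the pretest and posttest scores, so the scores in the usage note are hypothetical.

```python
def normalized_gain(pre, post, max_score=100.0):
    """Normalized gain: the fraction of the possible improvement that was achieved."""
    if pre >= max_score:
        raise ValueError("pretest score must be below the maximum score")
    return (post - pre) / (max_score - pre)


def gain_category(g):
    """Classify a normalized gain with the conventional 0.3 and 0.7 cut-offs."""
    if g < 0.3:
        return "low"
    if g < 0.7:
        return "medium"
    return "high"
```

    With hypothetical scores of 40 before and 55 after on a 100-point test, normalized_gain(40, 55) gives 0.25, and gain_category places both that value and the reported 0.236 in the low category.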

  7. Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".

    PubMed

    Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio

    2018-05-04

    In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the nowadays produced average proteomics data sets. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.

  8. A roadmap of clustering algorithms: finding a match for a biomedical application.

    PubMed

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

  9. Sleep, Dietary, and Exercise Behavioral Clusters among Truck Drivers with Obesity: Implications for Interventions

    PubMed Central

    Olson, Ryan; Thompson, Sharon V.; Wipfli, Brad; Hanson, Ginger; Elliot, Diane L.; Anger, W. Kent; Bodner, Todd; Hammer, Leslie B.; Hohn, Elliot; Perrin, Nancy A.

    2015-01-01

    Objective Our objectives were to describe a sample of truck drivers, identify clusters of drivers with similar patterns in behaviors affecting energy balance (sleep, diet, and exercise), and test for cluster differences in health and psychosocial factors. Methods Participants’ (n=452, BMI M=37.2, 86.4% male) self-reported behaviors were dichotomized prior to hierarchical cluster analysis, which identified groups with similar behavior co-variation. Cluster differences were tested with generalized estimating equations. Results Five behavioral clusters were identified that differed significantly in age, smoking status, diabetes prevalence, lost work days, stress, and social support, but not in BMI. Cluster 2, characterized by the best sleep quality, had significantly lower lost workdays and stress than other clusters. Conclusions Weight management interventions for drivers should explicitly address sleep, and may be maximally effective after establishing socially supportive work environments that reduce stress exposures. PMID:26949883

  10. Sleep, Dietary, and Exercise Behavioral Clusters Among Truck Drivers With Obesity: Implications for Interventions.

    PubMed

    Olson, Ryan; Thompson, Sharon V; Wipfli, Brad; Hanson, Ginger; Elliot, Diane L; Anger, W Kent; Bodner, Todd; Hammer, Leslie B; Hohn, Elliot; Perrin, Nancy A

    2016-03-01

    The objectives of the study were to describe a sample of truck drivers, identify clusters of drivers with similar patterns in behaviors affecting energy balance (sleep, diet, and exercise), and test for cluster differences in health, safety, and psychosocial factors. Participants' (n = 452, body mass index M = 37.2, 86.4% male) self-reported behaviors were dichotomized prior to hierarchical cluster analysis, which identified groups with similar behavior covariation. Cluster differences were tested with generalized estimating equations. Five behavioral clusters were identified that differed significantly in age, smoking status, diabetes prevalence, lost work days, stress, and social support, but not in body mass index. Cluster 2, characterized by the best sleep quality, had significantly lower lost workdays and stress than other clusters. Weight management interventions for drivers should explicitly address sleep, and may be maximally effective after establishing socially supportive work environments that reduce stress exposures.

  11. CANTAB object recognition and language tests to detect aging cognitive decline: an exploratory comparative study

    PubMed Central

    Cabral Soares, Fernanda; de Oliveira, Thaís Cristina Galdino; de Macedo, Liliane Dias e Dias; Tomás, Alessandra Mendonça; Picanço-Diniz, Domingos Luiz Wanderley; Bento-Torres, João; Bento-Torres, Natáli Valim Oliver; Picanço-Diniz, Cristovam Wanderley

    2015-01-01

    Objective The recognition of the limits between normal and pathological aging is essential to start preventive actions. The aim of this paper is to compare the Cambridge Neuropsychological Test Automated Battery (CANTAB) and language tests to distinguish subtle differences in cognitive performances in two different age groups, namely young adults and elderly cognitively normal subjects. Method We selected 29 young adults (29.9±1.06 years) and 31 older adults (74.1±1.15 years) matched by educational level (years of schooling). All subjects underwent a general assessment and a battery of neuropsychological tests, including the Mini Mental State Examination, visuospatial learning, and memory tasks from CANTAB and language tests. Cluster and discriminant analysis were applied to all neuropsychological test results to distinguish possible subgroups inside each age group. Results Significant differences in the performance of aged and young adults were detected in both language and visuospatial memory tests. Intragroup cluster and discriminant analysis revealed that CANTAB, as compared to language tests, was able to detect subtle but significant differences between the subjects. Conclusion Based on these findings, we concluded that, as compared to language tests, large-scale application of automated visuospatial tests to assess learning and memory might increase our ability to discern the limits between normal and pathological aging. PMID:25565785

  12. Goal Profiles, Mental Toughness and its Influence on Performance Outcomes among Wushu Athletes

    PubMed Central

    Roy, Jolly

    2007-01-01

    This study examined the association between goal orientations and mental toughness and its influence on performance outcomes in competition. Wushu athletes (n = 40) competing in Intervarsity championships in Malaysia completed Task and Ego Orientations in Sport Questionnaire (TEOSQ) and Psychological Performance Inventory (PPI). Using cluster analysis techniques including hierarchical methods and the non-hierarchical method (k-means cluster) to examine goal profiles, a three cluster solution emerged viz. cluster 1 - high task and moderate ego (HT/ME), cluster 2 - moderate task and low ego (MT/LE) and, cluster 3 - moderate task and moderate ego (MT/ME). Analysis of the fundamental areas of mental toughness based on goal profiles revealed that athletes in cluster 1 scored significantly higher on negative energy control than athletes in cluster 2. Further, athletes in cluster 1 also scored significantly higher on positive energy control than athletes in cluster 3. Chi-square (χ2) test revealed no significant differences among athletes with different goal profiles on performance outcomes in the competition. However, significant differences were observed between athletes (medallist and non medallist) in self-confidence (p = 0.001) and negative energy control (p = 0.042). Medallists scored significantly higher on self-confidence (mean = 21.82 ± 2.72) and negative energy control (mean = 19.59 ± 2.32) than the non-medallists (self-confidence mean = 18.76 ± 2.49; negative energy control mean = 18.14 ± 1.91). Key points Mental toughness can be influenced by certain goal profile combinations. Athletes with successful outcomes in performance (medallist) displayed greater mental toughness. PMID:24198700

  13. Cognitive-affective depression and somatic symptoms clusters are differentially associated with maternal parenting and coparenting.

    PubMed

    Lamela, Diogo; Jongenelen, Inês; Morais, Ana; Figueiredo, Bárbara

    2017-09-01

    Both depressive and somatic symptoms are significant predictors of parenting and coparenting problems. However, despite clear evidence of their co-occurrence, no study to date has examined the association between depressive-somatic symptoms clusters and parenting and coparenting. The current research sought to identify and cross-validate clusters of cognitive-affective depressive symptoms and nonspecific somatic symptoms, as well as to test whether clusters would differ on parenting and coparenting problems across three independent samples of mothers. Participants in Studies 1 and 3 consisted of 409 and 652 community mothers, respectively. Participants in Study 2 consisted of 162 mothers exposed to intimate partner violence. All participants prospectively completed self-report measures of depressive and nonspecific somatic symptoms and parenting (Studies 1 and 2) or coparenting (Study 3). Across studies, three depression-somatic symptoms clusters were identified: no symptoms, high depression and low nonspecific somatic symptoms, and high depression and nonspecific somatic symptoms. The high depression-somatic symptoms cluster was associated with the highest levels of child physical maltreatment risk (Study 1) and overt-conflict coparenting (Study 3). No differences in perceived maternal competence (Study 2) and cooperative and undermining coparenting (Study 3) were found between the high depression and low somatic symptoms cluster and the high depression-somatic symptoms cluster. The results provide novel evidence for the strong associations between clusters of depression and nonspecific somatic symptoms and specific parenting and coparenting problems. Cluster stability across three independent samples suggest that they may be generalizable. The results inform preventive approaches and evidence-based psychotherapeutic treatments. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. A statistically compiled test battery for feasible evaluation of knee function after rupture of the Anterior Cruciate Ligament – derived from long-term follow-up data

    PubMed Central

    2017-01-01

    Purpose Clinical test batteries for evaluation of knee function after injury to the Anterior Cruciate Ligament (ACL) should be valid and feasible, while reliably capturing the outcome of rehabilitation. There is currently a lack of consensus as to which of the many available assessment tools for knee function should be included. The present aim was to use a statistical approach to investigate the contribution of frequently used tests to avoid redundancy, and filter them down to a proposed comprehensive and yet feasible test battery for long-term evaluation after ACL injury. Methods In total 48 outcome variables related to knee function, all potentially relevant for a long-term follow-up, were included from a cross-sectional study where 70 ACL-injured (17–28 years post injury) individuals were compared to 33 controls. Cluster analysis and logistic regression were used to group variables and identify an optimal test battery, from which a summarized estimator of knee function representing various functional aspects was derived. Results As expected, several variables were strongly correlated, and the variables also fell into logical clusters with higher within-correlation (max ρ = 0.61) than between clusters (max ρ = 0.19). The four variables of an extracted test battery, assessing one-leg balance, isokinetic knee extension strength and hop performance (one-leg hop, side hop), were mathematically combined into an estimator of knee function, which acceptably classified ACL-injured individuals and controls. This estimator, derived from objective measures, correlated significantly with self-reported function, e.g. Lysholm score (ρ = 0.66; p<0.001). Conclusions The proposed test battery, based on a solid statistical approach, includes assessments which are all clinically feasible, while also covering complementary aspects of knee function. Similar test batteries could be determined for earlier phases of ACL rehabilitation or to enable longitudinal monitoring. Such developments, established on a well-grounded consensus of measurements, would facilitate comparisons of studies and enable evidence-based rehabilitation. PMID:28459885

  15. Testing spectral models for stellar populations with star clusters - II. Results

    NASA Astrophysics Data System (ADS)

    González Delgado, Rosa M.; Cid Fernandes, Roberto

    2010-04-01

    High spectral resolution evolutionary synthesis models have become a routinely used ingredient in extragalactic work, and as such deserve thorough testing. Star clusters are ideal laboratories for such tests. This paper applies the spectral fitting methodology outlined in Paper I to a sample of clusters, mainly from the Magellanic Clouds and spanning a wide range in age and metallicity, fitting their integrated light spectra with a suite of modern evolutionary synthesis models for single stellar populations. The combinations of model plus spectral library employed in this investigation are Galaxev/STELIB, Vazdekis/MILES, SED@/GRANADA and Galaxev/MILES+GRANADA, which provide a representative sample of models currently available for spectral fitting work. A series of empirical tests are performed with these models, comparing the quality of the spectral fits and the values of age, metallicity and extinction obtained with each of them. A comparison is also made between the properties derived from these spectral fits and literature data on these nearby, well studied clusters. These comparisons are done with the general goal of providing useful feedback for model makers, as well as guidance to the users of such models. We find the following. (i) All models are able to derive ages that are in good agreement both with each other and with literature data, although ages derived from spectral fits are on average slightly older than those based on the S-colour-magnitude diagram (S-CMD) method as calibrated by Girardi et al. (ii) There is less agreement between the models for the metallicity and extinction. In particular, Galaxev/STELIB models underestimate the metallicity by ~0.6 dex, and the extinction is overestimated by 0.1 mag. (iii) New generations of models using the GRANADA and MILES libraries are superior to STELIB-based models both in terms of spectral fit quality and regarding the accuracy with which age and metallicity are retrieved. Accuracies of about 0.1 dex in age and 0.3 dex in metallicity can be achieved as long as the models are not extrapolated beyond their expected range of validity.

  16. CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

    PubMed

    Fidaner, Işık Barış; Cankorur-Cetinkaya, Ayca; Dikicioglu, Duygu; Kirdar, Betul; Cemgil, Ali Taylan; Oliver, Stephen G

    2016-02-01

    Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences), and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. Contact: sgo24@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  17. New asteroseismic scaling relations based on the Hayashi track relation applied to red giant branch stars in NGC 6791 and NGC 6819

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, T.; Li, Y.; Hekker, S., E-mail: wutao@ynao.ac.cn, E-mail: ly@ynao.ac.cn, E-mail: hekker@mps.mpg.de

    2014-01-20

    Stellar mass M, radius R, and gravity g are important basic parameters in stellar physics. Accurate values for these parameters can be obtained from the gravitational interaction between stars in multiple systems or from asteroseismology. Stars in a cluster are thought to be formed coevally from the same interstellar cloud of gas and dust. The cluster members are therefore expected to have some properties in common. These common properties strengthen our ability to constrain stellar models and asteroseismically derived M, R, and g when tested against an ensemble of cluster stars. Here we derive new scaling relations based on a relation for stars on the Hayashi track ($\sqrt{T_{\mathrm{eff}}} \sim g^{p} R^{q}$) to determine the masses and metallicities of red giant branch stars in open clusters NGC 6791 and NGC 6819 from the global oscillation parameters $\Delta\nu$ (the large frequency separation) and $\nu_{\max}$ (frequency of maximum oscillation power). The $\Delta\nu$ and $\nu_{\max}$ values are derived from Kepler observations. From the analysis of these new relations we derive: (1) direct observational evidence that the masses of red giant branch stars in a cluster are the same within their uncertainties, (2) new methods to derive M and z of the cluster in a self-consistent way from $\Delta\nu$ and $\nu_{\max}$, with lower intrinsic uncertainties, and (3) the mass dependence in the $\Delta\nu$-$\nu_{\max}$ relation for red giant branch stars.

  18. Hydration of Atmospheric Molecular Clusters: Systematic Configurational Sampling.

    PubMed

    Kildgaard, Jens; Mikkelsen, Kurt V; Bilde, Merete; Elm, Jonas

    2018-05-09

    We present a new systematic configurational sampling algorithm for investigating the potential energy surface of hydrated atmospheric molecular clusters. The algorithm is based on creating a Fibonacci sphere around each atom in the cluster and adding water molecules to each point in 9 different orientations. To allow the sampling of water molecules to existing hydrogen bonds, the cluster is displaced along the hydrogen bond and a water molecule is placed in between in three different orientations. Generated redundant structures are eliminated based on minimizing the root mean square distance (RMSD) of different conformers. Initially, the clusters are sampled using the semiempirical PM6 method and subsequently using density functional theory (M06-2X and ωB97X-D) with the 6-31++G(d,p) basis set. Applying the developed algorithm we study the hydration of sulfuric acid with up to 15 water molecules. We find that the additions of the first four water molecules "saturate" the sulfuric acid molecule and are more thermodynamically favourable than the addition of water molecules 5-15. Using the large generated set of conformers, we assess the performance of approximate methods (ωB97X-D, M06-2X, PW91 and PW6B95-D3) in calculating the binding energies and assigning the global minimum conformation compared to high level CCSD(T)-F12a/VDZ-F12 reference calculations. The tested DFT functionals systematically overestimate the binding energies compared to coupled cluster calculations, and we find that this deficiency can be corrected by a simple scaling factor.
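    The Fibonacci sphere used as the sampling grid in the abstract above can be sketched as follows. This is a minimal illustration assuming the common golden-angle spiral construction; the abstract does not state which variant, how many points, or what radius the authors used, so the function name and all parameters here are hypothetical.

```python
import math


def fibonacci_sphere(n_points, radius=1.0, center=(0.0, 0.0, 0.0)):
    """Return n_points roughly evenly spread points on a sphere around center.

    Assumption: golden-angle spiral construction of the Fibonacci sphere.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n_points):
        # z moves from just below +1 to just above -1 in equal steps.
        z = 1.0 - (2.0 * i + 1.0) / n_points
        # Radius of the horizontal circle at height z on the unit sphere.
        r_xy = math.sqrt(max(0.0, 1.0 - z * z))
        # Each successive point is rotated by the golden angle.
        theta = golden_angle * i
        x = math.cos(theta) * r_xy
        y = math.sin(theta) * r_xy
        points.append((center[0] + radius * x,
                       center[1] + radius * y,
                       center[2] + radius * z))
    return points
```

    In a sampling scheme like the one described, such a grid would be generated once per atom, with the atom position passed as center; placing and orienting the water molecules at each point is a separate step not shown here.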

  19. Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

    PubMed Central

    Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud

    2015-01-01

    This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309

  20. Modeling of correlated data with informative cluster sizes: An evaluation of joint modeling and within-cluster resampling approaches.

    PubMed

    Zhang, Bo; Liu, Wei; Zhang, Zhiwei; Qu, Yanping; Chen, Zhen; Albert, Paul S

    2017-08-01

    Joint modeling and within-cluster resampling are two approaches that are used for analyzing correlated data with informative cluster sizes. Motivated by a developmental toxicity study, we examined the performances and validity of these two approaches in testing covariate effects in generalized linear mixed-effects models. We show that the joint modeling approach is robust to the misspecification of cluster size models in terms of Type I and Type II errors when the corresponding covariates are not included in the random effects structure; otherwise, statistical tests may be affected. We also evaluate the performance of the within-cluster resampling procedure and thoroughly investigate the validity of it in modeling correlated data with informative cluster sizes. We show that within-cluster resampling is a valid alternative to joint modeling for cluster-specific covariates, but it is invalid for time-dependent covariates. The two methods are applied to a developmental toxicity study that investigated the effect of exposure to diethylene glycol dimethyl ether.

  1. Concern-driven integrated approaches to nanomaterial testing and assessment – report of the NanoSafety Cluster Working Group 10

    PubMed Central

    Oomen, Agnes G.; Bos, Peter M. J.; Fernandes, Teresa F.; Hund-Rinke, Kerstin; Boraschi, Diana; Byrne, Hugh J.; Aschberger, Karin; Gottardo, Stefania; von der Kammer, Frank; Kühnel, Dana; Hristozov, Danail; Marcomini, Antonio; Migliore, Lucia; Scott-Fordsmand, Janeck; Wick, Peter

    2014-01-01

    Bringing together topic-related European Union (EU)-funded projects, the so-called “NanoSafety Cluster” aims at identifying key areas for further research on risk assessment procedures for nanomaterials (NM). The outcome of NanoSafety Cluster Working Group 10, this commentary presents a vision for concern-driven integrated approaches for the (eco-)toxicological testing and assessment (IATA) of NM. Such approaches should start out by determining concerns, i.e., specific information needs for a given NM based on realistic exposure scenarios. Recognised concerns can be addressed in a set of tiers using standardised protocols for NM preparation and testing. Tier 1 includes determining physico-chemical properties, non-testing (e.g., structure–activity relationships) and evaluating existing data. In tier 2, a limited set of in vitro and in vivo tests are performed that can either indicate that the risk of the specific concern is sufficiently known or indicate the need for further testing, including details for such testing. Ecotoxicological testing begins with representative test organisms followed by complex test systems. After each tier, it is evaluated whether the information gained permits assessing the safety of the NM so that further testing can be waived. By effectively exploiting all available information, IATA allow accelerating the risk assessment process and reducing testing costs and animal use (in line with the 3Rs principle implemented in EU Directive 2010/63/EU). Combining material properties, exposure, biokinetics and hazard data, information gained with IATA can be used to recognise groups of NM based upon similar modes of action. Grouping of substances in turn should form an integral part of the IATA themselves. PMID:23641967

  2. Spatial distribution and cluster analysis of retail drug shop characteristics and antimalarial behaviors as reported by private medicine retailers in western Kenya: informing future interventions.

    PubMed

    Rusk, Andria; Highfield, Linda; Wilkerson, J Michael; Harrell, Melissa; Obala, Andrew; Amick, Benjamin

    2016-02-19

    Efforts to improve malaria case management in sub-Saharan Africa have shifted focus to private antimalarial retailers to increase access to appropriate treatment. Demands to decrease intervention cost while increasing efficacy require interventions tailored to geographic regions with demonstrated need. Cluster analysis presents an opportunity to meet this demand, but has not been applied to the retail sector or antimalarial retailer behaviors. This research conducted cluster analysis on medicine retailer behaviors in Kenya, to improve malaria case management and inform future interventions. Ninety-seven surveys were collected from medicine retailers working in the Webuye Health and Demographic Surveillance Site. Survey items included retailer training, education, antimalarial drug knowledge, recommending behavior, sales, and shop characteristics, and were analyzed using Kulldorff's spatial scan statistic. The Bernoulli purely spatial model for binomial data was used, comparing cases to controls. Statistical significance of found clusters was tested with a likelihood ratio test, using the null hypothesis of no clustering, and a p value based on 999 Monte Carlo simulations. The null hypothesis was rejected with p values of 0.05 or less. A statistically significant cluster of fewer than expected pharmacy-trained retailers was found (RR = .09, p = .001) when compared to the expected random distribution. Drug recommending behavior also yielded a statistically significant cluster, with fewer than expected retailers recommending the correct antimalarial medication to adults (RR = .018, p = .01), and fewer than expected shops selling that medication more often than outdated antimalarials when compared to random distribution (RR = 0.23, p = .007). All three of these clusters were co-located, overlapping in the northwest of the study area. Spatial clustering was found in the data. A concerning amount of correlation was found in one specific region in the study area where multiple behaviors converged in space, highlighting a prime target for interventions. These results also demonstrate the utility of applying geospatial methods in the study of medicine retailer behaviors, making the case for expanding this approach to other regions.
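The significance test described in the abstract above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the standard Bernoulli-model log-likelihood ratio for a single candidate zone and the usual rank-based Monte Carlo p-value. The function names and the counts in the example are hypothetical.

```python
import math

def _k_log_ratio(k, n):
    """Return k * ln(k / n), using the convention 0 * ln(0) = 0."""
    return k * math.log(k / n) if k > 0 else 0.0

def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for one candidate zone under a Bernoulli model.

    c, n: cases and total points inside the zone (requires 0 < n < N)
    C, N: cases and total points in the whole study area
    """
    inside = _k_log_ratio(c, n) + _k_log_ratio(n - c, n)
    outside = _k_log_ratio(C - c, N - n) + _k_log_ratio((N - n) - (C - c), N - n)
    null = _k_log_ratio(C, N) + _k_log_ratio(N - C, N)
    return inside + outside - null

def monte_carlo_p_value(observed, simulated):
    """Rank-based p-value: (1 + #simulated >= observed) / (1 + #simulations)."""
    at_least = sum(1 for s in simulated if s >= observed)
    return (1 + at_least) / (1 + len(simulated))

# Hypothetical counts: 2 of 20 retailers inside the zone are "cases",
# versus 40 of 97 retailers in the whole study area.
llr = bernoulli_llr(c=2, n=20, C=40, N=97)
```

With 999 replications the smallest attainable p-value is 1/1000 = .001, which is the value reported for the first cluster. A full scan would evaluate many candidate zones, keep the maximum statistic, and repeat that maximisation on each randomly relabelled data set; a search for low-rate clusters would also restrict attention to zones whose inside rate is below the outside rate.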

  3. An initial perspective of S-asteroid subtypes within asteroid families

    NASA Technical Reports Server (NTRS)

    Kelley, M. S.; Gaffey, M. J.

    1993-01-01

    Many main belt asteroids cluster around certain values of semi-major axis (a), inclination (i), and eccentricity (e). Hirayama was the first to notice these concentrations which he interpreted as evidence of disruptions of larger parent bodies. He called these clusters 'asteroid families'. The term 'families' is increasingly reserved for genetic associations to distinguish them from clusters of unknown or purely dynamical origin (e.g. the Phocaea cluster). Members of a genetic asteroid family represent fragments derived from various depths within the original parent planetesimal. Thus, family members offer the potential for direct examination of the interiors of parent bodies which have undergone metamorphism and differentiation similar to that occurring in the inaccessible interiors of terrestrial planets. The condition that genetic family members represent the fragments of a parent object provides a critical test of whether an association (cluster in proper element space) is a genetic family. Compositions (types and relative abundances of materials) of family members must permit the reconstruction of a compositionally plausible parent body. The compositions of proposed family members can be utilized to test the genetic reality of the family and to determine the type and degree of internal differentiation within the parent planetesimal. The interpretation of the S-class mineralogy provides a preliminary evaluation of family memberships. Detailed mineralogical and petrological analysis was done based on the reflectance spectra of 39 S-type asteroids. The result is a division of the S-asteroid class into seven subtypes based on compositional differences. These subtypes, designated S(I) to S(VII), correspond to surface silicate assemblages ranging from monomineralic olivine (dunites) through olivine-pyroxene mixtures to pure pyroxene or pyroxene-feldspar mixtures (basalts). The most general conclusion is that the S-asteroids cannot be treated as a single group of objects without greatly oversimplifying their properties. Each S-subtype needs to be treated as an independent group with a distinct evolutionary history.

  4. On the design and analysis of clinical trials with correlated outcomes

    PubMed Central

    Follmann, Dean; Proschan, Michael

    2014-01-01

    SUMMARY The convention in clinical trials is to regard outcomes as independently distributed, but in some situations they may be correlated. For example, in infectious diseases, correlation may be induced if participants have contact with a common infectious source, or share hygienic tips that prevent infection. This paper discusses the design and analysis of randomized clinical trials that allow arbitrary correlation among all randomized volunteers. This perspective generalizes the traditional perspective of strata, where patients are exchangeable within strata, and independent across strata. For theoretical work, we focus on the test of no treatment effect μ1 − μ0 = 0 when the n dimensional vector of outcomes follows a Gaussian distribution with known n × n covariance matrix Σ, where the half randomized to treatment (placebo) have mean response μ1 (μ0). We show how the new test corresponds to familiar tests in simple situations for independent, exchangeable, paired, and clustered data. We also discuss the design of trials where Σ is known before or during randomization of patients and evaluate randomization schemes based on such knowledge. We provide two complex examples to illustrate the method, one for a study of 23 family clusters with cardiomyopathy, the other where the malaria attack rates vary within households and clusters of households in a Malian village. PMID:25111420
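The Gaussian setting in the abstract above can be illustrated with a generalized least squares z-test. This is a textbook construction under the stated assumptions (known Σ, treatment indicator z), offered as a sketch. It is not necessarily the exact statistic the authors derive, and the symbols X, z, δ and Z are introduced here for illustration.

```latex
% Model: Y ~ N(X\beta, \Sigma), with X = [\mathbf{1} \;\; z], z_i \in \{0, 1\},
% \beta = (\mu_0, \delta)^\top and \delta = \mu_1 - \mu_0.
\hat{\beta} = \left( X^\top \Sigma^{-1} X \right)^{-1} X^\top \Sigma^{-1} Y,
\qquad
\operatorname{Var}(\hat{\beta}) = \left( X^\top \Sigma^{-1} X \right)^{-1}

% Test of H_0 : \mu_1 - \mu_0 = 0
Z = \frac{\hat{\delta}}{\sqrt{\left[ \left( X^\top \Sigma^{-1} X \right)^{-1} \right]_{22}}}
\;\sim\; N(0, 1) \quad \text{under } H_0

% Independent case, \Sigma = \sigma^2 I with n/2 participants per arm:
Z = \frac{\bar{Y}_1 - \bar{Y}_0}{\sqrt{4 \sigma^2 / n}}
```

In the independent case the statistic reduces to the familiar two-sample z-test, consistent with the abstract's remark that the test corresponds to familiar tests in simple situations.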

  5. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology

    PubMed Central

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease. PMID:27977767

  6. Identification of Alfalfa Leaf Diseases Using Image Recognition Technology.

    PubMed

    Qin, Feng; Liu, Dongxia; Sun, Bingda; Ruan, Liu; Ma, Zhanhong; Wang, Haiguang

    2016-01-01

    Common leaf spot (caused by Pseudopeziza medicaginis), rust (caused by Uromyces striatus), Leptosphaerulina leaf spot (caused by Leptosphaerulina briosiana) and Cercospora leaf spot (caused by Cercospora medicaginis) are the four common types of alfalfa leaf diseases. Timely and accurate diagnoses of these diseases are critical for disease management, alfalfa quality control and the healthy development of the alfalfa industry. In this study, the identification and diagnosis of the four types of alfalfa leaf diseases were investigated using pattern recognition algorithms based on image-processing technology. A sub-image with one or multiple typical lesions was obtained by artificial cutting from each acquired digital disease image. Then the sub-images were segmented using twelve lesion segmentation methods integrated with clustering algorithms (including K_means clustering, fuzzy C-means clustering and K_median clustering) and supervised classification algorithms (including logistic regression analysis, Naive Bayes algorithm, classification and regression tree, and linear discriminant analysis). After a comprehensive comparison, the segmentation method integrating the K_median clustering algorithm and linear discriminant analysis was chosen to obtain lesion images. After the lesion segmentation using this method, a total of 129 texture, color and shape features were extracted from the lesion images. Based on the features selected using three methods (ReliefF, 1R and correlation-based feature selection), disease recognition models were built using three supervised learning methods, including the random forest, support vector machine (SVM) and K-nearest neighbor methods. A comparison of the recognition results of the models was conducted. The results showed that when the ReliefF method was used for feature selection, the SVM model built with the most important 45 features (selected from a total of 129 features) was the optimal model. 
For this SVM model, the recognition accuracies of the training set and the testing set were 97.64% and 94.74%, respectively. Semi-supervised models for disease recognition were built based on the 45 effective features that were used for building the optimal SVM model. For the optimal semi-supervised models built with three ratios of labeled to unlabeled samples in the training set, the recognition accuracies of the training set and the testing set were both approximately 80%. The results indicated that image recognition of the four alfalfa leaf diseases can be implemented with high accuracy. This study provides a feasible solution for lesion image segmentation and image recognition of alfalfa leaf disease.

  7. Mod-2 wind turbine project assessment and cluster test plans

    NASA Technical Reports Server (NTRS)

    Gordon, L. H.

    1982-01-01

    An assessment of the Mod-2 Wind Turbine project is presented based on initial goals and present results. Specifically, the Mod-2 background, project flow, and a chronology of events/results leading to Mod-2 acceptance is presented. After checkout/acceptance of the three operating turbines, NASA/LeRC will continue management of a two year test program performed at the DOE Goodnoe Hills test site. This test program is expected to yield data necessary for the continued development and optimization of wind energy systems. These test activities, the implementation of, and the results to date are also presented.

  8. Brownian model of transcriptome evolution and phylogenetic network visualization between tissues.

    PubMed

    Gu, Xun; Ruan, Hang; Su, Zhixi; Zou, Yangyun

    2017-09-01

    While phylogenetic analysis of transcriptomes of the same tissue is usually congruent with the species tree, controversy emerges when multiple tissues are included, that is, whether species from the same tissue are clustered together, or different tissues from the same species are clustered together. Recent studies have suggested that a phylogenetic network approach may shed some light on our understanding of multi-tissue transcriptome evolution; yet the underlying evolutionary mechanism remains unclear. In this paper we develop a Brownian-based model of transcriptome evolution under the phylogenetic network that can statistically distinguish between the patterns of species-clustering and tissue-clustering. Our model can be used as a null hypothesis (neutral transcriptome evolution) for testing any correlation in tissue evolution, can be applied to cancer transcriptome evolution to study whether two tumors of an individual appeared independently or via metastasis, and can be useful to detect convergent evolution at the transcriptional level. Copyright © 2017. Published by Elsevier Inc.

  9. Diversity in phenotypic and nutritional traits in vegetable amaranth (Amaranthus tricolor), a nutritionally underutilised crop.

    PubMed

    Shukla, Sudhir; Bhargava, Atul; Chatterjee, Avijeet; Pandey, Avinash Chandra; Mishra, Brij K

    2010-01-15

    Assessment of genetic diversity in a crop-breeding programme helps in the identification of diverse parental combinations to create segregating progenies with maximum genetic variability and facilitates introgression of desirable genes from diverse germplasm into the available genetic base. In the present study, 39 strains of vegetable amaranth (Amaranthus tricolor) were evaluated for eight morphological and seven quality traits for two test seasons to study the extent of genetic divergence among the strains. Multivariate analysis showed that the first four principal components contributed 67.55% of the variability. Cluster analysis grouped the strains into six clusters that displayed a wide range of diversity for most of the traits. Cluster analysis has proved to be an effective method in grouping strains that may facilitate effective management and utilisation in crop-breeding programmes. The diverse strains falling in different clusters were identified, which can be utilised in different hybridisation programmes to develop high-foliage-yielding varieties rich in nutritional components. Copyright (c) 2009 Society of Chemical Industry.

  10. A stereoscopic system for viewing the temporal evolution of brain activity clusters in response to linguistic stimuli

    NASA Astrophysics Data System (ADS)

    Forbes, Angus; Villegas, Javier; Almryde, Kyle R.; Plante, Elena

    2014-03-01

    In this paper, we present a novel application, 3D+Time Brain View, for the stereoscopic visualization of functional Magnetic Resonance Imaging (fMRI) data gathered from participants exposed to unfamiliar spoken languages. An analysis technique based on Independent Component Analysis (ICA) is used to identify statistically significant clusters of brain activity and their changes over time during different testing sessions. That is, our system illustrates the temporal evolution of participants' brain activity as they are introduced to a foreign language through displaying these clusters as they change over time. The raw fMRI data is presented as a stereoscopic pair in an immersive environment utilizing passive stereo rendering. The clusters are presented using a ray casting technique for volume rendering. Our system incorporates the temporal information and the results of the ICA into the stereoscopic 3D rendering, making it easier for domain experts to explore and analyze the data.

  11. An incremental DPMM-based method for trajectory clustering, modeling, and retrieval.

    PubMed

    Hu, Weiming; Li, Xi; Tian, Guodong; Maybank, Stephen; Zhang, Zhongfei

    2013-05-01

    Trajectory analysis is the basis for many applications, such as indexing of motion events in videos, activity recognition, and surveillance. In this paper, the Dirichlet process mixture model (DPMM) is applied to trajectory clustering, modeling, and retrieval. We propose an incremental version of a DPMM-based clustering algorithm and apply it to cluster trajectories. An appropriate number of trajectory clusters is determined automatically. When trajectories belonging to new clusters arrive, the new clusters can be identified online and added to the model without any retraining using the previous data. A time-sensitive Dirichlet process mixture model (tDPMM) is applied to each trajectory cluster for learning the trajectory pattern which represents the time-series characteristics of the trajectories in the cluster. Then, a parameterized index is constructed for each cluster. A novel likelihood estimation algorithm for the tDPMM is proposed, and a trajectory-based video retrieval model is developed. The tDPMM-based probabilistic matching method and the DPMM-based model growing method are combined to make the retrieval model scalable and adaptable. Experimental comparisons with state-of-the-art algorithms demonstrate the effectiveness of our algorithm.

  12. Pilot study of a brief intervention based on the theory of planned behaviour and self-identity to increase chlamydia testing among young people living in deprived areas.

    PubMed

    Booth, Amy R; Norman, Paul; Goyder, Elizabeth; Harris, Peter R; Campbell, Michael J

    2014-09-01

    This study sought to estimate the effects of a novel intervention, compared with usual chlamydia testing promotion, on chlamydia test uptake and intentions among young people living in deprived areas. The intervention was based on the theory of planned behaviour, augmented with self-identity, and targeted the significant predictors of chlamydia testing intentions identified in the previous research. Cluster randomization was used to allocate college tutor groups (intervention n = 10; control n = 11) to the intervention or control group. The sample comprised 253 participants (intervention n = 145, control n = 108). The primary outcome was test offer uptake at the end of the session. Other outcomes measured at immediate follow-up were intention, attitude, subjective norm, perceived behavioural control, and self-identity. Generalized estimating equations, controlling for cluster effects and sexual activity, found a small but non-significant effect of condition on test offer uptake, OR = 1.65 (95% CI 0.70, 3.88) p = .25, with 57.5% of intervention participants accepting the offer of a test compared with 40.2% of control participants. Using the same analysis procedure, small-to-medium intervention effects were found on other outcome variables, including a significant effect on attitudes towards chlamydia testing, OR = 1.37 (95% CI 1.00, 1.87), p = .05. The results provide encouraging initial evidence that this theory-based intervention, targeting the key determinants of chlamydia testing, may help to improve chlamydia testing uptake in a high-risk group. They support the conduct of a larger trial to evaluate the effectiveness of the intervention. What is already known on this subject? Young people living in areas of increased socio-economic deprivation have been identified as a high-risk group for chlamydia. 
Previous research within an extended model of the theory of planned behaviour (TPB) found that attitude, subjective norm, perceived behavioural control, and self-identity all significantly predicted chlamydia testing intentions in this high-risk group. What does this study add? Development and testing of a novel, TPB-based intervention targeting predictors of chlamydia testing intentions. The intervention led to significantly more positive attitudes towards chlamydia testing. Preliminary indication that a TPB-based intervention may help to improve chlamydia testing in a high-risk group. © 2013 The British Psychological Society.

  13. Prognostic models based on patient snapshots and time windows: Predicting disease progression to assisted ventilation in Amyotrophic Lateral Sclerosis.

    PubMed

    Carreiro, André V; Amaral, Pedro M T; Pinto, Susana; Tomás, Pedro; de Carvalho, Mamede; Madeira, Sara C

    2015-12-01

    Amyotrophic Lateral Sclerosis (ALS) is a devastating disease and the most common neurodegenerative disorder of young adults. ALS patients present a rapidly progressive motor weakness. This usually leads to death in a few years by respiratory failure. The correct prediction of respiratory insufficiency is thus key for patient management. In this context, we propose an innovative approach for prognostic prediction based on patient snapshots and time windows. We first cluster temporally-related tests to obtain snapshots of the patient's condition at a given time (patient snapshots). Then we use the snapshots to predict the probability of an ALS patient to require assisted ventilation after k days from the time of clinical evaluation (time window). This probability is based on the patient's current condition, evaluated using clinical features, including functional impairment assessments and a complete set of respiratory tests. The prognostic models include three temporal windows allowing to perform short, medium and long term prognosis regarding progression to assisted ventilation. Experimental results show an area under the receiver operating characteristics curve (AUC) in the test set of approximately 79% for time windows of 90, 180 and 365 days. Creating patient snapshots using hierarchical clustering with constraints outperforms the state of the art, and the proposed prognostic model becomes the first non population-based approach for prognostic prediction in ALS. The results are promising and should enhance the current clinical practice, largely supported by non-standardized tests and clinicians' experience. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Network-based spatial clustering technique for exploring features in regional industry

    NASA Astrophysics Data System (ADS)

    Chou, Tien-Yin; Huang, Pi-Hui; Yang, Lung-Shih; Lin, Wen-Tzu

    2008-10-01

    Past research on industrial clusters has focused mainly on single or particular industries and less on spatial industrial structure and mutual relations. Industrial clusters can generate three kinds of spillover effects: knowledge spillovers, labor market pooling, and input sharing. In addition, industrial clustering benefits industry development. A full grasp of the status and characteristics of a district's industrial clusters can help improve the competitive advantage of district industry. Research on industrial spatial clusters is therefore of great significance for setting industrial policies and promoting district economic development. In this study, an improved model, GeoSOM, that combines DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and SOM (Self-Organizing Map) was developed for analyzing industrial clusters. Unlike earlier distance-based algorithms for industrial clusters, the proposed GeoSOM model calculates spatial characteristics between firms based on the DBSCAN algorithm and evaluates the similarity between firms based on SOM clustering analysis. A demonstrative data set, the manufacturers around Taichung County in Taiwan, was analyzed to verify the practicability of the proposed model. The results indicate that GeoSOM is suitable for evaluating spatial industrial clusters.

  15. Role of donor hemodynamic trajectory in determining graft survival in liver transplantation from donation after circulatory death donors.

    PubMed

    Firl, Daniel J; Hashimoto, Koji; O'Rourke, Colin; Diago-Uso, Teresa; Fujiki, Masato; Aucejo, Federico N; Quintini, Cristiano; Kelly, Dympna M; Miller, Charles M; Fung, John J; Eghtesad, Bijan

    2016-11-01

    Donation after circulatory death (DCD) donors show heterogeneous hemodynamic trajectories following withdrawal of life support. Impact of hemodynamics in DCD liver transplant is unclear, and objective measures of graft viability would ease transplant surgeon decision making and inform safe expansion of the donor organ pool. This retrospective study tested whether hemodynamic trajectories were associated with transplant outcomes in DCD liver transplantation (n = 87). Using longitudinal clustering statistical techniques, we phenotyped DCD donors based on hemodynamic trajectory for both mean arterial pressure (MAP) and peripheral oxygen saturation (SpO2) following withdrawal of life support. Donors were categorized into 3 clusters: those who gradually decline after withdrawal of life support (cluster 1), those who maintain stable hemodynamics followed by rapid decline (cluster 2), and those who decline rapidly (cluster 3). Clustering outputs were used to compare characteristics and transplant outcomes. Cox proportional hazards modeling revealed hepatocellular carcinoma (hazard ratio [HR] = 2.53; P = 0.047), cold ischemia time (HR = 1.50 per hour; P = 0.027), and MAP cluster 1 were associated with increased risk of graft loss (HR = 3.13; P = 0.021), but not SpO2 cluster (P = 0.172) or donor warm ischemia time (DWIT; P = 0.154). Despite longer DWIT, MAP and SpO2 clusters 2 showed similar graft survival to MAP and SpO2 clusters 3, respectively. In conclusion, despite heterogeneity in hemodynamic trajectories, DCD donors can be categorized into 3 clinically meaningful subgroups that help predict graft prognosis. Further studies should confirm the utility of liver grafts from cluster 2. Liver Transplantation 22 1469-1481 2016 AASLD. © 2016 by the American Association for the Study of Liver Diseases.

  16. XMM-Newton X-ray and HST weak gravitational lensing study of the extremely X-ray luminous galaxy cluster Cl J120958.9+495352 (z = 0.902)

    NASA Astrophysics Data System (ADS)

    Thölken, Sophia; Schrabback, Tim; Reiprich, Thomas H.; Lovisari, Lorenzo; Allen, Steven W.; Hoekstra, Henk; Applegate, Douglas; Buddendiek, Axel; Hicks, Amalia

    2018-03-01

    Context. Observations of relaxed, massive, and distant clusters can provide important tests of standard cosmological models, for example by using the gas mass fraction. To perform this test, the dynamical state of the cluster and its gas properties have to be investigated. X-ray analyses provide one of the best opportunities to access this information and to determine important properties such as temperature profiles, gas mass, and the total X-ray hydrostatic mass. For the last of these, weak gravitational lensing analyses are complementary independent probes that are essential in order to test whether X-ray masses could be biased. Aims. We study the very luminous, high redshift (z = 0.902) galaxy cluster Cl J120958.9+495352 using XMM-Newton data. We measure global cluster properties and study the temperature profile and the cooling time to investigate the dynamical status with respect to the presence of a cool core. We use Hubble Space Telescope (HST) weak lensing data to estimate its total mass and determine the gas mass fraction. Methods. We perform a spectral analysis using an XMM-Newton observation of 15 ks cleaned exposure time. As the treatment of the background is crucial, we use two different approaches to account for the background emission to verify our results. We account for point spread function effects and deproject our results to estimate the gas mass fraction of the cluster. We measure weak lensing galaxy shapes from mosaic HST imaging and select background galaxies photometrically in combination with imaging data from the William Herschel Telescope. Results. The X-ray luminosity of Cl J120958.9+495352 in the 0.1-2.4 keV band estimated from our XMM-Newton data is $L_X = (13.4^{+1.2}_{-1.0}) \times 10^{44}$ erg/s and thus it is one of the most X-ray luminous clusters known at similarly high redshift. We find clear indications for the presence of a cool core from the temperature profile and the central cooling time, which is very rare at such high redshifts. Based on the weak lensing analysis, we estimate a cluster mass of $M_{500}/10^{14}\,M_\odot = 4.4^{+2.2}_{-2.0}$ (stat.) $\pm$ 0.6 (sys.) and a gas mass fraction of $f_{\mathrm{gas},2500} = 0.11^{+0.06}_{-0.03}$, in good agreement with previous findings for high redshift and local clusters.

  17. Efficient Agent-Based Cluster Ensembles

    NASA Technical Reports Server (NTRS)

    Agogino, Adrian; Tumer, Kagan

    2006-01-01

    Numerous domains ranging from distributed data acquisition to knowledge reuse need to solve the cluster ensemble problem of combining multiple clusterings into a single unified clustering. Unfortunately current non-agent-based cluster combining methods do not work in a distributed environment, are not robust to corrupted clusterings and require centralized access to all original clusterings. Overcoming these issues will allow cluster ensembles to be used in fundamentally distributed and failure-prone domains such as data acquisition from satellite constellations, in addition to domains demanding confidentiality such as combining clusterings of user profiles. This paper proposes an efficient, distributed, agent-based clustering ensemble method that addresses these issues. In this approach each agent is assigned a small subset of the data and votes on which final cluster its data points should belong to. The final clustering is then evaluated by a global utility, computed in a distributed way. This clustering is also evaluated using an agent-specific utility that is shown to be easier for the agents to maximize. Results show that agents using the agent-specific utility can achieve better performance than traditional non-agent based methods and are effective even when up to 50% of the agents fail.

  18. The application of k-Nearest Neighbour in the identification of high potential archers based on relative psychological coping skills variables

    NASA Astrophysics Data System (ADS)

    Taha, Zahari; Muazu Musa, Rabiu; Majeed, Anwar P. P. Abdul; Razali Abdullah, Mohamad; Muaz Alim, Muhammad; Nasir, Ahmad Fakhri Ab

    2018-04-01

    The present study aims at classifying and predicting high and low potential archers from a collection of psychological coping skills variables trained on different k-Nearest Neighbour (k-NN) kernels. Fifty youth archers (mean age ± standard deviation: 17.0 ± .056) gathered from various archery programmes completed a one end shooting score test. A psychological coping skills inventory, which evaluates the archers' level of related coping skills, was filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on the variables assessed. k-NN models with fine, medium, coarse, cosine, cubic and weighted kernel functions were trained on the psychological variables. The k-means analysis clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA). The cosine k-NN model exhibited good accuracy and precision throughout the exercise, with an accuracy of 94% and a considerably lower error rate for the prediction of the HPPA and the LPPA than the rest of the models. The findings of this investigation can be valuable to coaches and sports managers in recognising high potential athletes from the selected psychological coping skills variables examined, which would consequently save time and energy during talent identification and development programmes.
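The classification step in the abstract above can be sketched as a k-NN majority vote. This sketch assumes the "cosine" variant means k-NN with cosine distance, which is an interpretation rather than something the abstract states. The scores, labels and function names are hypothetical, and the HPPA/LPPA labels stand in for the k-means cluster assignments.

```python
import math
from collections import Counter

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: cosine_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical coping-skill scores (three variables per archer).
train_X = [(8, 7, 9), (9, 8, 8), (7, 9, 8), (3, 8, 2), (2, 9, 3), (1, 7, 2)]
train_y = ["HPPA", "HPPA", "HPPA", "LPPA", "LPPA", "LPPA"]

high = knn_predict(train_X, train_y, (8, 8, 8))
low = knn_predict(train_X, train_y, (2, 8, 2))
```

Cosine distance compares the profile (direction) of the score vector and ignores its overall magnitude, so two archers with proportionally similar scores are treated as close neighbours.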

  19. Fatigue and Impact Strength of Diffusion Bonded Titanium Alloy Joints

    DTIC Science & Technology

    1989-02-01

    likely to be due to the void level being such that the chance of a pore cluster being present at or near the test piece surface was less probable...in sub-surface crack initiation and reduced fatigue strength; it was concluded that small single voids were insignificant but clusters of voids...strength is reduced when clusters of pores are present, and is, in turn, a much more sensitive test than the tensile test. In the current work the

  20. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    PubMed

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. This study aimed to investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize the female group more frequently than the male group.
